Re: RFR: 8318650: Optimized subword gather for x86 targets. [v10]

Emanuel Peter Mon, 15 Jan 2024 23:35:15 -0800

On Tue, 16 Jan 2024 06:08:35 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:


>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1634:
>> 
>>> 1632:                                                     Register offset, XMMRegister offset_vec, XMMRegister idx_vec,
>>> 1633:                                                     XMMRegister xtmp1, XMMRegister xtmp2, XMMRegister xtmp3, KRegister mask,
>>> 1634:                                                     KRegister gmask, int vlen_enc, int vlen) {
>> 
>> Would you mind giving a quick summary of what the input registers are and 
>> what exactly this method does?
>> Why do we need to call `vgather_subword_avx3` so many times 
>> (`lane_count_subwords`)?
>
> The method gathers sub-words at the gather indices using integral gather 
> instructions. Because of the lane size mismatch between int and sub-words, 
> the algorithm makes multiple calls to `vgather_subword_avx3`.

As a reviewer, I feel like I have to reverse engineer this now. I would really 
appreciate it if there were a proper comment at the beginning that tells me 
what is happening here. Maybe start with an equation for what we want to 
achieve in the abstract, then explain why that does not work directly and why 
you have to break it down into a loop, and then state the equation again in 
loop form.
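
To illustrate the kind of summary I have in mind, here is a rough scalar 
sketch. It is only my assumption of the intended semantics, not the code in 
the PR: `gather_abstract`, `gather_chunked` and `int_lanes` are hypothetical 
names, and the inner loop merely stands in for one int-sized hardware gather.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Abstract form: dst[i] = base[offset + idx[i]] for every sub-word lane i.
std::vector<int8_t> gather_abstract(const int8_t* base, int offset,
                                    const std::vector<int32_t>& idx) {
  std::vector<int8_t> dst(idx.size());
  for (size_t i = 0; i < idx.size(); i++) {
    dst[i] = base[offset + idx[i]];
  }
  return dst;
}

// Loop form: an int gather covers only int_lanes indices at a time, so the
// sub-word vector is filled chunk by chunk (hypothetical model, one inner
// loop per int-sized gather).
std::vector<int8_t> gather_chunked(const int8_t* base, int offset,
                                   const std::vector<int32_t>& idx,
                                   size_t int_lanes) {
  std::vector<int8_t> dst(idx.size());
  for (size_t chunk = 0; chunk < idx.size(); chunk += int_lanes) {
    size_t end = std::min(chunk + int_lanes, idx.size());
    for (size_t j = chunk; j < end; j++) {
      dst[j] = base[offset + idx[j]];  // same equation, restricted to one chunk
    }
  }
  return dst;
}
```

Both functions should produce the same result for any chunk size; a comment 
in this spirit, adapted to what the registers actually hold, would make the 
method much easier to review.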

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1453020617
