On Tue, 16 Jan 2024 06:08:35 GMT, Jatin Bhateja <[email protected]> wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1634:
>>
>>> 1632: Register offset, XMMRegister offset_vec, XMMRegister idx_vec,
>>> 1633: XMMRegister xtmp1, XMMRegister xtmp2, XMMRegister xtmp3, KRegister mask,
>>> 1634: KRegister gmask, int vlen_enc, int vlen) {
>>
>> Would you mind giving a quick summary of what the input registers are and
>> what exactly this method does?
>> Why do we need to call `vgather_subword_avx3` so many times
>> (`lane_count_subwords`)?
>
> Method gathers sub-words from gather indices using integral gather
> instructions, because of the lane size mismatch b/w int and sub-words
> algorithm makes multiple calls to vgather_subword_avx3.

As a reviewer, I feel like I have to reverse engineer this now. I would really
appreciate it if there were a proper comment at the beginning that tells me
what is happening here. Maybe start with an equation for what we want to
achieve in the abstract, then explain why that does not work directly and why
you have to break it down into a loop, and then state the equation again in
loop form.
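
To illustrate the kind of description I mean, here is a rough scalar model for
the byte case. It is only a sketch of my current understanding, not the actual
implementation: the function names, the `int_lanes` parameter, and the way each
pass is modelled are my own assumptions, and I have left out the masks and the
`offset` handling entirely.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Abstract form (what we want): dst[i] = base[idx[i]] for every byte lane i.
std::vector<int8_t> gather_bytes_abstract(const int8_t* base,
                                          const std::vector<int32_t>& idx) {
  std::vector<int8_t> dst(idx.size());
  for (size_t i = 0; i < idx.size(); i++) {
    dst[i] = base[idx[i]];
  }
  return dst;
}

// Loop form (hypothetical model): an int gather only fills `int_lanes`
// 32-bit lanes at a time, so the byte lanes are covered in several passes.
// Each pass stands in for one call to the int-gather helper.
std::vector<int8_t> gather_bytes_looped(const int8_t* base,
                                        const std::vector<int32_t>& idx,
                                        size_t int_lanes) {
  std::vector<int8_t> dst(idx.size());
  for (size_t pass = 0; pass * int_lanes < idx.size(); pass++) {
    for (size_t lane = 0; lane < int_lanes; lane++) {
      size_t i = pass * int_lanes + lane;
      if (i >= idx.size()) break;
      // Model of one int lane of the gather result holding the element.
      int32_t wide = base[idx[i]];
      // Narrow the int lane back to a byte and place it in the result.
      dst[i] = static_cast<int8_t>(wide);
    }
  }
  return dst;
}
```

If the real code follows this shape, the comment could state the first
equation, explain that the int gather covers fewer lanes than the sub-word
vector has, and then give the second form with the pass count spelled out.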
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1453020617