Re: RFR: 8318650: Optimized subword gather for x86 targets. [v5]

Sandhya Viswanathan Thu, 09 Nov 2023 19:37:51 -0800

On Fri, 10 Nov 2023 01:25:49 GMT, Sandhya Viswanathan 
<sviswanat...@openjdk.org> wrote:


>> Jatin Bhateja has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Review comments resolutions.
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1648:
> 
>> 1646:     vpermd(xtmp3, xtmp1, xtmp3, vlen_enc == Assembler::AVX_512bit ? 
>> vlen_enc : Assembler::AVX_256bit);
>> 1647:     vpsubd(xtmp1, xtmp1, xtmp2, vlen_enc);
>> 1648:     vpor(dst, dst, xtmp3, vlen_enc);
> 
> xtmp1 starts out as 0, 1,...
> so vpermd will place the lower 64 bit from xtmp3 to lower 64 bit of dst
> why vpsubd and not vpaddd? It looks to me that vpaddd is more intutive to 
> understand.
> if vpadd, xtmp1 will become 2,3 in next iteration 
> so vpermd will place the lower 64 bit from xtmp3 to 127:64 of dst 
> and so on so forth
> 
> Another point, for avx512 it looks to me that vpermd and vpor could be merged 
> into one single instruction vpermd having dst as destination and merge bit 
> set to true.

Please ignore the last bit about avx512 vpermd merge as we are not using mask 
registers here.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1388837311

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v5]

Reply via email to