On Fri, 10 Nov 2023 01:25:49 GMT, Sandhya Viswanathan <sviswanat...@openjdk.org> wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Review comments resolutions. > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1648: > >> 1646: vpermd(xtmp3, xtmp1, xtmp3, vlen_enc == Assembler::AVX_512bit ? >> vlen_enc : Assembler::AVX_256bit); >> 1647: vpsubd(xtmp1, xtmp1, xtmp2, vlen_enc); >> 1648: vpor(dst, dst, xtmp3, vlen_enc); > > xtmp1 starts out as 0, 1,... > so vpermd will place the lower 64 bit from xtmp3 to lower 64 bit of dst > why vpsubd and not vpaddd? It looks to me that vpaddd is more intutive to > understand. > if vpadd, xtmp1 will become 2,3 in next iteration > so vpermd will place the lower 64 bit from xtmp3 to 127:64 of dst > and so on so forth > > Another point, for avx512 it looks to me that vpermd and vpor could be merged > into one single instruction vpermd having dst as destination and merge bit > set to true. Please ignore the last bit about avx512 vpermd merge as we are not using mask registers here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1388837311