On Wed, 18 Mar 2026 16:32:39 GMT, Sandhya Viswanathan
<[email protected]> wrote:
>> Mohamed Issa has updated the pull request with a new target base due to a
>> merge or a rebase. The pull request now contains seven commits:
>>
>> - Refactor some of the code and re-introduce some instructions previously
>> eliminated while also adding new ones.
>> - Merge branch 'master' into user/missa-prime/avx10_2
>> - Merge branch 'master' into user/missa-prime/avx10_2
>> - Remove half precision min/max reduction definitions and adjust
>> corresponding benchmarks.
>> - Use alternative instruction flow for half precision reduction loops and
>> add supporting infrastructure.
>> - Merge branch 'master' into user/missa-prime/avx10_2
>> - Replace scalar AVX10.2 floating point min/max instructions with more
>> efficient sequence
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 7084:
>
>> 7082: }
>> 7083:
>> 7084: void C2_MacroAssembler::scalar_max_min_fp16_avx10_2(int opcode,
>> XMMRegister dst, XMMRegister src1, XMMRegister src2,
>
> Could be named as sminmax_fp16_avx10_2 on similar lines of sminmax_fp_avx10_2.
Renamed; the names should all match now.

> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 1965:
>
>> 1963:
>> 1964: void MacroAssembler::movhlf(XMMRegister dst, XMMRegister src) {
>> 1965: vmovw(dst, src);
>
> For pre avx10_2, we will need to continue to use the move through scratch
> register.
This is fixed now: the pre-AVX10.2 path still moves through a scratch register.

> src/hotspot/cpu/x86/x86.ad line 1724:
>
>> 1722: } else {
>> 1723: __ vucomish(p, q);
>> 1724: }
>
> If there is no other benefit (like a reduced branch or something else) by
> using vucomx*, it is better to continue to use vucomish.
I removed the vucomx* instructions.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/29831#discussion_r2962311841
PR Review Comment: https://git.openjdk.org/jdk/pull/29831#discussion_r2962315289
PR Review Comment: https://git.openjdk.org/jdk/pull/29831#discussion_r2962314251