On Wed, 18 Mar 2026 16:32:39 GMT, Sandhya Viswanathan 
<[email protected]> wrote:

>> Mohamed Issa has updated the pull request with a new target base due to a 
>> merge or a rebase. The pull request now contains seven commits:
>> 
>>  - Refactor some of the code and re-introduce some instructions previously 
>> eliminated while also adding new ones.
>>  - Merge branch 'master' into user/missa-prime/avx10_2
>>  - Merge branch 'master' into user/missa-prime/avx10_2
>>  - Remove half precision min/max reduction definitions and adjust 
>> corresponding benchmarks.
>>  - Use alternative instruction flow for half precision reduction loops and 
>> add supporting infrastructure.
>>  - Merge branch 'master' into user/missa-prime/avx10_2
>>  - Replace scalar AVX10.2 floating point min/max instructions with more 
>> efficient sequence
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 7084:
> 
>> 7082: }
>> 7083: 
>> 7084: void C2_MacroAssembler::scalar_max_min_fp16_avx10_2(int opcode, 
>> XMMRegister dst, XMMRegister src1, XMMRegister src2,
> 
> This could be named sminmax_fp16_avx10_2, along the lines of sminmax_fp_avx10_2.

They should all match now.

> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 1965:
> 
>> 1963: 
>> 1964: void MacroAssembler::movhlf(XMMRegister dst, XMMRegister src) {
>> 1965:   vmovw(dst, src);
> 
> For pre avx10_2, we will need to continue to use the move through scratch 
> register.

This is fixed now.

> src/hotspot/cpu/x86/x86.ad line 1724:
> 
>> 1722:     } else {
>> 1723:       __ vucomish(p, q);
>> 1724:     }
> 
> If there is no other benefit (like a reduced branch or something else) by 
> using vucomx*, it is better to continue to use vucomish.

I removed the vucomx* instructions.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/29831#discussion_r2962311841
PR Review Comment: https://git.openjdk.org/jdk/pull/29831#discussion_r2962315289
PR Review Comment: https://git.openjdk.org/jdk/pull/29831#discussion_r2962314251
