On Mon, 9 Mar 2026 19:36:05 GMT, Mohamed Issa <[email protected]> wrote:
>> Although the scalar AVX10 floating point min/max instructions (VMINMAXSD,
>> VMINMAXSS, VMINMAXSH) are compact, it's better not to use them in reduction
>> loops. This is because of serial data dependencies that get triggered across
>> loop iterations. An alternate implementation using comparisons and jumps
>> leverages branch prediction and limits the effects of data dependencies to
>> cheaper instructions (e.g, MOV). Please note that this method is already
>> used for non-AVX10 min/max reduction loop scenarios.
>>
>> With that background provided, these changes remove AVX10 floating point
>> min/max instructions from single and double precision floating point
>> reduction loops. They are replaced by the separate instruction sequence
>> described above. Currently, min/max half precision floating point reduction
>> loops aren't detectable, so they will be handled in a separate PR. There is
>> also some code cleanup to remove unused instruction definitions while also
>> adding necessary supporting infrastructure. The JTREG tests listed below
>> were used to verify correctness with the recommended JVM options mentioned
>> in corresponding source files. All modifications and tests used [OpenJDK
>> v27-b12](https://github.com/openjdk/jdk/releases/tag/jdk-27%2B12) as the
>> baseline build.
>>
>> 1. `jtreg:test/jdk/jdk/incubator/vector/DoubleVector64Tests.java`
>> 2. `jtreg:test/jdk/jdk/incubator/vector/DoubleVector128Tests.java`
>> 3. `jtreg:test/jdk/jdk/incubator/vector/DoubleVector256Tests.java`
>> 4. `jtreg:test/jdk/jdk/incubator/vector/DoubleVector512Tests.java`
>> 5. `jtreg:test/jdk/jdk/incubator/vector/DoubleVectorMaxTests.java`
>> 6. `jtreg:test/jdk/jdk/incubator/vector/FloatVector64Tests.java`
>> 7. `jtreg:test/jdk/jdk/incubator/vector/FloatVector128Tests.java`
>> 8. `jtreg:test/jdk/jdk/incubator/vector/FloatVector256Tests.java`
>> 9. `jtreg:test/jdk/jdk/incubator/vector/FloatVector512Tests.java`
>> 10. `jtreg:test/jdk/jdk/incubator/vector/FloatVectorMaxTests.java`
>> 11. `jtreg:test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java`
>> 12. `jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java`
>> 13. `jtreg:test/hotspot/jtreg/compiler/igvn/TestMinMaxIdentity.java`
>> 14. `jtreg:test/hotspot/jtreg/compiler/intrinsics/float16/TestFloat16MaxMinSpecialValues.java`
>>
>> Finally, the JMH micro-benchmarks listed below were updated to ensure all
>> code paths are exercised.
>>
>> 1. `micro:test/micro/org/openjdk/bench/jdk/incubator/vector/Float16OperationsBenchmark.java`
>> 2. `mi...
>
> Mohamed Issa has updated the pull request with a new target base due to a
> merge or a rebase. The pull request now contains five commits:
>
> - Merge branch 'master' into user/missa-prime/avx10_2
> - Remove half precision min/max reduction definitions and adjust
>   corresponding benchmarks.
> - Use alternative instruction flow for half precision reduction loops and
>   add supporting infrastructure.
> - Merge branch 'master' into user/missa-prime/avx10_2
> - Replace scalar AVX10.2 floating point min/max instructions with more
>   efficient sequence

src/hotspot/cpu/x86/x86.ad line 1743:

> 1741: // Math.min() # Math.max()
> 1742: // -----------------------------
> 1743: // (v)ucomis[s/d]. #

Add sh to the comment here.

src/hotspot/cpu/x86/x86.ad line 1763:

> 1761: } else {
> 1762:   emit_fp_ucom_double(masm, a, b);
> 1763: }

It would be good to have a function like emit_fp_ucom(masm, pt, a, b) and use that here.

src/hotspot/cpu/x86/x86.ad line 1791:

> 1789: } else {
> 1790:   __ movdbl(dst, a);
> 1791: }

Likewise a function movfp(prec, dst, src) would be good to define and use here.
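
To make the two helper suggestions above concrete, here is a rough, self-contained sketch of what I have in mind. It is not HotSpot code: `MockAssembler`, the `FpPrec` enum, the string register names, and the recorded mnemonics are all assumptions made for illustration, and the real helpers would take a `MacroAssembler*` and `XMMRegister` operands and call the existing per-precision routines.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical precision tag; the real code may already have an equivalent.
enum class FpPrec { Half, Single, Double };

// Mock standing in for the macro assembler: it only records what was emitted.
struct MockAssembler {
  std::vector<std::string> emitted;
  void emit(const std::string& mnemonic, const std::string& x, const std::string& y) {
    emitted.push_back(mnemonic + " " + x + ", " + y);
  }
};

// One entry point for the unordered compare, dispatching on precision,
// so call sites no longer need their own if/else chain.
void emit_fp_ucom(MockAssembler& masm, FpPrec prec,
                  const std::string& a, const std::string& b) {
  switch (prec) {
    case FpPrec::Half:   masm.emit("vucomish", a, b); break;
    case FpPrec::Single: masm.emit("ucomiss", a, b);  break;
    case FpPrec::Double: masm.emit("ucomisd", a, b);  break;
  }
}

// Same idea for the register-to-register move.
void movfp(MockAssembler& masm, FpPrec prec,
           const std::string& dst, const std::string& src) {
  switch (prec) {
    case FpPrec::Half:   masm.emit("vmovsh", dst, src); break;
    case FpPrec::Single: masm.emit("movss", dst, src);  break;
    case FpPrec::Double: masm.emit("movsd", dst, src);  break;
  }
}
```

With helpers shaped like this, each if/else on precision at the call sites collapses into a single call such as `emit_fp_ucom(masm, prec, a, b)`.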
src/hotspot/cpu/x86/x86.ad line 7408:

> 7406:
> 7407: // max = java.lang.Math.max(float a, float b)
> 7408: instruct maxF_reg_avx10_2(regF dst, regF a, regF b)

We can merge maxF_reg_avx10_2 and minF_reg_avx10_2 into one instruct, say minmaxF_reg_avx10_2, with two match rules:

    match(Set dst (MaxF a b));
    match(Set dst (MinF a b));

Likewise for double minmax.

src/hotspot/cpu/x86/x86.ad line 7420:

> 7418: %}
> 7419:
> 7420: instruct maxF_reduction_reg_avx10_2(regF dst, regF a, regF b, regF xtmp, rRegI rtmp, rFlagsReg cr)

We can merge maxF_reduction_reg_avx10_2 and minF_reduction_reg_avx10_2 into one instruct, say minmaxF_reduction_reg_avx10_2, with two match rules:

    match(Set dst (MaxF a b));
    match(Set dst (MinF a b));

Likewise for double minmax.

src/hotspot/cpu/x86/x86.ad line 7435:

> 7433:
> 7434: // max = java.lang.Math.max(float a, float b)
> 7435: instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp)

We can merge maxF_reg and minF_reg into one instruct, say minmaxF_reg, with two match rules:

    match(Set dst (MaxF a b));
    match(Set dst (MinF a b));

Likewise for double minmax.
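
To illustrate why one merged definition should be enough, here is a small standalone sketch in which a single routine serves both min and max by branching on an operation tag, using the same compare-and-branch flow that the reduction variants rely on. This is a plain C++ model of the semantics, not the ADL or the emitted code: `MinMaxOp`, `java_minmax`, and `reduce_minmax` are hypothetical names, and the tag only stands in for whatever the merged instruct would use to tell MinF from MaxF.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for the MinF/MaxF distinction in a merged instruct.
enum class MinMaxOp { Min, Max };

// Math.min/Math.max semantics via comparisons and branches:
// a NaN operand yields NaN, and -0.0f is treated as smaller than +0.0f.
float java_minmax(MinMaxOp op, float a, float b) {
  if (std::isnan(a)) return a;
  if (std::isnan(b)) return b;
  if (a == b) {
    // Only signed zeros compare equal while differing; decide by sign bit.
    bool a_is_negative = std::signbit(a);
    if (op == MinMaxOp::Max) return a_is_negative ? b : a;
    return a_is_negative ? a : b;
  }
  if (op == MinMaxOp::Max) return (a > b) ? a : b;
  return (a < b) ? a : b;
}

// Reduction loop: acc carries the dependency from one iteration to the next.
float reduce_minmax(MinMaxOp op, const float* data, std::size_t n, float init) {
  assert(data != nullptr || n == 0);
  float acc = init;
  for (std::size_t i = 0; i < n; ++i) {
    acc = java_minmax(op, acc, data[i]);
  }
  return acc;
}
```

The min and max paths differ only in the direction of one comparison and in which signed zero wins, which is the part a merged instruct would need to select per operation.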
src/hotspot/cpu/x86/x86.ad line 7448:

> 7446: %}
> 7447:
> 7448: instruct maxF_reduction_reg(legRegF dst, legRegF a, legRegF b, legRegF xtmp, rRegI rtmp, rFlagsReg cr)

We can merge maxF_reduction_reg and minF_reduction_reg into one instruct, say minmaxF_reduction_reg, with two match rules:

    match(Set dst (MaxF a b));
    match(Set dst (MinF a b));

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/29831#discussion_r2932578996
PR Review Comment: https://git.openjdk.org/jdk/pull/29831#discussion_r2932554904
PR Review Comment: https://git.openjdk.org/jdk/pull/29831#discussion_r2932564600
PR Review Comment: https://git.openjdk.org/jdk/pull/29831#discussion_r2932617343
PR Review Comment: https://git.openjdk.org/jdk/pull/29831#discussion_r2932623330
PR Review Comment: https://git.openjdk.org/jdk/pull/29831#discussion_r2932640106
PR Review Comment: https://git.openjdk.org/jdk/pull/29831#discussion_r2932642570
