This patch adds intrinsic support for UMIN and UMAX reduction operations in the Vector API on AArch64, enabling direct hardware instruction mapping for better performance.
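For readers less familiar with the operation, here is a minimal scalar sketch of what an unsigned min/max reduction computes over int lanes. It is a reference model only, not the code in this patch: the class and method names are hypothetical, and the identity values used (0 for UMAX, all bits set for UMIN) are simply the mathematical identities of the two operations.

```java
// Scalar reference model of unsigned min/max reductions over int lanes.
// Hypothetical helper for illustration; it is not part of the patch.
final class UnsignedReductionSketch {

    // Unsigned max: start from 0, the smallest unsigned value (identity for UMAX).
    static int umaxReduce(int[] lanes) {
        int acc = 0;
        for (int lane : lanes) {
            if (Integer.compareUnsigned(lane, acc) > 0) {
                acc = lane;
            }
        }
        return acc;
    }

    // Unsigned min: start from -1 (0xFFFFFFFF), the largest unsigned value
    // (identity for UMIN).
    static int uminReduce(int[] lanes) {
        int acc = -1;
        for (int lane : lanes) {
            if (Integer.compareUnsigned(lane, acc) < 0) {
                acc = lane;
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] lanes = {1, -1, 3}; // -1 is 4294967295 when read as unsigned
        System.out.println(Integer.toUnsignedString(umaxReduce(lanes))); // 4294967295
        System.out.println(Integer.toUnsignedString(uminReduce(lanes))); // 1
    }
}
```

A signed reduction over the same lanes would report 3 as the maximum and -1 as the minimum, which is why the unsigned variants need their own nodes and identity values.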

Changes:
--------
1. C2 mid-end:
   - Added UMinReductionVNode and UMaxReductionVNode

2. AArch64 Backend:
   - Added uminp/umaxp/sve_uminv/sve_umaxv instructions
   - Updated match rules for all vector sizes and element types
   - Both NEON and SVE implementations are supported

3. Test:
   - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java
   - Added assembly tests in aarch64-asmtest.py for new instructions
   - Added a JTReg test file VectorUMinMaxReductionTest.java

Different configurations were tested on aarch64 and x86 machines, and all tests passed.

Test results of JMH benchmarks from the panama-vector project:
--------

On an Nvidia Grace machine with 128-bit SVE:

Benchmark                       Unit     Before   Error     After   Error  Uplift
Byte128Vector.UMAXLanes         ops/ms   411.60   42.18  25226.51   33.92   61.29
Byte128Vector.UMAXMaskedLanes   ops/ms   558.56   85.12  25182.90   28.74   45.09
Byte128Vector.UMINLanes         ops/ms   645.58  780.76  28396.29  103.11   43.99
Byte128Vector.UMINMaskedLanes   ops/ms   621.09  718.27  26122.62   42.68   42.06
Byte64Vector.UMAXLanes          ops/ms   296.33   34.44  14357.74   15.95   48.45
Byte64Vector.UMAXMaskedLanes    ops/ms   376.54   44.01  14269.24   21.41   37.90
Byte64Vector.UMINLanes          ops/ms   373.45  426.51  15425.36   66.20   41.31
Byte64Vector.UMINMaskedLanes    ops/ms   353.32  346.87  14201.37   13.79   40.19
Int128Vector.UMAXLanes          ops/ms   174.79  192.51   9906.07  286.93   56.67
Int128Vector.UMAXMaskedLanes    ops/ms   157.23  206.68  10246.77   11.44   65.17
Int64Vector.UMAXLanes           ops/ms    95.30  126.49   4719.30   98.57   49.52
Int64Vector.UMAXMaskedLanes     ops/ms    88.19   87.44   4693.18   19.76   53.22
Long128Vector.UMAXLanes         ops/ms    80.62   97.82   5064.01   35.52   62.82
Long128Vector.UMAXMaskedLanes   ops/ms    78.15  102.91   5028.24    8.74   64.34
Long64Vector.UMAXLanes          ops/ms    47.56   62.01     46.76   52.28    0.98
Long64Vector.UMAXMaskedLanes    ops/ms    45.44   46.76     45.79   42.91    1.01
Short128Vector.UMAXLanes        ops/ms   316.65  410.30  14814.82   23.65   46.79
Short128Vector.UMAXMaskedLanes  ops/ms   308.90  351.78  15155.26   31.03   49.06
Short64Vector.UMAXLanes         ops/ms   190.38  245.09   8022.46   14.30   42.14
Short64Vector.UMAXMaskedLanes   ops/ms   195.54   36.15   7930.28   11.88   40.56

On an Nvidia Grace machine with 128-bit NEON:

Benchmark                       Unit     Before   Error     After   Error  Uplift
Byte128Vector.UMAXLanes         ops/ms   414.69   42.52  25257.61   25.91   60.91
Byte128Vector.UMAXMaskedLanes   ops/ms   552.00   56.61  23063.14  304.45   41.78
Byte128Vector.UMINLanes         ops/ms   634.98  849.04  28444.37  180.80   44.80
Byte128Vector.UMINMaskedLanes   ops/ms   612.88  735.18  26127.07   27.99   42.63
Byte64Vector.UMAXLanes          ops/ms   291.53   32.19  13893.62   28.09   47.66
Byte64Vector.UMAXMaskedLanes    ops/ms   363.34   48.17  13290.59   12.53   36.58
Byte64Vector.UMINLanes          ops/ms   368.70  433.60  15416.90   15.80   41.81
Byte64Vector.UMINMaskedLanes    ops/ms   350.46  371.05  14524.29  121.63   41.44
Int128Vector.UMAXLanes          ops/ms   177.67  201.38  10182.82   20.21   57.31
Int128Vector.UMAXMaskedLanes    ops/ms   155.25  187.88   9194.13  393.35   59.22
Int64Vector.UMAXLanes           ops/ms    93.93  115.02   5106.79    4.54   54.37
Int64Vector.UMAXMaskedLanes     ops/ms    87.01   88.50   4405.87    8.06   50.63
Long128Vector.UMAXLanes         ops/ms    80.32   98.50   3229.80   40.53   40.21
Long128Vector.UMAXMaskedLanes   ops/ms    77.65  103.25   3161.50    4.45   40.72
Long64Vector.UMAXLanes          ops/ms    47.72   65.38     46.41   50.38    0.97
Long64Vector.UMAXMaskedLanes    ops/ms    45.26   47.46     45.13   47.23    1.00
Short128Vector.UMAXLanes        ops/ms   316.09  429.34  14748.07   14.78   46.66
Short128Vector.UMAXMaskedLanes  ops/ms   307.70  342.54  14359.11   44.99   46.67
Short64Vector.UMAXLanes         ops/ms   187.67  253.01   8180.63  178.65   43.59
Short64Vector.UMAXMaskedLanes   ops/ms   191.10   33.51   7949.19  108.65   41.60

-------------

Commit messages:
 - 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations
 - 8372978: [VectorAPI] Fix incorrect identity values in UMIN/UMAX reductions

Changes: https://git.openjdk.org/jdk/pull/28693/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28693&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8372980
  Stats: 1607 lines in 49 files changed: 835 ins; 16 del; 756 mod
  Patch: https://git.openjdk.org/jdk/pull/28693.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28693/head:pull/28693

PR: https://git.openjdk.org/jdk/pull/28693
