On Tue, 20 Jan 2026 10:01:31 GMT, Eric Fang <[email protected]> wrote:
>> This patch adds intrinsic support for UMIN and UMAX reduction operations in
>> the Vector API on AArch64, enabling direct hardware instruction mapping for
>> better performance.
>>
>> Changes:
>> --------
>>
>> 1. C2 mid-end:
>>    - Added UMinReductionVNode and UMaxReductionVNode
>>
>> 2. AArch64 Backend:
>>    - Added uminp/umaxp/sve_uminv/sve_umaxv instructions
>>    - Updated match rules for all vector sizes and element types
>>    - Both NEON and SVE implementation are supported
>>
>> 3. Test:
>>    - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java
>>    - Added assembly tests in aarch64-asmtest.py for new instructions
>>    - Added a JTReg test file VectorUMinMaxReductionTest.java
>>
>> Different configurations were tested on aarch64 and x86 machines, and all
>> tests passed.
>>
>> Test results of JMH benchmarks from the panama-vector project:
>> --------
>>
>> On a Nvidia Grace machine with 128-bit SVE:
>>
>> Benchmark                      Unit    Before  Error   After     Error   Uplift
>> Byte128Vector.UMAXLanes        ops/ms  411.60  42.18   25226.51  33.92   61.29
>> Byte128Vector.UMAXMaskedLanes  ops/ms  558.56  85.12   25182.90  28.74   45.09
>> Byte128Vector.UMINLanes        ops/ms  645.58  780.76  28396.29  103.11  43.99
>> Byte128Vector.UMINMaskedLanes  ops/ms  621.09  718.27  26122.62  42.68   42.06
>> Byte64Vector.UMAXLanes         ops/ms  296.33  34.44   14357.74  15.95   48.45
>> Byte64Vector.UMAXMaskedLanes   ops/ms  376.54  44.01   14269.24  21.41   37.90
>> Byte64Vector.UMINLanes         ops/ms  373.45  426.51  15425.36  66.20   41.31
>> Byte64Vector.UMINMaskedLanes   ops/ms  353.32  346.87  14201.37  13.79   40.19
>> Int128Vector.UMAXLanes         ops/ms  174.79  192.51  9906.07   286.93  56.67
>> Int128Vector.UMAXMaskedLanes   ops/ms  157.23  206.68  10246.77  11.44   65.17
>> Int64Vector.UMAXLanes          ops/ms  95.30   126.49  4719.30   98.57   49.52
>> Int64Vector.UMAXMaskedLanes    ops/ms  88.19   87.44   4693.18   19.76   53.22
>> Long128Vector.UMAXLanes        ops/ms  80.62   97.82   5064.01   35.52   62.82
>> Long128Vector.UMAXMaskedLanes  ops/ms  78.15   102.91  5028.24   8.74    64.34
>> Long64Vector.UMAXLanes         ops/ms  47.56   62.01   46.76     52.28   0.98
>> Long64Vector.UMAXMaskedLanes   ops/ms  45.44   46.76   45.79     42.91   1.01
>> Short128Vector.UMAXLanes       ops/ms  316.65  410.30  14814.82  23.65   46.79
>> ...
>
> Eric Fang has updated the pull request with a new target base due to a merge
> or a rebase. The pull request now contains three commits:
>
>  - Rebase commit 56d7b52
>  - Merge branch 'master' into JDK-8372980-umin-umax-intrinsic
>  - 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max
>    reduction operations
>
>    This patch adds intrinsic support for UMIN and UMAX reduction operations
>    in the Vector API on AArch64, enabling direct hardware instruction mapping
>    for better performance.
>
>    Changes:
>    --------
>
>    1. C2 mid-end:
>       - Added UMinReductionVNode and UMaxReductionVNode
>
>    2. AArch64 Backend:
>       - Added uminp/umaxp/sve_uminv/sve_umaxv instructions
>       - Updated match rules for all vector sizes and element types
>       - Both NEON and SVE implementation are supported
>
>    3. Test:
>       - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java
>       - Added assembly tests in aarch64-asmtest.py for new instructions
>       - Added a JTReg test file VectorUMinMaxReductionTest.java
>
>    Different configurations were tested on aarch64 and x86 machines, and
>    all tests passed.
>
>    Test results of JMH benchmarks from the panama-vector project:
>    --------
>
>    On a Nvidia Grace machine with 128-bit SVE:
>    ```
>    Benchmark                      Unit    Before  Error   After     Error   Uplift
>    Byte128Vector.UMAXLanes        ops/ms  411.60  42.18   25226.51  33.92   61.29
>    Byte128Vector.UMAXMaskedLanes  ops/ms  558.56  85.12   25182.90  28.74   45.09
>    Byte128Vector.UMINLanes        ops/ms  645.58  780.76  28396.29  103.11  43.99
>    Byte128Vector.UMINMaskedLanes  ops/ms  621.09  718.27  26122.62  42.68   42.06
>    Byte64Vector.UMAXLanes         ops/ms  296.33  34.44   14357.74  15.95   48.45
>    Byte64Vector.UMAXMaskedLanes   ops/ms  376.54  44.01   14269.24  21.41   37.90
>    Byte64Vector.UMINLanes         ops/ms  373.45  426.51  15425.36  66.20   41.31
>    Byte64Vector.UMINMaskedLanes   ops/ms  353.32  346.87  14201.37  13.79   40.19
>    Int128Vector.UMAXLanes         ops/ms  174.79  192.51  9906.07   286.93  56.67
>    Int128Vector.UMAXMaskedLanes   ops/ms  157.23  206.68  10246.77  11.44   65.17
>    Int64Vector.UMAXLanes          ops/ms  95.30   126.49  4719.30   98.57   49.52
>    Int64Vector.UMAXMaskedLanes    ops/ms  88.19   87.44   4693.18   19.76   53.22
>    Long128Vector.UMAXLanes        ops/ms  80.62   97.82   5064.01   35.52   62.82
>    Long128Vector.UMAXMaskedLanes  ops/ms  78.15   102.91  5028.24   8.74    64.34
>    Long64Vector.UMAXLanes         ops/ms  47.56   62.01   46.76     52.28   0.98
>    Long64Vector.UMAXMaskedLanes   ops/ms  45.44   46.76   45.79     42.91   1.01
>    Short128Vector.UMAXLanes       ops/ms  316.65  4...

Hi, this patch adds intrinsic support for Vector API umin/umax reductions. It is ready for review. Would you mind taking a look? Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28693#issuecomment-3772017425
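
For reviewers less familiar with unsigned reductions, here is a minimal scalar sketch of the result that a UMAX/UMIN reduction over int lanes is expected to produce. It is not code from the patch: the class name `UnsignedReduce`, the method names, and the choice of identity values (0 for umax, all ones for umin) are assumptions made for illustration, and it uses only `java.lang.Integer`, not the Vector API.

```java
public class UnsignedReduce {
    // Scalar reference for an unsigned max reduction over int lanes.
    // Assumed identity: 0, the smallest value under unsigned ordering.
    public static int umax(int[] lanes) {
        int acc = 0;
        for (int v : lanes) {
            // compareUnsigned treats both operands as 32-bit unsigned values.
            if (Integer.compareUnsigned(v, acc) > 0) {
                acc = v;
            }
        }
        return acc;
    }

    // Scalar reference for an unsigned min reduction over int lanes.
    // Assumed identity: -1 (0xFFFFFFFF), the largest unsigned value.
    public static int umin(int[] lanes) {
        int acc = -1;
        for (int v : lanes) {
            if (Integer.compareUnsigned(v, acc) < 0) {
                acc = v;
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] lanes = {1, -1, 7, Integer.MIN_VALUE};
        // -1 is 0xFFFFFFFF, so it is the unsigned maximum here.
        System.out.println(Integer.toUnsignedString(umax(lanes))); // 4294967295
        System.out.println(Integer.toUnsignedString(umin(lanes))); // 1
    }
}
```

A signed max over the same lanes would return 7, which is why the reduction needs its own node and instruction mapping rather than reusing the signed min/max reduction.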
