Re: RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations

Xiaohong Gong Wed, 10 Dec 2025 22:52:06 -0800

On Mon, 8 Dec 2025 03:29:03 GMT, Eric Fang <[email protected]> wrote:


> This patch adds intrinsic support for UMIN and UMAX reduction operations in 
> the Vector API on AArch64, enabling direct hardware instruction mapping for 
> better performance.
> 
> Changes:
> --------
> 
> 1. C2 mid-end:
>    - Added UMinReductionVNode and UMaxReductionVNode
> 
> 2. AArch64 Backend:
>    - Added uminp/umaxp/sve_uminv/sve_umaxv instructions
>    - Updated match rules for all vector sizes and element types
>    - Both NEON and SVE implementations are supported
> 
> 3. Test:
>    - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java
>    - Added assembly tests in aarch64-asmtest.py for new instructions
>    - Added a JTReg test file VectorUMinMaxReductionTest.java
> 
> Different configurations were tested on aarch64 and x86 machines, and all 
> tests passed.
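
For readers less familiar with these operations, the semantics that a UMIN/UMAX reduction is expected to preserve can be sketched in scalar Java. This is only an illustrative reference, not code from the patch or its tests: the class name `UnsignedReductionSketch` and the methods `umaxReduce`/`uminReduce` are hypothetical, and the sketch covers `int` lanes only.

```java
public class UnsignedReductionSketch {
    // Hypothetical scalar reference: unsigned max over all lanes.
    // 0 is the identity for unsigned max, so an empty input yields 0.
    static int umaxReduce(int[] lanes) {
        int acc = 0;
        for (int v : lanes) {
            if (Integer.compareUnsigned(v, acc) > 0) {
                acc = v;
            }
        }
        return acc;
    }

    // Hypothetical scalar reference: unsigned min over all lanes.
    // -1 (0xFFFFFFFF) is the identity for unsigned min.
    static int uminReduce(int[] lanes) {
        int acc = -1;
        for (int v : lanes) {
            if (Integer.compareUnsigned(v, acc) < 0) {
                acc = v;
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] lanes = {1, -1, 7, 42};
        // -1 has the bit pattern 0xFFFFFFFF, the largest unsigned int.
        System.out.println(Integer.toUnsignedString(umaxReduce(lanes)));
        System.out.println(Integer.toUnsignedString(uminReduce(lanes)));
    }
}
```

With the lanes `{1, -1, 7, 42}`, a signed max would pick 42 and a signed min would pick -1, whereas the unsigned variants pick -1 (0xFFFFFFFF) and 1 respectively. That difference is why the unsigned reductions cannot simply reuse the signed min/max reduction instructions.
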
> 
> Test results of JMH benchmarks from the panama-vector project:
> --------
> 
> On an NVIDIA Grace machine with 128-bit SVE:
> 
> Benchmark                       Unit    Before  Error   After     Error   Uplift
> Byte128Vector.UMAXLanes         ops/ms  411.60  42.18   25226.51  33.92   61.29
> Byte128Vector.UMAXMaskedLanes   ops/ms  558.56  85.12   25182.90  28.74   45.09
> Byte128Vector.UMINLanes         ops/ms  645.58  780.76  28396.29  103.11  43.99
> Byte128Vector.UMINMaskedLanes   ops/ms  621.09  718.27  26122.62  42.68   42.06
> Byte64Vector.UMAXLanes          ops/ms  296.33  34.44   14357.74  15.95   48.45
> Byte64Vector.UMAXMaskedLanes    ops/ms  376.54  44.01   14269.24  21.41   37.90
> Byte64Vector.UMINLanes          ops/ms  373.45  426.51  15425.36  66.20   41.31
> Byte64Vector.UMINMaskedLanes    ops/ms  353.32  346.87  14201.37  13.79   40.19
> Int128Vector.UMAXLanes          ops/ms  174.79  192.51  9906.07   286.93  56.67
> Int128Vector.UMAXMaskedLanes    ops/ms  157.23  206.68  10246.77  11.44   65.17
> Int64Vector.UMAXLanes           ops/ms  95.30   126.49  4719.30   98.57   49.52
> Int64Vector.UMAXMaskedLanes     ops/ms  88.19   87.44   4693.18   19.76   53.22
> Long128Vector.UMAXLanes         ops/ms  80.62   97.82   5064.01   35.52   62.82
> Long128Vector.UMAXMaskedLanes   ops/ms  78.15   102.91  5028.24   8.74    64.34
> Long64Vector.UMAXLanes          ops/ms  47.56   62.01   46.76     52.28   0.98
> Long64Vector.UMAXMaskedLanes    ops/ms  45.44   46.76   45.79     42.91   1.01
> Short128Vector.UMAXLanes        ops/ms  316.65  410.30  14814.82  23.65   46.79
> Short128Vector.UMAXMaskedLanes  ops/ms  308.90  351.78  15155.26  31.03   49.06
> Sh...

Nice work. Thanks for your support! 

I noticed that this PR contains the same commit as 
https://github.com/openjdk/jdk/pull/28692. Could you please split that change 
out of this PR? If this PR depends on https://github.com/openjdk/jdk/pull/28692, 
could we change the target merge branch to `pr/28692` instead of 
`master`?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28693#issuecomment-3640490083
