> This patch adds intrinsic support for UMIN and UMAX reduction operations in 
> the Vector API on AArch64, enabling direct hardware instruction mapping for 
> better performance.
> 
> Changes:
> --------
> 
> 1. C2 mid-end:
>    - Added UMinReductionVNode and UMaxReductionVNode
> 
> 2. AArch64 Backend:
>    - Added uminp/umaxp/sve_uminv/sve_umaxv instructions
>    - Updated match rules for all vector sizes and element types
>    - Both NEON and SVE implementations are supported
> 
> 3. Test:
>    - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java
>    - Added assembly tests in aarch64-asmtest.py for new instructions
>    - Added a JTReg test file VectorUMinMaxReductionTest.java
> 
> Different configurations were tested on aarch64 and x86 machines, and all 
> tests passed.
> 
> Test results of JMH benchmarks from the panama-vector project:
> --------
> 
> On an NVIDIA Grace machine with 128-bit SVE:
> 
> Benchmark                       Unit    Before  Error   After     Error   Uplift
> Byte128Vector.UMAXLanes         ops/ms  411.60  42.18   25226.51  33.92   61.29
> Byte128Vector.UMAXMaskedLanes   ops/ms  558.56  85.12   25182.90  28.74   45.09
> Byte128Vector.UMINLanes         ops/ms  645.58  780.76  28396.29  103.11  43.99
> Byte128Vector.UMINMaskedLanes   ops/ms  621.09  718.27  26122.62  42.68   42.06
> Byte64Vector.UMAXLanes          ops/ms  296.33  34.44   14357.74  15.95   48.45
> Byte64Vector.UMAXMaskedLanes    ops/ms  376.54  44.01   14269.24  21.41   37.90
> Byte64Vector.UMINLanes          ops/ms  373.45  426.51  15425.36  66.20   41.31
> Byte64Vector.UMINMaskedLanes    ops/ms  353.32  346.87  14201.37  13.79   40.19
> Int128Vector.UMAXLanes          ops/ms  174.79  192.51  9906.07   286.93  56.67
> Int128Vector.UMAXMaskedLanes    ops/ms  157.23  206.68  10246.77  11.44   65.17
> Int64Vector.UMAXLanes           ops/ms  95.30   126.49  4719.30   98.57   49.52
> Int64Vector.UMAXMaskedLanes     ops/ms  88.19   87.44   4693.18   19.76   53.22
> Long128Vector.UMAXLanes         ops/ms  80.62   97.82   5064.01   35.52   62.82
> Long128Vector.UMAXMaskedLanes   ops/ms  78.15   102.91  5028.24   8.74    64.34
> Long64Vector.UMAXLanes          ops/ms  47.56   62.01   46.76     52.28   0.98
> Long64Vector.UMAXMaskedLanes    ops/ms  45.44   46.76   45.79     42.91   1.01
> Short128Vector.UMAXLanes        ops/ms  316.65  410.30  14814.82  23.65   46.79
> Short128Vector.UMAXMaskedLanes  ops/ms  308.90  351.78  15155.26  31.03   49.06
> Sh...
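
For readers unfamiliar with the operation being intrinsified, the following is a minimal scalar sketch of what an unsigned min/max lane reduction computes. It is not code from the patch: the class and method names are hypothetical, and it only assumes that the UMAXLanes/UMINLanes benchmarks quoted above reduce all lanes of a vector with an unsigned comparison, as a call such as `reduceLanes(VectorOperators.UMAX)` in the Vector API would.

```java
// Hypothetical illustration only; not part of the patch under review.
public final class UnsignedReductionSketch {

    // Unsigned max over all lanes. The identity is 0, the smallest unsigned value.
    static int umaxReduce(int[] lanes) {
        int acc = 0;
        for (int v : lanes) {
            // compareUnsigned treats both operands as values in [0, 2^32 - 1].
            if (Integer.compareUnsigned(v, acc) > 0) {
                acc = v;
            }
        }
        return acc;
    }

    // Unsigned min over all lanes. The identity is 0xFFFFFFFF (-1 as a signed int).
    static int uminReduce(int[] lanes) {
        int acc = -1;
        for (int v : lanes) {
            if (Integer.compareUnsigned(v, acc) < 0) {
                acc = v;
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        // -1 is 4294967295 when read as unsigned, so it wins the unsigned max
        // even though a signed max over the same lanes would return 7.
        int[] lanes = {1, -1, 7, 0};
        System.out.println(Integer.toUnsignedString(umaxReduce(lanes))); // prints 4294967295
        System.out.println(Integer.toUnsignedString(uminReduce(lanes))); // prints 0
    }
}
```

Without an intrinsic, a reduction like this presumably runs lane by lane, as the loop above does; the Before and After columns in the table contrast that with the single-instruction mapping the patch describes.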

Eric Fang has updated the pull request incrementally with one additional commit 
since the last revision:

  Extract some helper functions for better readability

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28693/files
  - new: https://git.openjdk.org/jdk/pull/28693/files/481c3ee6..fc3dee3d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28693&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28693&range=01-02

  Stats: 120 lines in 2 files changed: 95 ins; 10 del; 15 mod
  Patch: https://git.openjdk.org/jdk/pull/28693.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28693/head:pull/28693

PR: https://git.openjdk.org/jdk/pull/28693
