On Tue, 20 Jan 2026 19:23:38 GMT, Andrew Haley <[email protected]> wrote:

>> Eric Fang has updated the pull request with a new target base due to a merge 
>> or a rebase. The pull request now contains four commits:
>> 
>>  - Rebase commit 56d7b52
>>  - Merge branch 'master' into JDK-8372980-umin-umax-intrinsic
>>  - 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max 
>> reduction operations
>>    
>>    This patch adds intrinsic support for UMIN and UMAX reduction operations
>>    in the Vector API on AArch64, enabling direct hardware instruction mapping
>>    for better performance.
>>    
>>    Changes:
>>    --------
>>    
>>    1. C2 mid-end:
>>       - Added UMinReductionVNode and UMaxReductionVNode
>>    
>>    2. AArch64 Backend:
>>       - Added uminp/umaxp/sve_uminv/sve_umaxv instructions
>>       - Updated match rules for all vector sizes and element types
>>       - Both NEON and SVE implementations are supported
>>    
>>    3. Test:
>>       - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java
>>       - Added assembly tests in aarch64-asmtest.py for new instructions
>>       - Added a JTReg test file VectorUMinMaxReductionTest.java
>>    
>>    Different configurations were tested on aarch64 and x86 machines, and
>>    all tests passed.
>>    
>>    Test results of JMH benchmarks from the panama-vector project:
>>    --------
>>    
>>    On an Nvidia Grace machine with 128-bit SVE:
>>    ```
>>    Benchmark                         Unit    Before  Error   After     Error   Uplift
>>    Byte128Vector.UMAXLanes           ops/ms  411.60  42.18   25226.51  33.92   61.29
>>    Byte128Vector.UMAXMaskedLanes     ops/ms  558.56  85.12   25182.90  28.74   45.09
>>    Byte128Vector.UMINLanes           ops/ms  645.58  780.76  28396.29  103.11  43.99
>>    Byte128Vector.UMINMaskedLanes     ops/ms  621.09  718.27  26122.62  42.68   42.06
>>    Byte64Vector.UMAXLanes            ops/ms  296.33  34.44   14357.74  15.95   48.45
>>    Byte64Vector.UMAXMaskedLanes      ops/ms  376.54  44.01   14269.24  21.41   37.90
>>    Byte64Vector.UMINLanes            ops/ms  373.45  426.51  15425.36  66.20   41.31
>>    Byte64Vector.UMINMaskedLanes      ops/ms  353.32  346.87  14201.37  13.79   40.19
>>    Int128Vector.UMAXLanes            ops/ms  174.79  192.51  9906.07   286.93  56.67
>>    Int128Vector.UMAXMaskedLanes      ops/ms  157.23  206.68  10246.77  11.44   65.17
>>    Int64Vector.UMAXLanes             ops/ms  95.30   126.49  4719.30   98.57   49.52
>>    Int64Vector.UMAXMaskedLanes       ops/ms  88.19   87.44   4693.18   19.76   53.22
>>    Long128Vector.UMAXLanes           ops/ms  80.62   97.82   5064.01   35.52   62.82
>>    Long128Vector.UMAXMaskedLanes     ops/ms  78.15   102.91  5028.24   8.74    64.34
>>    Long64Vector.UMAXLanes            ops/ms  47.56   62.01   46.76     52.28   0.98
>>    Long64V...
>
> I'm sorry, I _completely_ overthought that one. All you need are definitions 
> for `min[vp]` and `max[vp]` in C2_Macroassembler.
> 
> Like so:
> 
> `void minv(bool is_unsigned, ...) { if (is_unsigned) { uminv(... } else { 
> sminv(... } }`
> 
> No need to mess with class `Assembler`.

@theRealAph I have made the change. Could you please take another look? Thanks.
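
For readers following the thread, the dispatch pattern suggested above can be sketched roughly as follows. This is a minimal standalone model, not the code in the PR: `uminv_model` and `sminv_model` are hypothetical stand-ins that compute results on plain integers instead of emitting instructions, and the `minv` signature here is an assumption, since the original suggestion elides its arguments.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the unsigned reduction: treats lanes as unsigned.
static uint32_t uminv_model(const std::vector<uint32_t>& lanes) {
  return *std::min_element(lanes.begin(), lanes.end());
}

// Hypothetical stand-in for the signed reduction: same bits, compared as signed.
static uint32_t sminv_model(const std::vector<uint32_t>& lanes) {
  auto signed_less = [](uint32_t a, uint32_t b) {
    return static_cast<int32_t>(a) < static_cast<int32_t>(b);
  };
  return *std::min_element(lanes.begin(), lanes.end(), signed_less);
}

// A single entry point that picks the variant from a flag, so callers
// do not need to know about the two underlying operations.
uint32_t minv(bool is_unsigned, const std::vector<uint32_t>& lanes) {
  if (is_unsigned) {
    return uminv_model(lanes);
  } else {
    return sminv_model(lanes);
  }
}

// Small usage check: 0xFFFFFFFF is the largest unsigned value but -1 as signed.
void demo() {
  const std::vector<uint32_t> lanes = {1u, 0xFFFFFFFFu, 7u};
  assert(minv(true, lanes) == 1u);
  assert(minv(false, lanes) == 0xFFFFFFFFu);
}
```

The point of the wrapper is that the signed/unsigned choice lives in one place in the macro assembler layer, leaving the lower-level assembler class untouched.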

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28693#issuecomment-3777257496
