On Wed, 21 Jan 2026 10:15:24 GMT, Eric Fang <[email protected]> wrote:
>> This patch adds intrinsic support for UMIN and UMAX reduction operations in
>> the Vector API on AArch64, enabling direct hardware instruction mapping for
>> better performance.
>>
>> Changes:
>> --------
>>
>> 1. C2 mid-end:
>> - Added UMinReductionVNode and UMaxReductionVNode
>>
>> 2. AArch64 Backend:
>> - Added uminp/umaxp/sve_uminv/sve_umaxv instructions
>> - Updated match rules for all vector sizes and element types
>> - Both NEON and SVE implementation are supported
>>
>> 3. Test:
>> - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java
>> - Added assembly tests in aarch64-asmtest.py for new instructions
>> - Added a JTReg test file VectorUMinMaxReductionTest.java
>>
>> Different configurations were tested on aarch64 and x86 machines, and all
>> tests passed.
>>
>> Test results of JMH benchmarks from the panama-vector project:
>> --------
>>
>> On a Nvidia Grace machine with 128-bit SVE:
>>
>> Benchmark Unit Before Error After
>> Error Uplift
>> Byte128Vector.UMAXLanes ops/ms 411.60 42.18 25226.51
>> 33.92 61.29
>> Byte128Vector.UMAXMaskedLanes ops/ms 558.56 85.12 25182.90
>> 28.74 45.09
>> Byte128Vector.UMINLanes ops/ms 645.58 780.76 28396.29
>> 103.11 43.99
>> Byte128Vector.UMINMaskedLanes ops/ms 621.09 718.27 26122.62
>> 42.68 42.06
>> Byte64Vector.UMAXLanes ops/ms 296.33 34.44 14357.74
>> 15.95 48.45
>> Byte64Vector.UMAXMaskedLanes ops/ms 376.54 44.01 14269.24
>> 21.41 37.90
>> Byte64Vector.UMINLanes ops/ms 373.45 426.51 15425.36
>> 66.20 41.31
>> Byte64Vector.UMINMaskedLanes ops/ms 353.32 346.87 14201.37
>> 13.79 40.19
>> Int128Vector.UMAXLanes ops/ms 174.79 192.51 9906.07
>> 286.93 56.67
>> Int128Vector.UMAXMaskedLanes ops/ms 157.23 206.68 10246.77
>> 11.44 65.17
>> Int64Vector.UMAXLanes ops/ms 95.30 126.49 4719.30
>> 98.57 49.52
>> Int64Vector.UMAXMaskedLanes ops/ms 88.19 87.44 4693.18
>> 19.76 53.22
>> Long128Vector.UMAXLanes ops/ms 80.62 97.82 5064.01
>> 35.52 62.82
>> Long128Vector.UMAXMaskedLanes ops/ms 78.15 102.91 5028.24 8.74
>> 64.34
>> Long64Vector.UMAXLanes ops/ms 47.56 62.01 46.76
>> 52.28 0.98
>> Long64Vector.UMAXMaskedLanes ops/ms 45.44 46.76 45.79
>> 42.91 1.01
>> Short128Vector.UMAXLanes ops/ms 316.65 410.30 14814.82
>> 23.65 46.79
>> ...
>
> Eric Fang has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Extract some helper functions for better readability
src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.hpp line 50:
> 48: void sve_maxv(bool is_unsigned, FloatRegister dst, SIMD_RegVariant size,
> 49: PRegister pg, FloatRegister src);
> 50:
Using separate definitions here is adding unnecessary complexity.
I'd do something like this in the header file:
// Typedefs used to disambiguate overloaded member functions.
typedef void (Assembler::*neon_reduction2)
(FloatRegister, Assembler::SIMD_Arrangement, FloatRegister);
typedef void (Assembler::*sve_reduction3)
(FloatRegister, Assembler::SIMD_RegVariant, PRegister, FloatRegister);
// Helper functions for min/max reduction operations
void neon_minp(bool is_unsigned, FloatRegister dst, SIMD_Arrangement size,
FloatRegister src1, FloatRegister src2) {
auto m = is_unsigned ? &Assembler::uminp : &Assembler::sminp;
(this->*m)(dst, size, src1, src2);
}
void neon_maxp(bool is_unsigned, FloatRegister dst, SIMD_Arrangement size,
FloatRegister src1, FloatRegister src2) {
auto m = is_unsigned ? &Assembler::umaxp : &Assembler::smaxp;
(this->*m)(dst, size, src1, src2);
}
void neon_minv(bool is_unsigned, FloatRegister dst, SIMD_Arrangement size,
FloatRegister src) {
auto m = is_unsigned ? (neon_reduction2)&Assembler::uminv :
&Assembler::sminv;
(this->*m)(dst, size, src);
}
void neon_maxv(bool is_unsigned, FloatRegister dst, SIMD_Arrangement size,
FloatRegister src) {
auto m = is_unsigned ? (neon_reduction2)&Assembler::umaxv :
&Assembler::smaxv;
(this->*m)(dst, size, src);
}
void sve_minv(bool is_unsigned, FloatRegister dst, SIMD_RegVariant size,
PRegister pg, FloatRegister src) {
auto m = is_unsigned ? (sve_reduction3)&Assembler::sve_uminv :
&Assembler::sve_sminv;
(this->*m)(dst, size, pg, src);
}
void sve_maxv(bool is_unsigned, FloatRegister dst, SIMD_RegVariant size,
PRegister pg, FloatRegister src) {
auto m = is_unsigned ? (sve_reduction3)&Assembler::sve_umaxv :
&Assembler::sve_smaxv;
(this->*m)(dst, size, pg, src);
}
To some extent it's a matter of taste, but please try not to use much more
repetitive and boilerplate code than you need to.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/28693#discussion_r2717902945