On Mon, 15 Dec 2025 15:51:32 GMT, Yi Wu <[email protected]> wrote:

> This patch adds mid-end support for vectorized min/max reduction operations 
> for half floats. It also includes backend AArch64 support for these 
> operations.
> Neither floating-point min nor max reduction requires strict ordering, 
> because both operations are associative.
> 
> The patch generates NEON fminv/fmaxv reduction instructions when the max 
> vector length is 8B or 16B. On SVE-capable machines with vector lengths 
> > 16B, it generates the SVE fminv/fmaxv instructions.
> The patch also adds support for partial min/max reductions on SVE machines 
> using fminv/fmaxv.
> 
> The tables below show the ratio of throughput (ops/ms) with this patch to 
> throughput on mainline; a ratio > 1 means the patch performs better.
> 
> Neoverse N1 (UseSVE = 0, max vector length = 16B):
> 
> Benchmark         vectorDim  Mode   Cnt     8B    16B
> ReductionMaxFP16   256       thrpt 9      3.69   6.44
> ReductionMaxFP16   512       thrpt 9      3.71   7.62
> ReductionMaxFP16   1024      thrpt 9      4.16   8.64
> ReductionMaxFP16   2048      thrpt 9      4.44   9.12
> ReductionMinFP16   256       thrpt 9      3.69   6.43
> ReductionMinFP16   512       thrpt 9      3.70   7.62
> ReductionMinFP16   1024      thrpt 9      4.16   8.64
> ReductionMinFP16   2048      thrpt 9      4.44   9.10
> 
> 
> Neoverse V1 (UseSVE = 1, max vector length = 32B):
> 
> Benchmark         vectorDim  Mode   Cnt     8B    16B    32B
> ReductionMaxFP16   256       thrpt 9      3.96   8.62   8.02
> ReductionMaxFP16   512       thrpt 9      3.54   9.25  11.71
> ReductionMaxFP16   1024      thrpt 9      3.77   8.71  14.07
> ReductionMaxFP16   2048      thrpt 9      3.88   8.44  14.69
> ReductionMinFP16   256       thrpt 9      3.96   8.61   8.03
> ReductionMinFP16   512       thrpt 9      3.54   9.28  11.69
> ReductionMinFP16   1024      thrpt 9      3.76   8.70  14.12
> ReductionMinFP16   2048      thrpt 9      3.87   8.45  14.70
> 
> 
> Neoverse V2 (UseSVE = 2, max vector length = 16B):
> 
> Benchmark         vectorDim  Mode   Cnt     8B    16B
> ReductionMaxFP16   256       thrpt 9      4.78  10.00
> ReductionMaxFP16   512       thrpt 9      3.74  11.33
> ReductionMaxFP16   1024      thrpt 9      3.86   9.59
> ReductionMaxFP16   2048      thrpt 9      3.94   8.71
> ReductionMinFP16   256       thrpt 9      4.78  10.00
> ReductionMinFP16   512       thrpt 9      3.74  11.29
> ReductionMinFP16   1024      thrpt 9      3.86   9.58
> ReductionMinFP16   2048      thrpt 9      3.94   8.71
> 
> 
> Testing:
> hotspot_all, jdk (tier1-3) and langtools (tier1) all pass on Neoverse 
> N1/V1/V2.
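
On the associativity remark in the description, here is a minimal sketch of why the order of a min reduction does not matter while the order of an add reduction does. It assumes plain `float` and `Math.min` as stand-ins for the Float16 operations in the patch. The class and method names are hypothetical and do not come from the PR, and the pairwise order is just one possible reordering, not necessarily the one the hardware uses.

```java
final class MinMaxReductionOrder {
    // Strict left-to-right reduction, the order a scalar loop would use.
    static float sequentialMin(float[] a) {
        float r = Float.POSITIVE_INFINITY; // identity element for min
        for (float v : a) {
            r = Math.min(r, v);
        }
        return r;
    }

    // Pairwise (tree-shaped) reduction over a[from, to).
    static float pairwiseMin(float[] a, int from, int to) {
        int n = to - from;
        if (n == 0) return Float.POSITIVE_INFINITY;
        if (n == 1) return a[from];
        int mid = from + n / 2;
        return Math.min(pairwiseMin(a, from, mid), pairwiseMin(a, mid, to));
    }

    // Same two orders for addition, shown only for contrast.
    static float sequentialSum(float[] a) {
        float r = 0.0f;
        for (float v : a) {
            r += v;
        }
        return r;
    }

    static float pairwiseSum(float[] a, int from, int to) {
        int n = to - from;
        if (n == 0) return 0.0f;
        if (n == 1) return a[from];
        int mid = from + n / 2;
        return pairwiseSum(a, from, mid) + pairwiseSum(a, mid, to);
    }

    public static void main(String[] args) {
        float[] data = {3.5f, -0.0f, 0.0f, 7.25f, -2.0f, 1.0f};
        // Both orders yield the same minimum.
        System.out.println(sequentialMin(data) == pairwiseMin(data, 0, data.length));

        // Rounding makes float addition order-sensitive, so these may differ.
        float[] sums = {1e8f, 1.0f, -1e8f, 1.0f};
        System.out.println(sequentialSum(sums) + " vs " + pairwiseSum(sums, 0, sums.length));
    }
}
```

The min results agree for any grouping, including inputs containing NaN or signed zeros, which is why a compiler is free to reorder such a reduction. The sum is included only to show what "strict order" protects against.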

Thanks @yiwu0b11, just a few superficial comments:

test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java line 486:

> 484:     @Test
> 485:     @Warmup(500)
> 486:     @IR(counts = {"reduce_minHF_masked", " >0 "},

Could you add IRNode constants for `reduce_minHF_masked` instead of using the 
raw string? The same applies to the max version below.

test/micro/org/openjdk/bench/jdk/incubator/vector/Float16OperationsBenchmark.java line 319:

> 317: 
> 318:     @Benchmark
> 319:     public short ReductionMinFP16() {

Suggestion:

    public short reductionMinFP16() {

test/micro/org/openjdk/bench/jdk/incubator/vector/Float16OperationsBenchmark.java line 328:

> 326: 
> 327:     @Benchmark
> 328:     public short ReductionMaxFP16() {

Suggestion:

    public short reductionMaxFP16() {

-------------

Changes requested by galder (Author).

PR Review: https://git.openjdk.org/jdk/pull/28828#pullrequestreview-3603354237
PR Review Comment: https://git.openjdk.org/jdk/pull/28828#discussion_r2639273162
PR Review Comment: https://git.openjdk.org/jdk/pull/28828#discussion_r2639270984
PR Review Comment: https://git.openjdk.org/jdk/pull/28828#discussion_r2639271426
