This patch adds mid-end support for vectorized min/max reduction operations on
half-precision floats (FP16), along with AArch64 backend support for these
operations. Floating-point min/max reductions do not require strict ordering,
because min and max are associative.
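
To illustrate why strict ordering is not needed, here is a minimal sketch that
reduces the same FP16 data sequentially and with per-lane partial maxima. It is
not code from this patch: the class and method names are hypothetical, and it
uses the `Float.float16ToFloat`/`Float.floatToFloat16` conversions (JDK 20+) as
a stand-in for the incubating Float16 API.

```java
import java.util.Arrays;

// Hypothetical illustration class, not part of the patch.
public class Fp16MaxReductionSketch {
    // Strictly ordered max reduction over FP16 values stored as raw 16-bit patterns.
    static short maxSequential(short[] a) {
        float acc = Float.NEGATIVE_INFINITY;
        for (short bits : a) {
            acc = Math.max(acc, Float.float16ToFloat(bits));
        }
        return Float.floatToFloat16(acc);
    }

    // Reordered reduction: keep `lanes` partial maxima (similar in spirit to a
    // vector accumulator), then combine the partial results at the end.
    static short maxLanes(short[] a, int lanes) {
        float[] partial = new float[lanes];
        Arrays.fill(partial, Float.NEGATIVE_INFINITY);
        for (int i = 0; i < a.length; i++) {
            int lane = i % lanes;
            partial[lane] = Math.max(partial[lane], Float.float16ToFloat(a[i]));
        }
        float acc = Float.NEGATIVE_INFINITY;
        for (float p : partial) {
            acc = Math.max(acc, p);
        }
        return Float.floatToFloat16(acc);
    }
}
```

Both methods return the same FP16 value for the same input, whereas a
floating-point add reduction could round differently after such a reordering.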

The patch generates NEON fminv/fmaxv reduction instructions when the max vector
length is 8B or 16B. On SVE-capable machines with vector lengths > 16B, it
generates the SVE fminv/fmaxv instructions instead.
The patch also adds support for partial min/max reductions on SVE machines
using fminv/fmaxv.

The tables below show the ratio of throughput (ops/ms) with this patch to
throughput on mainline. A ratio > 1 indicates that performance with this patch
is better than mainline.

Neoverse N1 (UseSVE = 0, max vector length = 16B):
Benchmark         vectorDim  Mode   Cnt     8B    16B
ReductionMaxFP16   256       thrpt 9      3.69   6.44
ReductionMaxFP16   512       thrpt 9      3.71   7.62
ReductionMaxFP16   1024      thrpt 9      4.16   8.64
ReductionMaxFP16   2048      thrpt 9      4.44   9.12
ReductionMinFP16   256       thrpt 9      3.69   6.43
ReductionMinFP16   512       thrpt 9      3.70   7.62
ReductionMinFP16   1024      thrpt 9      4.16   8.64
ReductionMinFP16   2048      thrpt 9      4.44   9.10

Neoverse V1 (UseSVE = 1, max vector length = 32B):
Benchmark         vectorDim  Mode   Cnt     8B    16B    32B
ReductionMaxFP16   256       thrpt 9      3.96   8.62   8.02
ReductionMaxFP16   512       thrpt 9      3.54   9.25  11.71
ReductionMaxFP16   1024      thrpt 9      3.77   8.71  14.07
ReductionMaxFP16   2048      thrpt 9      3.88   8.44  14.69
ReductionMinFP16   256       thrpt 9      3.96   8.61   8.03
ReductionMinFP16   512       thrpt 9      3.54   9.28  11.69
ReductionMinFP16   1024      thrpt 9      3.76   8.70  14.12
ReductionMinFP16   2048      thrpt 9      3.87   8.45  14.70

Neoverse V2 (UseSVE = 2, max vector length = 16B):
Benchmark         vectorDim  Mode   Cnt     8B    16B
ReductionMaxFP16   256       thrpt 9      4.78  10.00
ReductionMaxFP16   512       thrpt 9      3.74  11.33
ReductionMaxFP16   1024      thrpt 9      3.86   9.59
ReductionMaxFP16   2048      thrpt 9      3.94   8.71
ReductionMinFP16   256       thrpt 9      4.78  10.00
ReductionMinFP16   512       thrpt 9      3.74  11.29
ReductionMinFP16   1024      thrpt 9      3.86   9.58
ReductionMinFP16   2048      thrpt 9      3.94   8.71

Testing:
hotspot_all, jdk (tier1-3) and langtools (tier1) all pass on Neoverse N1/V1/V2.

-------------

Commit messages:
 - 8373344: Add support for FP16 min/max reduction operations

Changes: https://git.openjdk.org/jdk/pull/28828/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28828&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8373344
  Stats: 968 lines in 13 files changed: 296 ins; 22 del; 650 mod
  Patch: https://git.openjdk.org/jdk/pull/28828.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28828/head:pull/28828

PR: https://git.openjdk.org/jdk/pull/28828
