Yi Wu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:

 - Replace assert with verify
 - Add IRNode constant and code refactor
 - Merge remote-tracking branch 'origin/master' into yiwu-8373344
 - 8373344: Add support for FP16 min/max reduction operations

   This patch adds mid-end support for vectorized min/max reduction operations
   for half floats. It also includes backend AArch64 support for these
   operations.
   Neither floating-point min nor max reduction requires strict ordering,
   because both operations are associative.

   The patch generates NEON fminv/fmaxv reduction instructions when the max
   vector length is 8B or 16B. On SVE-capable machines with vector lengths
   greater than 16B, it generates the SVE fminv/fmaxv instructions.
   The patch also adds support for partial min/max reductions on SVE machines
   using fminv/fmaxv.

   A throughput (ops/ms) ratio > 1 indicates that performance with this patch
   is better than mainline.

   Neoverse N1 (UseSVE = 0, max vector length = 16B):

   Benchmark         vectorDim  Mode   Cnt    8B    16B
   ReductionMaxFP16        256  thrpt    9  3.69   6.44
   ReductionMaxFP16        512  thrpt    9  3.71   7.62
   ReductionMaxFP16       1024  thrpt    9  4.16   8.64
   ReductionMaxFP16       2048  thrpt    9  4.44   9.12
   ReductionMinFP16        256  thrpt    9  3.69   6.43
   ReductionMinFP16        512  thrpt    9  3.70   7.62
   ReductionMinFP16       1024  thrpt    9  4.16   8.64
   ReductionMinFP16       2048  thrpt    9  4.44   9.10

   Neoverse V1 (UseSVE = 1, max vector length = 32B):

   Benchmark         vectorDim  Mode   Cnt    8B    16B    32B
   ReductionMaxFP16        256  thrpt    9  3.96   8.62   8.02
   ReductionMaxFP16        512  thrpt    9  3.54   9.25  11.71
   ReductionMaxFP16       1024  thrpt    9  3.77   8.71  14.07
   ReductionMaxFP16       2048  thrpt    9  3.88   8.44  14.69
   ReductionMinFP16        256  thrpt    9  3.96   8.61   8.03
   ReductionMinFP16        512  thrpt    9  3.54   9.28  11.69
   ReductionMinFP16       1024  thrpt    9  3.76   8.70  14.12
   ReductionMinFP16       2048  thrpt    9  3.87   8.45  14.70

   Neoverse V2 (UseSVE = 2, max vector length = 16B):

   Benchmark         vectorDim  Mode   Cnt    8B    16B
   ReductionMaxFP16        256  thrpt    9  4.78  10.00
   ReductionMaxFP16        512  thrpt    9  3.74  11.33
   ReductionMaxFP16       1024  thrpt    9  3.86   9.59
   ReductionMaxFP16       2048  thrpt    9  3.94   8.71
   ReductionMinFP16        256  thrpt    9  4.78  10.00
   ReductionMinFP16        512  thrpt    9  3.74  11.29
   ReductionMinFP16       1024  thrpt    9  3.86   9.58
   ReductionMinFP16       2048  thrpt    9  3.94   8.71

   Testing:
   hotspot_all, jdk (tier1-3) and langtools (tier1) all pass on Neoverse
   N1/V1/V2.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28828/files
  - new: https://git.openjdk.org/jdk/pull/28828/files/2f80bc4f..9971752e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28828&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28828&range=00-01

  Stats: 17385 lines in 2438 files changed: 9261 ins; 2408 del; 5716 mod
  Patch: https://git.openjdk.org/jdk/pull/28828.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28828/head:pull/28828

PR: https://git.openjdk.org/jdk/pull/28828
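
For readers unfamiliar with why min/max reductions can be vectorized without strict ordering, here is a minimal sketch of the associativity argument made in the description. It is not code from the patch: the class and method names are hypothetical, and plain `float` stands in for FP16 so that it runs on any JDK without incubator modules. It compares a strictly ordered max reduction with a lane-wise one that combines elements in a different order, roughly as a vector accumulator followed by a horizontal reduction would.

```java
import java.util.Arrays;

// Illustrative only: hypothetical names, and float is used in place of FP16.
final class MinMaxReductionOrder {

    // Strictly ordered reduction: elements are combined left to right.
    static float sequentialMax(float[] values) {
        float acc = Float.NEGATIVE_INFINITY;
        for (float v : values) {
            acc = Math.max(acc, v);
        }
        return acc;
    }

    // Lane-wise reduction: element i is folded into accumulator lane (i % lanes),
    // then the lanes are combined at the end. The combination order differs from
    // sequentialMax, but max is associative and commutative, so the result matches.
    static float lanewiseMax(float[] values, int lanes) {
        float[] acc = new float[lanes];
        Arrays.fill(acc, Float.NEGATIVE_INFINITY);
        for (int i = 0; i < values.length; i++) {
            int lane = i % lanes;
            acc[lane] = Math.max(acc[lane], values[i]);
        }
        float result = Float.NEGATIVE_INFINITY;
        for (float a : acc) {
            result = Math.max(result, a);
        }
        return result;
    }

    public static void main(String[] args) {
        float[] data = {1.5f, -3.0f, 7.25f, 0.0f, 7.0f, -0.0f, 2.0f, 6.5f, 4.0f};
        System.out.println(sequentialMax(data));   // prints 7.25
        System.out.println(lanewiseMax(data, 4));  // prints 7.25
    }
}
```

The same reasoning applies to min. It does not carry over to floating-point add, where rounding makes the result depend on the order of combination.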
