This patch adds an SVE implementation of primitive array sorting 
(Arrays.sort()) on AArch64 systems that support SVE. On non-SVE machines, we 
fall back to the existing Java implementation.

For smaller arrays (length <= 64), we use insertion sort; for larger arrays we 
use an SVE-vectorized quicksort partitioner followed by an odd-even 
transposition cleanup pass.

The SVE path is enabled by default for int type. For float type, it is 
available through the experimental flag :

`-XX:+UnlockExperimentalVMOptions -XX:+UseSVELibSimdSortForFP
`
Without this flag being enabled, the default Java implementation would be 
executed for floats (the flag is disabled by default).

Float is gated due to observed regressions on some small/medium sizes. On 
larger arrays, the SVE float path shows upto 1.47x speedup on Neoverse V2 and 
2.12x on Neoverse V1.

Following are the performance numbers for **ArraysSort JMH benchmark** -

**Case A:** Ratio between the scores of master branch and 
`UseSVELibSimdSortForFP` flag disabled (which is the default).
**Case B:** Ratio between the scores of master branch and 
`UseSVELibSimdSortForFP` flag enabled (the int numbers will be the same but 
this now enables SVE vectorized sorting for floats).
**We would want the ratios to be >= 1 to be at par or better than the default 
Java implementation (master branch).**

On Neoverse V1:


Benchmark                       (size)   Mode    Cnt    A       B
ArraysSort.floatParallelSort    10       avgt    3      0.98    0.98
ArraysSort.floatParallelSort    25       avgt    3      1.01    0.83
ArraysSort.floatParallelSort    50       avgt    3      0.99    0.55
ArraysSort.floatParallelSort    75       avgt    3      0.99    0.66
ArraysSort.floatParallelSort    100      avgt    3      0.98    0.66
ArraysSort.floatParallelSort    1000     avgt    3      1.00    0.84
ArraysSort.floatParallelSort    10000    avgt    3      1.03    1.52
ArraysSort.floatParallelSort    100000   avgt    3      1.03    1.46
ArraysSort.floatParallelSort    1000000  avgt    3      0.98    1.81
ArraysSort.floatSort            10       avgt    3      1.00    0.98
ArraysSort.floatSort            25       avgt    3      1.00    0.81
ArraysSort.floatSort            50       avgt    3      0.99    0.56
ArraysSort.floatSort            75       avgt    3      0.99    0.65
ArraysSort.floatSort            100      avgt    3      0.98    0.70
ArraysSort.floatSort            1000     avgt    3      0.99    0.84
ArraysSort.floatSort            10000    avgt    3      0.99    1.72
ArraysSort.floatSort            100000   avgt    3      1.00    1.94
ArraysSort.floatSort            1000000  avgt    3      1.00    2.13
ArraysSort.intParallelSort      10       avgt    3      1.08    1.08
ArraysSort.intParallelSort      25       avgt    3      1.04    1.05
ArraysSort.intParallelSort      50       avgt    3      1.29    1.30
ArraysSort.intParallelSort      75       avgt    3      1.16    1.16
ArraysSort.intParallelSort      100      avgt    3      1.07    1.07
ArraysSort.intParallelSort      1000     avgt    3      1.13    1.13
ArraysSort.intParallelSort      10000    avgt    3      1.49    1.38
ArraysSort.intParallelSort      100000   avgt    3      1.64    1.62
ArraysSort.intParallelSort      1000000  avgt    3      2.26    2.27
ArraysSort.intSort              10       avgt    3      1.08    1.08
ArraysSort.intSort              25       avgt    3      1.02    1.02
ArraysSort.intSort              50       avgt    3      1.25    1.25
ArraysSort.intSort              75       avgt    3      1.16    1.20
ArraysSort.intSort              100      avgt    3      1.07    1.07
ArraysSort.intSort              1000     avgt    3      1.12    1.13
ArraysSort.intSort              10000    avgt    3      1.94    1.95
ArraysSort.intSort              100000   avgt    3      1.86    1.86
ArraysSort.intSort              1000000  avgt    3      2.09    2.09


On Neoverse V2:


Benchmark                       (size)   Mode    Cnt    A       B
ArraysSort.floatParallelSort    10       avgt    3      1.02    1.02
ArraysSort.floatParallelSort    25       avgt    3      0.97    0.71
ArraysSort.floatParallelSort    50       avgt    3      0.94    0.65
ArraysSort.floatParallelSort    75       avgt    3      0.96    0.82
ArraysSort.floatParallelSort    100      avgt    3      0.95    0.84
ArraysSort.floatParallelSort    1000     avgt    3      1.01    0.94
ArraysSort.floatParallelSort    10000    avgt    3      1.01    1.25
ArraysSort.floatParallelSort    100000   avgt    3      1.01    1.09
ArraysSort.floatParallelSort    1000000  avgt    3      1.00    1.10
ArraysSort.floatSort            10       avgt    3      1.02    1.00
ArraysSort.floatSort            25       avgt    3      0.99    0.76
ArraysSort.floatSort            50       avgt    3      0.97    0.66
ArraysSort.floatSort            75       avgt    3      1.01    0.83
ArraysSort.floatSort            100      avgt    3      1.00    0.85
ArraysSort.floatSort            1000     avgt    3      0.99    0.93
ArraysSort.floatSort            10000    avgt    3      1.00    1.28
ArraysSort.floatSort            100000   avgt    3      1.00    1.37
ArraysSort.floatSort            1000000  avgt    3      1.00    1.48
ArraysSort.intParallelSort      10       avgt    3      1.05    1.05
ArraysSort.intParallelSort      25       avgt    3      0.99    0.84
ArraysSort.intParallelSort      50       avgt    3      1.03    1.14
ArraysSort.intParallelSort      75       avgt    3      0.91    0.99
ArraysSort.intParallelSort      100      avgt    3      0.98    0.96
ArraysSort.intParallelSort      1000     avgt    3      1.32    1.30
ArraysSort.intParallelSort      10000    avgt    3      1.40    1.40
ArraysSort.intParallelSort      100000   avgt    3      1.00    1.04
ArraysSort.intParallelSort      1000000  avgt    3      1.15    1.14
ArraysSort.intSort              10       avgt    3      1.05    1.05
ArraysSort.intSort              25       avgt    3      1.03    1.03
ArraysSort.intSort              50       avgt    3      1.08    1.14
ArraysSort.intSort              75       avgt    3      0.88    0.98
ArraysSort.intSort              100      avgt    3      1.01    0.99
ArraysSort.intSort              1000     avgt    3      1.3     1.32
ArraysSort.intSort              10000    avgt    3      1.43    1.43
ArraysSort.intSort              100000   avgt    3      1.30    1.30
ArraysSort.intSort              1000000  avgt    3      1.37    1.37

-------------

Commit messages:
 - AArch64 SVE implementation for Arrays.sort
 - Prepare base for SVE implementation in libsimdsort

Changes: https://git.openjdk.org/jdk/pull/28675/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28675&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8371711
  Stats: 1109 lines in 26 files changed: 1092 ins; 12 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/28675.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28675/head:pull/28675

PR: https://git.openjdk.org/jdk/pull/28675

Reply via email to