On Fri, 5 Dec 2025 09:44:16 GMT, Bhavana Kilambi <[email protected]> wrote:
> This patch adds an SVE implementation of primitive array sorting
> (Arrays.sort()) on AArch64 systems that support SVE. On non-SVE machines, we
> fall back to the existing Java implementation.
>
> For smaller arrays (length <= 64), we use insertion sort; for larger arrays
> we use an SVE-vectorized quicksort partitioner followed by an odd-even
> transposition cleanup pass.
>
> The SVE path is enabled by default for int type. For float type, it is
> available through the experimental flag:
>
> `-XX:+UnlockExperimentalVMOptions -XX:+UseSVELibSimdSortForFP`
>
> Without this flag, the default Java implementation is used for floats (the
> flag is disabled by default).
>
> Float is gated due to observed regressions on some small/medium sizes. On
> larger arrays, the SVE float path shows up to 1.47x speedup on Neoverse V2 and
> 2.12x on Neoverse V1.
>
> The following are the performance numbers for the **ArraysSort JMH
> benchmark**:
>
> **Case A:** Ratio between the scores of the master branch and this patch with
> the `UseSVELibSimdSortForFP` flag disabled (which is the default).
> **Case B:** Ratio between the scores of the master branch and this patch with
> the `UseSVELibSimdSortForFP` flag enabled (the int numbers will be the same,
> but this now enables SVE-vectorized sorting for floats).
> **We want the ratios to be >= 1, i.e. on par with or better than the default
> Java implementation (master branch).**
>
> On Neoverse V1:
>
>
> Benchmark                      (size)  Mode  Cnt     A     B
> ArraysSort.floatParallelSort       10  avgt    3  0.98  0.98
> ArraysSort.floatParallelSort       25  avgt    3  1.01  0.83
> ArraysSort.floatParallelSort       50  avgt    3  0.99  0.55
> ArraysSort.floatParallelSort       75  avgt    3  0.99  0.66
> ArraysSort.floatParallelSort      100  avgt    3  0.98  0.66
> ArraysSort.floatParallelSort     1000  avgt    3  1.00  0.84
> ArraysSort.floatParallelSort    10000  avgt    3  1.03  1.52
> ArraysSort.floatParallelSort   100000  avgt    3  1.03  1.46
> ArraysSort.floatParallelSort  1000000  avgt    3  0.98  1.81
> ArraysSort.floatSort               10  avgt    3  1.00  0.98
> ArraysSort.floatSort               25  avgt    3  1.00  0.81
> ArraysSort.floatSort               50  avgt    3  0.99  0.56
> ArraysSort.floatSort               75  avgt    3  0.99  0.65
> ArraysSort.floatSort              100  avgt    3  0.98  0.70
> ArraysSort.floatSort             1000  avgt    3  0.99  0.84
> ArraysSort.floatSort ...
make/modules/java.base/Lib.gmk line 225:
> 223:
> 224: TARGETS += $(BUILD_LIBSIMD_SORT)
> 225: endif
This whole block should be combined with the existing block above, something
like this:
ifeq ($(call isTargetOs, linux)+$(call isTargetCpu, x86_64 aarch64)+$(INCLUDE_COMPILER2)+$(filter $(TOOLCHAIN_TYPE), gcc), true+true+true+gcc)
  ##############################################################################
  ## Build libsimdsort
  ##############################################################################

  $(eval $(call SetupJdkLibrary, BUILD_LIBSIMD_SORT, \
      NAME := simdsort, \
      LINK_TYPE := C++, \
      OPTIMIZATION := HIGH, \
      INCLUDES := $(OPENJDK_TARGET_CPU_ARCH), \
      CXXFLAGS := -std=c++17, \
      CXXFLAGS_linux_aarch64 := -march=armv8.2-a+sve, \
      DISABLED_WARNINGS_gcc := unused-variable, \
      LIBS_linux := $(LIBM), \
  ))

  TARGETS += $(BUILD_LIBSIMD_SORT)
endif
Unfortunately we don't currently support CXXFLAGS_<os>_<cpu>, just
CFLAGS_<os>_<cpu>, but this can be fixed and I think it should be since we now
have a need for it.
diff --git a/make/common/native/Flags.gmk b/make/common/native/Flags.gmk
index efb4c08e74c..2f3680af7c7 100644
--- a/make/common/native/Flags.gmk
+++ b/make/common/native/Flags.gmk
@@ -106,10 +106,12 @@ define SetupCompilerFlags
     $1_EXTRA_CFLAGS += -DSTATIC_BUILD=1
   endif
 
-  # Pickup extra OPENJDK_TARGET_OS_TYPE, OPENJDK_TARGET_OS and/or TOOLCHAIN_TYPE
-  # dependent variables for CXXFLAGS.
+  # Pickup extra OPENJDK_TARGET_OS_TYPE, OPENJDK_TARGET_OS, TOOLCHAIN_TYPE and
+  # OPENJDK_TARGET_OS plus OPENJDK_TARGET_CPU pair dependent variables for
+  # CXXFLAGS.
   $1_EXTRA_CXXFLAGS := $$($1_CXXFLAGS_$(OPENJDK_TARGET_OS_TYPE)) $$($1_CXXFLAGS_$(OPENJDK_TARGET_OS)) \
-      $$($1_CXXFLAGS_$(TOOLCHAIN_TYPE))
+      $$($1_CXXFLAGS_$(TOOLCHAIN_TYPE)) \
+      $$($1_CXXFLAGS_$(OPENJDK_TARGET_OS)_$(OPENJDK_TARGET_CPU))
 
   ifneq ($(DEBUG_LEVEL), release)
     # Pickup extra debug dependent variables for CXXFLAGS
The above at least compiles for me.
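
To illustrate the lookup that the Flags.gmk change relies on, here is a minimal
standalone sketch of the computed-variable-name pattern. It is not the JDK build
code: the DEMO_ prefix, the -DDEMO_* values and the hard-coded OS, CPU and
toolchain settings are made up for the example. In the real build those values
come from configure and from the SetupJdkLibrary arguments.

```make
# Hypothetical stand-ins for values that configure would normally provide.
OPENJDK_TARGET_OS := linux
OPENJDK_TARGET_CPU := aarch64
TOOLCHAIN_TYPE := gcc

# Per-library settings, named after the <os>, <toolchain> and <os>_<cpu> keys.
# DEMO_ plays the role of the $1 prefix used inside SetupCompilerFlags.
DEMO_CXXFLAGS_linux := -DDEMO_LINUX
DEMO_CXXFLAGS_gcc := -DDEMO_GCC
DEMO_CXXFLAGS_linux_aarch64 := -march=armv8.2-a+sve
DEMO_CXXFLAGS_linux_x86_64 := -DDEMO_X86_64

# Same shape as the patched assignment: each variable name is built from the
# target properties, then expanded. A name that was never set expands to
# nothing. (No $$ escaping here because this is not inside a define/eval.)
DEMO_EXTRA_CXXFLAGS := $(DEMO_CXXFLAGS_$(OPENJDK_TARGET_OS)) \
    $(DEMO_CXXFLAGS_$(TOOLCHAIN_TYPE)) \
    $(DEMO_CXXFLAGS_$(OPENJDK_TARGET_OS)_$(OPENJDK_TARGET_CPU))

# Print the combined flags while the makefile is being parsed.
$(info $(DEMO_EXTRA_CXXFLAGS))

# Empty default target so that make has something to "build".
all: ;
```

Running this with GNU make should print
-DDEMO_LINUX -DDEMO_GCC -march=armv8.2-a+sve. The linux_x86_64 variable is never
referenced for this target, so its value does not leak into an aarch64 build.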
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/28675#discussion_r2593665862