On Fri, 5 Dec 2025 09:44:16 GMT, Bhavana Kilambi wrote:

> This patch adds an SVE implementation of primitive array sorting 
> (Arrays.sort()) on AArch64 systems that support SVE. On non-SVE machines, we 
> fall back to the existing Java implementation.
> 
> For smaller arrays (length <= 64), we use insertion sort; for larger arrays 
> we use an SVE-vectorized quicksort partitioner followed by an odd-even 
> transposition cleanup pass.
> 
> The SVE path is enabled by default for the int type. For the float type, it is 
> available through the experimental flag:
> 
> `-XX:+UnlockExperimentalVMOptions -XX:+UseSVELibSimdSortForFP`
> 
> Without this flag (it is disabled by default), floats are sorted by the 
> default Java implementation.
> 
> Float is gated because of regressions observed at some small and medium 
> sizes. On larger arrays, the SVE float path shows up to a 1.47x speedup on 
> Neoverse V2 and 2.12x on Neoverse V1.
> 
> The following are the performance numbers for the **ArraysSort JMH benchmark**:
> 
> **Case A:** Ratio of the master-branch score to the patched score with the 
> `UseSVELibSimdSortForFP` flag disabled (the default).
> **Case B:** Ratio of the master-branch score to the patched score with the 
> `UseSVELibSimdSortForFP` flag enabled (the int numbers are the same, but 
> floats now also use SVE-vectorized sorting).
> **Since the scores are average times (lower is better), ratios >= 1 mean the 
> patch is at par with or better than the default Java implementation (master 
> branch).**
> 
> On Neoverse V1:
> 
> 
> Benchmark                       (size)   Mode    Cnt    A       B
> ArraysSort.floatParallelSort    10       avgt    3      0.98    0.98
> ArraysSort.floatParallelSort    25       avgt    3      1.01    0.83
> ArraysSort.floatParallelSort    50       avgt    3      0.99    0.55
> ArraysSort.floatParallelSort    75       avgt    3      0.99    0.66
> ArraysSort.floatParallelSort    100      avgt    3      0.98    0.66
> ArraysSort.floatParallelSort    1000     avgt    3      1.00    0.84
> ArraysSort.floatParallelSort    10000    avgt    3      1.03    1.52
> ArraysSort.floatParallelSort    100000   avgt    3      1.03    1.46
> ArraysSort.floatParallelSort    1000000  avgt    3      0.98    1.81
> ArraysSort.floatSort            10       avgt    3      1.00    0.98
> ArraysSort.floatSort            25       avgt    3      1.00    0.81
> ArraysSort.floatSort            50       avgt    3      0.99    0.56
> ArraysSort.floatSort            75       avgt    3      0.99    0.65
> ArraysSort.floatSort            100      avgt    3      0.98    0.70
> ArraysSort.floatSort            1000     avgt    3      0.99    0.84
> ArraysSort.floatSort            ...

make/modules/java.base/Lib.gmk line 225:

> 223: 
> 224:   TARGETS += $(BUILD_LIBSIMD_SORT)
> 225: endif

This whole block should be combined with the existing block above, something 
like this:


ifeq ($(call isTargetOs, linux)+$(call isTargetCpu, x86_64 aarch64)+$(INCLUDE_COMPILER2)+$(filter $(TOOLCHAIN_TYPE), gcc), true+true+true+gcc)
  ##############################################################################
  ## Build libsimdsort
  ##############################################################################

  $(eval $(call SetupJdkLibrary, BUILD_LIBSIMD_SORT, \
      NAME := simdsort, \
      LINK_TYPE := C++, \
      OPTIMIZATION := HIGH, \
      INCLUDES := $(OPENJDK_TARGET_CPU_ARCH), \
      CXXFLAGS := -std=c++17, \
      CXXFLAGS_linux_aarch64 := -march=armv8.2-a+sve, \
      DISABLED_WARNINGS_gcc := unused-variable, \
      LIBS_linux := $(LIBM), \
  ))

  TARGETS += $(BUILD_LIBSIMD_SORT)
endif


Unfortunately, we don't currently support CXXFLAGS_<os>_<cpu>, only 
CFLAGS_<os>_<cpu>, but this can be fixed, and I think it should be, since we 
now have a need for it.


diff --git a/make/common/native/Flags.gmk b/make/common/native/Flags.gmk
index efb4c08e74c..2f3680af7c7 100644
--- a/make/common/native/Flags.gmk
+++ b/make/common/native/Flags.gmk
@@ -106,10 +106,12 @@ define SetupCompilerFlags
     $1_EXTRA_CFLAGS += -DSTATIC_BUILD=1
   endif
 
-  # Pickup extra OPENJDK_TARGET_OS_TYPE, OPENJDK_TARGET_OS and/or TOOLCHAIN_TYPE
-  # dependent variables for CXXFLAGS.
+  # Pickup extra OPENJDK_TARGET_OS_TYPE, OPENJDK_TARGET_OS, TOOLCHAIN_TYPE and
+  # OPENJDK_TARGET_OS plus OPENJDK_TARGET_CPU pair dependent variables for
+  # CXXFLAGS.
   $1_EXTRA_CXXFLAGS := $$($1_CXXFLAGS_$(OPENJDK_TARGET_OS_TYPE)) $$($1_CXXFLAGS_$(OPENJDK_TARGET_OS)) \
-      $$($1_CXXFLAGS_$(TOOLCHAIN_TYPE))
+      $$($1_CXXFLAGS_$(TOOLCHAIN_TYPE)) \
+      $$($1_CXXFLAGS_$(OPENJDK_TARGET_OS)_$(OPENJDK_TARGET_CPU))
 
   ifneq ($(DEBUG_LEVEL), release)
     # Pickup extra debug dependent variables for CXXFLAGS


The above at least compiles for me.
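To illustrate how the extra lookup resolves, here is a minimal standalone GNU 
Make sketch of the same computed-variable-name pattern. SetupExtraCxxFlags and 
the DEMO prefix are hypothetical stand-ins (not the real SetupCompilerFlags), 
and the target variables are hard-coded to a Linux/AArch64 gcc build as an 
assumption.

```make
# Hypothetical stand-alone sketch; not the real JDK make files.
# Assumed target: Linux on AArch64, built with gcc.
OPENJDK_TARGET_OS_TYPE := unix
OPENJDK_TARGET_OS := linux
OPENJDK_TARGET_CPU := aarch64
TOOLCHAIN_TYPE := gcc

# Simplified stand-in for SetupCompilerFlags: only gathers the extra CXXFLAGS.
# $1 is the library prefix; $$ survives $(call) so the lookup happens in $(eval).
define SetupExtraCxxFlags
  $1_EXTRA_CXXFLAGS := $$($1_CXXFLAGS_$(OPENJDK_TARGET_OS_TYPE)) \
      $$($1_CXXFLAGS_$(OPENJDK_TARGET_OS)) \
      $$($1_CXXFLAGS_$(TOOLCHAIN_TYPE)) \
      $$($1_CXXFLAGS_$(OPENJDK_TARGET_OS)_$(OPENJDK_TARGET_CPU))
endef

# Per-<os>_<cpu> flags, as a SetupJdkLibrary caller would pass them.
DEMO_CXXFLAGS_linux_aarch64 := -march=armv8.2-a+sve
# Hypothetical x86_64 entry; not selected because OPENJDK_TARGET_CPU is aarch64.
DEMO_CXXFLAGS_linux_x86_64 := -mavx2

$(eval $(call SetupExtraCxxFlags,DEMO))

# Unset suffixes expand to nothing; strip removes the leftover whitespace.
$(info DEMO_EXTRA_CXXFLAGS = $(strip $(DEMO_EXTRA_CXXFLAGS)))

.PHONY: all
all: ; @:
```

Running it with `make -f` should print DEMO_EXTRA_CXXFLAGS = 
-march=armv8.2-a+sve, since only the linux_aarch64-suffixed variable matches 
the assumed target.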

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28675#discussion_r2593665862
