> This change improves the AArch64 implementation of String.equals by
> introducing SIMD-based fast paths using SVE and NEON.
>
> SVE implementation:
> - Uses predicated loads and comparisons for short lengths (len < VL)
> - Uses a full predicated loop for longer inputs
> - Handles the tail via an overlapped compare at (base + len - VL)
>
> NEON implementation:
> - Uses an 8-byte pre-read to simplify tail handling and eliminate 4/2/1-byte
>   scalar branches
> - Processes 16-byte chunks using LDP pair loads
> - Uses CMP/CCMP to collapse comparisons into a single branch on mismatch
>
> These changes reduce branch pressure and improve throughput for both short
> and long strings.
>
> Correctness:
> - The implementation preserves existing semantics and matches behavior for
>   all lengths
>
> Testing:
> - Updated and extended intrinsic tests to cover boundary conditions and
>   mismatch positions
>
> Benchmark:
> Across evaluated macrobenchmarks (DaCapo and Renaissance), most workloads
> spend <0.5% of CPU time in String.equals. DaCapo biojava is a notable
> exception (~8–9%). In biojava, most String.equals calls are on very short
> strings (1–2 bytes), where SVE shows ~1% end-to-end improvement, while NEON
> is largely neutral or shows a small regression (~1%).
>
> Measured using JMH on AArch64 (Arm Neoverse V2 CPU). Values are relative (%)
> vs baseline. Negative values indicate regressions. Mismatch results are
> reported across first (DF), middle (DM), and last (DL) difference positions.
>
> SVE results:
>
> Length |  L1_EQ  L1_DF  L1_DM  L1_DL | U16_EQ U16_DF U16_DM U16_DL |    Avg
> -------+-----------------------------+-----------------------------+-------
>      0 |  19.63                      |  20.05                      |  19.84
>      1 |  16.59  17.81  16.57  18.34 |  16.02   0.71   0.42   1.39 |  10.98
>      2 |  16.44   1.32   0.30  -0.16 |  15.90  -5.17  -4.55  -1.09 |   2.87
>      3 |  26.58   1.60   1.43  27.07 |  30.34  -8.86  -7.06  14.08 |  10.65
>      7 |  41.47  -2.94  -3.37  39.82 |  24.02  -8.82  -6.27  20.48 |  13.05
>      8 |  19.08  -1.16  -3.50  -0.90 |  22.49  -9.75  17.50  13.13 |   7.11
>      9 |  20.17  -4.12  -5.17  19.03 |   9.25  -2.24  21.35   3.39 |   7.71
>     15 |  19.48  -3.83  -4.50  19.01 |  29.26 -10.06  11.76  17.07 |   9.77
>     16 |  19.04  -3.15  16.41  16.85 |  38.37 -11.12  13.18  27.70 |  14.66
>     17 |   8.95  -2.40   5.68   6.38 |  16.32  -1.61   7.49  11.44 |   6.53
>     31 |  28.87  -0.01  19.79  23.37 |  41.43  -7.57  23.85  35.89 |  20.70
>     32 |  32.58   3.38  12.39  26.90 |  46.01 -10.99  20.53  44.15 |  21.87
>     33 |  11.62 -15.20...

Ehsan Behrangi has updated the pull request incrementally with one additional commit since the last revision:

  8381560: Relax cnt and result register constraints for SVE String.equals

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/31400/files
  - new: https://git.openjdk.org/jdk/pull/31400/files/00cc9be8..0f585e35

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=31400&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=31400&range=00-01

  Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/31400.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/31400/head:pull/31400

PR: https://git.openjdk.org/jdk/pull/31400
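
For readers unfamiliar with the tail-handling idea in the quoted description, here is a minimal scalar sketch in Java. It is not the HotSpot intrinsic and is not taken from the patch: the class `OverlappedEquals` and the method `bytesEqual` are hypothetical names, 8-byte `long` loads stand in for the vector and LDP loads, and an OR of two XOR results stands in for the CMP/CCMP sequence. It only illustrates how an overlapped final load at (len - 8) can replace the 4/2/1-byte tail branches, and how two word comparisons can share one mismatch branch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class OverlappedEquals {

    // Hypothetical helper: a scalar analogue of the described technique,
    // not the code generated by the AArch64 intrinsic.
    static boolean bytesEqual(byte[] a, byte[] b) {
        if (a.length != b.length) {
            return false;
        }
        int len = a.length;

        // Inputs shorter than one 8-byte word use a plain byte loop here.
        // (Assumption: the real intrinsic handles short inputs differently.)
        if (len < 8) {
            for (int i = 0; i < len; i++) {
                if (a[i] != b[i]) {
                    return false;
                }
            }
            return true;
        }

        ByteBuffer x = ByteBuffer.wrap(a);
        ByteBuffer y = ByteBuffer.wrap(b);

        // Main loop: 16 bytes per iteration as two 8-byte words.
        // OR-ing the two XOR results leaves a single branch per chunk.
        int i = 0;
        for (; i + 16 <= len; i += 16) {
            long diff = (x.getLong(i) ^ y.getLong(i))
                      | (x.getLong(i + 8) ^ y.getLong(i + 8));
            if (diff != 0) {
                return false;
            }
        }

        // At most 15 bytes remain. Compare one more full word if it fits.
        if (i + 8 <= len && x.getLong(i) != y.getLong(i)) {
            return false;
        }

        // Overlapped tail: re-read the last 8 bytes ending exactly at len.
        // Some of these bytes may already have been compared; that is harmless.
        return x.getLong(len - 8) == y.getLong(len - 8);
    }

    public static void main(String[] args) {
        byte[] a = "String.equals tail".getBytes(StandardCharsets.ISO_8859_1);
        byte[] b = a.clone();
        System.out.println(bytesEqual(a, b)); // identical copies

        b[b.length - 1] ^= 1;                 // flip a bit in the last byte
        System.out.println(bytesEqual(a, b)); // mismatch in the overlapped tail
    }
}
```

The overlapped load trades a few redundant byte comparisons for the removal of data-dependent tail branches, which is the kind of branch-pressure reduction the description refers to.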
