On Sat, 6 Jun 2026 09:50:42 GMT, Andrew Haley <[email protected]> wrote:
>> This change improves the AArch64 implementation of String.equals by
>> introducing SIMD-based fast paths using SVE and NEON.
>>
>> SVE implementation:
>> - Uses predicated loads and comparisons for short lengths (len < VL)
>> - Uses a full predicated loop for longer inputs
>> - Handles the tail via an overlapped compare at (base + len - VL)
>>
>> NEON implementation:
>> - Uses an 8-byte pre-read to simplify tail handling and eliminate 4/2/1-byte
>>   scalar branches
>> - Processes 16-byte chunks using LDP pair loads
>> - Uses CMP/CCMP to collapse comparisons into a single branch on mismatch
>>
>> These changes reduce branch pressure and improve throughput for both short
>> and long strings.
>>
>> Correctness:
>> - The implementation preserves existing semantics and matches behavior for
>>   all lengths
>>
>> Testing:
>> - Updated and extended intrinsic tests to cover boundary conditions and
>>   mismatch positions
>>
>> Benchmark:
>> Across evaluated macrobenchmarks (DaCapo and Renaissance), most workloads
>> spend <0.5% of CPU time in String.equals. DaCapo biojava is a notable
>> exception (~8–9%). In biojava, most String.equals calls are on very short
>> strings (1–2 bytes), where SVE shows ~1% end-to-end improvement, while NEON
>> is largely neutral or shows a small regression (~1%).
>>
>> Measured using JMH on AArch64 (Arm Neoverse V2 CPU). Values are relative (%)
>> vs baseline. Negative values indicate regressions. Mismatch results are
>> reported across first(DF), middle(DM), and last(DL) difference positions.
>>
>> SVE results:
>>
>> Length | L1_EQ  L1_DF  L1_DM  L1_DL | U16_EQ U16_DF U16_DM U16_DL |   Avg
>> -------+----------------------------+-----------------------------+------
>>      0 | 19.63                      |  20.05                      | 19.84
>>      1 | 16.59  17.81  16.57  18.34 |  16.02   0.71   0.42   1.39 | 10.98
>>      2 | 16.44   1.32   0.30  -0.16 |  15.90  -5.17  -4.55  -1.09 |  2.87
>>      3 | 26.58   1.60   1.43  27.07 |  30.34  -8.86  -7.06  14.08 | 10.65
>>      7 | 41.47  -2.94  -3.37  39.82 |  24.02  -8.82  -6.27  20.48 | 13.05
>>      8 | 19.08  -1.16  -3.50  -0.90 |  22.49  -9.75  17.50  13.13 |  7.11
>>      9 | 20.17  -4.12  -5.17  19.03 |   9.25  -2.24  21.35   3.39 |  7.71
>>     15 | 19.48  -3.83  -4.50  19.01 |  29.26 -10.06  11.76  17.07 |  9.77
>>     16 | 19.04  -3.15  16.41  16.85 |  38.37 -11.12  13.18  27.70 | 14.66
>>     17 |  8.95  -2.40   5.68   6.38 |  16.32  -1.61   7.49  11.44 |  6.53
>>     31 | 28.87  -0.01  19.79  23.37 |  41.43  -7.57  23.85  35.89 | 20.70
>>     32 | 32.58...
>
> src/hotspot/cpu/aarch64/aarch64.ad line 16035:
>
>> 16033:     iRegP_R3 str2,    // str2 (kill)
>> 16034:     iRegI_R4 cnt,     // int length (kill)
>> 16035:     iRegI_R0 result,  // boolean
>
> From what I can see here these don't need to be fixed registers.

The fixed scalar operands are inherited from the existing string_equalsL rule. Are you suggesting that they should be relaxed for the SVE variant?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/31400#discussion_r3368043884
