On Fri, 26 Jun 2026 15:08:36 GMT, Ehsan Behrangi <[email protected]> wrote:
>> The current AArch64 implementation of ArraysSupport.vectorizedHashCode
>> processes polynomial reductions in relatively small groups, which limits
>> parallelism in the hash accumulation path for large arrays.
>>
>> This change increases polynomial batch size to 16-element groups using a
>> larger precomputed powers-of-31 table. The updated implementation enables
>> more independent multiply operations and reduces dependency chains in the
>> main hashing loop.
>>
>> The optimization also reduces generated stub size for all supported element
>> types, lowering instruction cache pressure in hot hashing workloads.
>>
>> The optimization applies to boolean[], byte[], char[], short[], and int[]
>> array hashing paths and is enabled only for array lengths >= 8. Shorter
>> arrays continue to use the existing scalar implementation.
>>
>> Generated stub size reduction:
>>
>>
>> | Element type | New size | JDK 25 size | Reduction |
>> | ------------ | -------- | ----------- | --------- |
>> | boolean      | 332 B    | 428 B       | -96 B     |
>> | byte         | 332 B    | 428 B       | -96 B     |
>> | char         | 332 B    | 408 B       | -76 B     |
>> | short        | 332 B    | 408 B       | -76 B     |
>> | int          | 300 B    | 324 B       | -24 B     |
>>
>> ## BYTE[] Arrays.hashCode throughput (ops/ms):
>> Lengths below 8 use the existing scalar path and are therefore expected to
>> show no meaningful change.
>>
>> | Length | Baseline | New    | Improvement |
>> |--------|----------|--------|-------------|
>> | 2      | 696842   | 681572 | -2.2%       |
>> | 7      | 349082   | 349392 | +0.1%       |
>> | 8      | 309193   | 395677 | +28.0%      |
>> | 9      | 294240   | 367510 | +24.9%      |
>> | 15     | 160372   | 202718 | +26.4%      |
>> | 16     | 241651   | 348854 | +44.4%      |
>> | 17     | 228929   | 308820 | +34.9%      |
>> | 23     | 139463   | 186679 | +33.9%      |
>> | 24     | 177955   | 253809 | +42.6%      |
>> | 25     | 173594   | 253786 | +46.2%      |
>> | 31     | 113638   | 159672 | +40.5%      |
>> | 32     | 164228   | 214765 | +30.8%      |
>> | 33     | 155093   | 199425 | +28.6%      |
>> | 47     | 103190   | 135190 | +31.0%      |
>> | 48     | 116600   | 145178 | +24.5%      |
>> | 49     | 112067   | 163144 | +45.6%      |
>> | 63     | 79978    | 116111 | +45.2%      |
>> | 64     | 104182   | 130175 | +25.0%      |
>> | 65     | 101735   | 125010 | +22.9%      |
>>
>>
>> ## CHAR[] Arrays.hashCode throughput (ops/ms)
>>
>> | Length | Baseline | New | Improvement |
>> |-...
>
> Ehsan Behrangi has refreshed the contents of this pull request, and previous
> commits have been removed. The incremental views will show differences
> compared to the previous content of the PR. The pull request contains one new
> commit since the last revision:
>
>   8385513: AArch64: Improve ArraysSupport.vectorizedHashCode performance for
>   large arrays
>
>   The current AArch64 implementation of ArraysSupport.vectorizedHashCode
>   processes polynomial reductions in relatively small groups, which limits
>   parallelism in the hash accumulation path for large arrays.
>
>   This change increases polynomial batch size to 16-element groups using a
>   larger precomputed powers-of-31 table. The updated implementation enables
>   more independent multiply operations and reduces dependency chains in the
>   main hashing loop.
>
>   The optimization also reduces generated stub size for all supported
>   element types, lowering instruction cache pressure in hot hashing
>   workloads.
>
>   The optimization applies to boolean[], byte[], char[], short[], and
>   int[] array hashing paths and is enabled only for array lengths >= 8.
>   Shorter arrays continue to use the existing scalar implementation.
>
>   Generated stub size reduction:
>   | Element type | New size | JDK 25 size | Reduction |
>   | ------------ | -------- | ----------- | --------- |
>   | boolean      | 332 B    | 428 B       | -96 B     |
>   | byte         | 332 B    | 428 B       | -96 B     |
>   | char         | 332 B    | 408 B       | -76 B     |
>   | short        | 332 B    | 408 B       | -76 B     |
>   | int          | 300 B    | 324 B       | -24 B     |
>
>   ----------------------------------------------------
>   BYTE[] Arrays.hashCode throughput (ops/ms):
>   Lengths below 8 use the existing scalar path and are therefore expected to
>   show no meaningful change.
>
>   | Length | Baseline | New    | Improvement |
>   |--------|----------|--------|-------------|
>   | 2      | 696842   | 681572 | -2.2%       |
>   | 7      | 349082   | 349392 | +0.1%       |
>   | 8      | 309193   | 395677 | +28.0%      |
>   | 9      | 294240   | 367510 | +24.9%      |
>   | 15     | 160372   | 202718 | +26.4%      |
>   | 16     | 241651   | 348854 | +44.4%      |
>   | 17     | 228929   | 308820 | +34.9%      |
>   | 23     | 139463   | 186679 | +33.9%      |
>   | 24     | 177955   | 253809 | +42.6%      |
>   | 25     | 173594   | 253786 | +46.2%      |
>   | 31     | 113638   | 159672 | +40.5%      |
>   | 32     | 164228   | 214765 | +30.8%      |
>   | 33     | 155093 ...

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 9175:

> 9173:
> 9174:   const Register tmp = rscratch1;
> 9175:   const Register pow16 = rscratch2;

Please don't alias `rscratch1` and `rscratch2`. They are freely used by assembler macros.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/31674#discussion_r3491844590
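
For readers following the quoted description, the 16-element batching can be sketched in plain Java. This is only an illustration of the arithmetic, not the AArch64 stub from the PR: the class name `BatchedHashSketch`, the method `batchedHashCode`, and the `POW` table are hypothetical, and the sketch does not model the length >= 8 threshold or the vector instructions. It relies on the identity that folding 16 elements one at a time with `h = 31 * h + a[i]` equals `h * 31^16 + sum(a[i + j] * 31^(15 - j))`, where the 16 products are independent of each other.

```java
import java.util.Arrays;

public class BatchedHashSketch {
    // Batch size taken from the PR description (16-element groups).
    static final int BATCH = 16;

    // POW[k] = 31^k with int wrap-around, for k = 0..16 (hypothetical table layout).
    static final int[] POW = new int[BATCH + 1];
    static {
        POW[0] = 1;
        for (int k = 1; k <= BATCH; k++) {
            POW[k] = 31 * POW[k - 1];
        }
    }

    // Computes the same value as Arrays.hashCode(byte[]) using 16-element batches.
    static int batchedHashCode(byte[] a) {
        if (a == null) {
            return 0;
        }
        int h = 1;
        int i = 0;
        // Main loop: the 16 multiplies inside a batch do not depend on each other;
        // only the final combine depends on the previous batch's result.
        for (; i + BATCH <= a.length; i += BATCH) {
            int sum = 0;
            for (int j = 0; j < BATCH; j++) {
                sum += a[i + j] * POW[BATCH - 1 - j];
            }
            h = h * POW[BATCH] + sum;
        }
        // Scalar tail for the remaining 0..15 elements.
        for (; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] data = new byte[65];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i * 37 - 100);
        }
        // Compare against the JDK result for every prefix length 0..65.
        for (int n = 0; n <= data.length; n++) {
            byte[] prefix = Arrays.copyOf(data, n);
            if (batchedHashCode(prefix) != Arrays.hashCode(prefix)) {
                throw new AssertionError("mismatch at length " + n);
            }
        }
        System.out.println("all lengths match Arrays.hashCode");
    }
}
```

Because `int` arithmetic wraps modulo 2^32, the batched and sequential forms agree exactly, which is why a larger power table can shorten the dependency chain without changing the result.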
