On Thu, 25 Jun 2026 11:05:30 GMT, Ehsan Behrangi wrote:
> The current AArch64 implementation of ArraysSupport.vectorizedHashCode
> processes polynomial reductions in relatively small groups, which limits
> parallelism in the hash accumulation path for large arrays.
>
> This change increases the polynomial batch size to 16-element groups, using a
> larger precomputed table of powers of 31. This exposes more independent
> multiply operations and shortens the dependency chains in the main hashing
> loop.
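A minimal C++ sketch of why 16-element batching gives the same result, assuming the stub computes the same recurrence as the Java fallback, `h = 31*h + a[i]`. Unrolling that recurrence over one group `x[0..15]` gives `h' = h*31^16 + sum_j x[j]*31^(15-j)`, so the 16 products depend neither on each other nor on `h`. The function names, the use of `uint32_t` to model Java's wrapping `int`, and the scalar tail loop are illustrative assumptions, not the stub's actual structure.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference recurrence (same as Arrays.hashCode): h = 31*h + a[i].
// uint32_t arithmetic wraps modulo 2^32 like Java int, without signed-overflow UB.
uint32_t scalar_hash(uint32_t h, const std::vector<int32_t>& a) {
  for (int32_t x : a) h = 31u * h + static_cast<uint32_t>(x);
  return h;
}

constexpr std::size_t kBlock = 16;

// pow31[k] = 31^k mod 2^32 for k = 0..16 (the "powers-of-31 table").
std::array<uint32_t, kBlock + 1> make_pow31() {
  std::array<uint32_t, kBlock + 1> p{};
  p[0] = 1;
  for (std::size_t k = 1; k <= kBlock; ++k) p[k] = p[k - 1] * 31u;
  return p;
}

// Blocked version: per 16-element group, h' = h*31^16 + sum_j x[j]*31^(15-j).
// The inner products are independent (SIMD-friendly); only the final combine
// depends on the previous h. Leftover elements fall back to the recurrence.
uint32_t blocked_hash(uint32_t h, const std::vector<int32_t>& a) {
  static const auto pow31 = make_pow31();
  std::size_t i = 0;
  for (; i + kBlock <= a.size(); i += kBlock) {
    uint32_t sum = 0;
    for (std::size_t j = 0; j < kBlock; ++j)
      sum += static_cast<uint32_t>(a[i + j]) * pow31[kBlock - 1 - j];
    h = h * pow31[kBlock] + sum;
  }
  for (; i < a.size(); ++i) h = 31u * h + static_cast<uint32_t>(a[i]);
  return h;
}
```

Calling both functions with the same initial value (1 for `Arrays.hashCode`) should yield identical hashes for any length; the blocked form only reorders the arithmetic, which is exact because everything is computed modulo 2^32.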
>
> The optimization also reduces the generated stub size for every supported
> element type, lowering instruction-cache pressure in hot hashing workloads.
>
> The optimization applies to boolean[], byte[], char[], short[], and int[]
> array hashing paths and is enabled only for array lengths >= 8. Shorter
> arrays continue to use the existing scalar implementation.
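The element types listed above can share one recurrence. They differ only in how each element is widened to 32 bits before it is added, which is presumably what the `widen` lambda and `widen_signed` flag in the stub (quoted in the review below) handle. A minimal C++ sketch; the `ElemKind` tag and helper names are hypothetical, and treating the boolean path as zero-extended bytes is an assumption based on how `ArraysSupport` uses `T_BOOLEAN` for unsigned-byte hashing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical element-kind tag for this sketch (not HotSpot's BasicType).
enum class ElemKind { Boolean, Byte, Char, Short, Int };

// Widen element i to the 32-bit value that enters the hash.
// Conversion to uint32_t is modulo 2^32, so signed inputs keep their
// sign-extended two's-complement bit pattern.
uint32_t widen_element(ElemKind kind, const void* base, std::size_t i) {
  switch (kind) {
    case ElemKind::Boolean:  // assumed: zero-extend (covers 0/1 and unsigned bytes)
      return static_cast<const uint8_t*>(base)[i];
    case ElemKind::Byte:     // Java byte is signed: sign-extend
      return static_cast<uint32_t>(static_cast<const int8_t*>(base)[i]);
    case ElemKind::Char:     // Java char is unsigned: zero-extend
      return static_cast<const uint16_t*>(base)[i];
    case ElemKind::Short:    // Java short is signed: sign-extend
      return static_cast<uint32_t>(static_cast<const int16_t*>(base)[i]);
    case ElemKind::Int:      // already 32 bits
      return static_cast<uint32_t>(static_cast<const int32_t*>(base)[i]);
  }
  return 0;  // not reached for valid kinds
}

// One recurrence for every kind: h = 31*h + widen(a[i]).
uint32_t hash_elements(ElemKind kind, const void* base, std::size_t n, uint32_t h) {
  for (std::size_t i = 0; i < n; ++i) h = 31u * h + widen_element(kind, base, i);
  return h;
}
```

For example, a `byte[]` holding `{-1}` hashes to 30 (31*1 + (-1)), while the same bit pattern on the zero-extended path contributes 255 and hashes to 286.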
>
> Generated stub size reduction:
>
>
> | Element type | New size | JDK 25 size | Reduction |
> | ------------ | -------- | ----------- | --------- |
> | boolean | 332 B | 428 B | -96 B |
> | byte | 332 B | 428 B | -96 B |
> | char | 332 B | 408 B | -76 B |
> | short | 332 B | 408 B | -76 B |
> | int | 300 B | 324 B | -24 B |
>
> ## BYTE[] Arrays.hashCode throughput (ops/ms)
> Lengths below 8 use the existing scalar path and are therefore expected to
> show no meaningful change.
>
> | Length | Baseline | New | Improvement |
> |--------|----------|--------|-------------|
> | 2 | 696842 | 681572 | -2.2% |
> | 7 | 349082 | 349392 | +0.1% |
> | 8 | 309193 | 395677 | +28.0% |
> | 9 | 294240 | 367510 | +24.9% |
> | 15 | 160372 | 202718 | +26.4% |
> | 16 | 241651 | 348854 | +44.4% |
> | 17 | 228929 | 308820 | +34.9% |
> | 23 | 139463 | 186679 | +33.9% |
> | 24 | 177955 | 253809 | +42.6% |
> | 25 | 173594 | 253786 | +46.2% |
> | 31 | 113638 | 159672 | +40.5% |
> | 32 | 164228 | 214765 | +30.8% |
> | 33 | 155093 | 199425 | +28.6% |
> | 47 | 103190 | 135190 | +31.0% |
> | 48 | 116600 | 145178 | +24.5% |
> | 49 | 112067 | 163144 | +45.6% |
> | 63 | 79978 | 116111 | +45.2% |
> | 64 | 104182 | 130175 | +25.0% |
> | 65 | 101735 | 125010 | +22.9% |
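For reference, the Improvement column is the relative change in throughput (higher ops/ms is better). Worked for the length-16 row:

$$
\text{Improvement} = \frac{\text{New} - \text{Baseline}}{\text{Baseline}} \times 100\%,
\qquad
\frac{348854 - 241651}{241651} = \frac{107203}{241651} \approx 0.444 \;\Rightarrow\; +44.4\%
$$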
>
>
> ## CHAR[] Arrays.hashCode throughput (ops/ms)
>
> | Length | Baseline | New | Improvement |
> |--------|----------|--------|-------------|
> | 2 | 696254 | 696646 | +0.1% |
> | 7 |...
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 9197:
> 9195: bool widen_signed = false;
> 9196:
> 9197: auto widen = [&](FloatRegister dst1,
Suggestion:
auto widen = [&_masm](FloatRegister dst1,
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/31674#discussion_r3474152279