Re: RFR: 8385513: AArch64: Improve ArraysSupport.vectorizedHashCode performance for large arrays [v2]

Andrew Haley Mon, 29 Jun 2026 06:01:22 -0700

On Fri, 26 Jun 2026 15:08:36 GMT, Ehsan Behrangi <[email protected]> wrote:


>> The current AArch64 implementation of ArraysSupport.vectorizedHashCode 
>> processes polynomial reductions in relatively small groups, which limits 
>> parallelism in the hash accumulation path for large arrays.
>> 
>> This change increases polynomial batch size to 16-element groups using a 
>> larger precomputed powers-of-31 table. The updated implementation enables 
>> more independent multiply operations and reduces dependency chains in the 
>> main hashing loop.
>> 
>> The optimization also reduces generated stub size for all supported element 
>> types, lowering instruction cache pressure in hot hashing workloads.
>> 
>> The optimization applies to boolean[], byte[], char[], short[], and int[] 
>> array hashing paths and is enabled only for array lengths >= 8. Shorter 
>> arrays continue to use the existing scalar implementation.
>> 
>> Generated stub size reduction:
>> 
>> 
>> | Element type | New size | JDK 25 size | Reduction | 
>> | ------------ | -------- | ----------- | --------- |
>> | boolean      | 332 B    | 428 B       | -96 B     |
>> | byte         | 332 B    | 428 B       | -96 B     |
>> | char         | 332 B    | 408 B       | -76 B     |
>> | short        | 332 B    | 408 B       | -76 B     |
>> | int          | 300 B    | 324 B       | -24 B     |
>> 
>> ## BYTE[] Arrays.hashCode throughput (ops/ms):
>> Lengths below 8 use the existing scalar path and are therefore expected to 
>> show no meaningful change.
>> 
>> | Length | Baseline | New    | Improvement |
>> |--------|----------|--------|-------------|
>> | 2      | 696842   | 681572 | -2.2%       |
>> | 7      | 349082   | 349392 | +0.1%       |
>> | 8      | 309193   | 395677 | +28.0%      |
>> | 9      | 294240   | 367510 | +24.9%      |
>> | 15     | 160372   | 202718 | +26.4%      |
>> | 16     | 241651   | 348854 | +44.4%      |
>> | 17     | 228929   | 308820 | +34.9%      |
>> | 23     | 139463   | 186679 | +33.9%      |
>> | 24     | 177955   | 253809 | +42.6%      |
>> | 25     | 173594   | 253786 | +46.2%      |
>> | 31     | 113638   | 159672 | +40.5%      |
>> | 32     | 164228   | 214765 | +30.8%      |
>> | 33     | 155093   | 199425 | +28.6%      |
>> | 47     | 103190   | 135190 | +31.0%      |
>> | 48     | 116600   | 145178 | +24.5%      |
>> | 49     | 112067   | 163144 | +45.6%      |
>> | 63     | 79978    | 116111 | +45.2%      |
>> | 64     | 104182   | 130175 | +25.0%      |
>> | 65     | 101735   | 125010 | +22.9%      |
>> 
>> 
>> ## CHAR[] Arrays.hashCode throughput (ops/ms)
>> 
>> | Length | Baseline | New    | Improvement |
>> |-...
>
> Ehsan Behrangi has refreshed the contents of this pull request, and previous 
> commits have been removed. The incremental views will show differences 
> compared to the previous content of the PR. The pull request contains one new 
> commit since the last revision:
> 
>   8385513: AArch64: Improve ArraysSupport.vectorizedHashCode performance for large arrays
>   
>   The current AArch64 implementation of ArraysSupport.vectorizedHashCode
>   processes polynomial reductions in relatively small groups, which limits
>   parallelism in the hash accumulation path for large arrays.
>   
>   This change increases polynomial batch size to 16-element groups using a
>   larger precomputed powers-of-31 table. The updated implementation enables
>   more independent multiply operations and reduces dependency chains in the
>   main hashing loop.
>   
>   The optimization also reduces generated stub size for all supported
>   element types, lowering instruction cache pressure in hot hashing
>   workloads.
>   
>   The optimization applies to boolean[], byte[], char[], short[], and
>   int[] array hashing paths and is enabled only for array lengths >= 8.
>   Shorter arrays continue to use the existing scalar implementation.
>   
>   Generated stub size reduction:
>   | Element type | New size | JDK 25 size | Reduction |
>   | ------------ | -------- | ----------- | --------- |
>   | boolean      | 332 B    | 428 B       | -96 B     |
>   | byte         | 332 B    | 428 B       | -96 B     |
>   | char         | 332 B    | 408 B       | -76 B     |
>   | short        | 332 B    | 408 B       | -76 B     |
>   | int          | 300 B    | 324 B       | -24 B     |
>   
>   ----------------------------------------------------
>   BYTE[] Arrays.hashCode throughput (ops/ms):
>   Lengths below 8 use the existing scalar path and are therefore expected to show no meaningful change.
>   
>   | Length | Baseline | New    | Improvement |
>   |--------|----------|--------|-------------|
>   | 2      | 696842   | 681572 | -2.2%       |
>   | 7      | 349082   | 349392 | +0.1%       |
>   | 8      | 309193   | 395677 | +28.0%      |
>   | 9      | 294240   | 367510 | +24.9%      |
>   | 15     | 160372   | 202718 | +26.4%      |
>   | 16     | 241651   | 348854 | +44.4%      |
>   | 17     | 228929   | 308820 | +34.9%      |
>   | 23     | 139463   | 186679 | +33.9%      |
>   | 24     | 177955   | 253809 | +42.6%      |
>   | 25     | 173594   | 253786 | +46.2%      |
>   | 31     | 113638   | 159672 | +40.5%      |
>   | 32     | 164228   | 214765 | +30.8%      |
>   | 33     | 155093   ...

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 9175:

> 9173: 
> 9174:     const Register tmp    = rscratch1;
> 9175:     const Register pow16  = rscratch2;

Please don't alias `rscratch1` and `rscratch2`. They are freely used by 
assembler macros, so a value kept in either one can be clobbered by any 
macro call in between.
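
For readers following the quoted description, the 16-element batching it mentions can be sketched in scalar Java. This is only an illustration of the arithmetic, not the stub itself: the class name, `BATCH`, `POWERS`, and `batchedHashCode` are hypothetical, and the sketch falls back to the plain loop for any tail shorter than 16 rather than modelling the stub's `>= 8` threshold.

```java
import java.util.Arrays;

public class BatchedHashSketch {
    // Hypothetical batch width, taken from the "16-element groups" in the description.
    static final int BATCH = 16;

    // POWERS[k] = 31^k with int wrap-around (i.e. mod 2^32), for k = 0..BATCH.
    static final int[] POWERS = new int[BATCH + 1];
    static {
        POWERS[0] = 1;
        for (int k = 1; k <= BATCH; k++) {
            POWERS[k] = 31 * POWERS[k - 1];
        }
    }

    static int batchedHashCode(byte[] a) {
        int h = 1; // same initial value as Arrays.hashCode(byte[])
        int i = 0;
        // Main loop: the 16 products in a batch do not depend on each other,
        // only the final combine depends on the previous value of h.
        for (; i + BATCH <= a.length; i += BATCH) {
            int sum = 0;
            for (int j = 0; j < BATCH; j++) {
                sum += a[i + j] * POWERS[BATCH - 1 - j];
            }
            h = h * POWERS[BATCH] + sum;
        }
        // Scalar tail: the usual h = 31 * h + a[i] recurrence.
        for (; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] data = new byte[65];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i * 7 - 100);
        }
        // Compare against the library result for the same input.
        System.out.println(batchedHashCode(data) == Arrays.hashCode(data));
    }
}
```

The batched form gives the same result as the one-element recurrence because int multiplication and addition wrap consistently, so expanding 16 steps of `h = 31 * h + a[i]` into `h * 31^16 + sum` changes only the evaluation order.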

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/31674#discussion_r3491844590
