Re: RFR: 8355216: Accelerate P-256 arithmetic on aarch64 [v5]

Andrew Dinn Tue, 19 May 2026 09:21:48 -0700

On Fri, 15 May 2026 09:52:20 GMT, Ferenc Rakoczi wrote:


>> An aarch64 implementation of the MontgomeryIntegerPolynomial256.mult() 
>> method and IntegerPolynomial.conditionalAssign(). Since 64-bit 
>> multiplication is not supported on Neon and manually performing this 
>> operation with 32-bit limbs is slower than with GPRs, a hybrid Neon/GPR 
>> approach is used. Neon instructions are used to compute intermediate values 
>> used in the last two iterations of the main "loop", while the GPRs compute 
>> the first few iterations. At the method level this improves performance by 
>> ~9% and at the API level roughly 5%.
>> 
>> ---------
>> - [x] I confirm that I make this contribution in accordance with the 
>> [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai).
>
> Ferenc Rakoczi has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Accepting more suggestions from Andrew Dinn.
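
For context on the trade-off described above, here is a minimal sketch (not code from this PR) of what emulating a 64x64 -> 128-bit multiply with 32-bit limbs involves: four 32x32 -> 64 partial products plus carry handling, where the GPR path needs only a MUL/UMULH pair. It also shows a branch-free masking version of conditional assignment in the spirit of IntegerPolynomial.conditionalAssign(set, a, b). The names `U128`, `mul64x64_via_32bit_limbs` and `conditional_assign` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 128-bit result as two 64-bit halves (hypothetical helper type).
struct U128 {
  uint64_t lo;
  uint64_t hi;
};

// Emulates a 64x64 -> 128-bit unsigned multiply with 32x32 -> 64-bit
// partial products, the widest per-lane multiply Neon provides (UMULL).
// On GPRs the same result is one MUL (low half) plus one UMULH (high half).
U128 mul64x64_via_32bit_limbs(uint64_t a, uint64_t b) {
  const uint64_t mask32 = 0xFFFFFFFFull;
  const uint64_t a_lo = a & mask32, a_hi = a >> 32;
  const uint64_t b_lo = b & mask32, b_hi = b >> 32;

  // Four partial products; each fits in 64 bits.
  const uint64_t p_ll = a_lo * b_lo;
  const uint64_t p_lh = a_lo * b_hi;
  const uint64_t p_hl = a_hi * b_lo;
  const uint64_t p_hh = a_hi * b_hi;

  // Combine the middle column; its sum is < 2^34, so it cannot overflow.
  const uint64_t mid = (p_ll >> 32) + (p_lh & mask32) + (p_hl & mask32);

  U128 r;
  r.lo = (mid << 32) | (p_ll & mask32);
  r.hi = p_hh + (p_lh >> 32) + (p_hl >> 32) + (mid >> 32);
  return r;
}

// Branch-free conditional assignment: if set == 1, a[i] = b[i] for all i;
// if set == 0, a is left unchanged. The mask is all ones or all zeros.
void conditional_assign(int set, int64_t* a, const int64_t* b, size_t n) {
  assert(set == 0 || set == 1);
  const int64_t mask = -static_cast<int64_t>(set);
  for (size_t i = 0; i < n; i++) {
    a[i] ^= mask & (a[i] ^ b[i]);
  }
}
```

The extra partial products and carry fix-ups are the cost that makes a pure-Neon multiply over 64-bit limbs unattractive, which is why the description keeps part of the work on the GPRs.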

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8077:

> 8075:       + b_0 + b_1 + b_2;
> 8076:         b_0 = b_1 = b_2 = noreg;
> 8077: 

This freeing of registers would be better done at line 8002 (i.e. before 
processing `a_3`), with register `b_3` freed as well. That makes it clear to the 
maintainer that `b_0` to `b_3` are no longer needed from that point on.
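
A minimal model of the idiom behind this suggestion, under stated assumptions: `Reg`, `use` and `release_b_regs` are hypothetical stand-ins, not HotSpot's `Register` API. Resetting a name to a sentinel right after its last use means an accidental later use fails loudly in this model, and placing the reset where the registers die documents their live range.

```cpp
#include <cassert>

// Hypothetical stand-in for a register handle; encoding -1 means "none".
struct Reg {
  int encoding;
  bool is_valid() const { return encoding >= 0; }
};
constexpr Reg noreg{-1};

// Hypothetical emitter hook: using a released register trips the assert.
int use(Reg r) {
  assert(r.is_valid() && "register used after it was released");
  return r.encoding;
}

// Suggested placement: release b_0..b_3 together, right after their last
// use and before the a_3 round begins.
void release_b_regs(Reg& b_0, Reg& b_1, Reg& b_2, Reg& b_3) {
  b_0 = b_1 = b_2 = b_3 = noreg;
}
```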

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8136:

> 8134:     __ add(high, high, mod_high);
> 8135: 
> 8136:     // Reallocate regs b_3, b_4

Likewise, at this point we should be freeing `b_4`, having already freed `b_3`.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8238:

> 8236: 
> 8237:     // End intrinsic call
> 8238:     __ add(sp, sp, 176);

The literal 176 could be replaced with `C_DATA_SIZE + MUL_DATA_SIZE`.
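
A small sketch of why a named total helps, assuming the two constants sum to the 176 bytes in the quoted hunk. The individual values below (48 and 128) are hypothetical and chosen only to reach that sum; `FRAME_SIZE`, `enter_stub` and `leave_stub` are likewise illustrative names, with the two functions modelling the matching stack adjustments at stub entry and exit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical values, chosen only so that they sum to 176; the real
// C_DATA_SIZE and MUL_DATA_SIZE are defined in the stub.
constexpr int C_DATA_SIZE = 48;
constexpr int MUL_DATA_SIZE = 128;
constexpr int FRAME_SIZE = C_DATA_SIZE + MUL_DATA_SIZE;

// AArch64 code is expected to keep sp 16-byte aligned, so reject a bad
// total at compile time.
static_assert(FRAME_SIZE % 16 == 0, "stub frame must keep sp 16-byte aligned");

// Toy model of the entry and exit adjustments: deriving both from the same
// constant keeps them from drifting apart if one of the sizes changes.
uint64_t enter_stub(uint64_t sp) {
  assert(sp % 16 == 0);
  return sp - FRAME_SIZE;
}

uint64_t leave_stub(uint64_t sp) {
  return sp + FRAME_SIZE;
}
```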

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/30941#discussion_r3267784068
PR Review Comment: https://git.openjdk.org/jdk/pull/30941#discussion_r3267795849
PR Review Comment: https://git.openjdk.org/jdk/pull/30941#discussion_r3267814052
