On Fri, 15 May 2026 09:52:20 GMT, Ferenc Rakoczi <[email protected]> wrote:

>> An aarch64 implementation of the MontgomeryIntegerPolynomial256.mult() 
>> method and IntegerPolynomial.conditionalAssign(). Since 64-bit 
>> multiplication is not supported on Neon and manually performing this 
>> operation with 32-bit limbs is slower than with GPRs, a hybrid neon/gpr 
>> approach is used. Neon instructions are used to compute intermediate values 
>> used in the last two iterations of the main "loop", while the GPRs compute 
>> the first few iterations. At the method level this improves performance by 
>> ~9% and at the API level roughly 5%.
>> 
>> ---------
>> - [x] I confirm that I make this contribution in accordance with the 
>> [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai).
>
> Ferenc Rakoczi has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Accepting more suggestions from Andrew Dinn.

src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 3166:

> 3164:     assert(Ts == H ? Vm->encoding() < 16 : Vm->encoding() < 32, "umull{2}v requires Vm to be in range V0..V15 when Ts is H");
> 3165:     f(0, 31), f(q, 30), f(0b101111, 29, 24), f(size, 23, 22), f(l, 21); //f(m, 20);
> 3166:     rf(Vm, 16), f(0b1010, 15, 12), f(h, 11), f(0, 10), rf(Vn, 5), rf(Vd, 0);

Why is `f(m, 20)` commented out here? Does it need to be set?
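
To make the question concrete, here is a minimal sketch of one possible reading. It assumes that `rf(Vm, 16)` writes a 5-bit register number into bits 20..16, in which case bit 20 would already be covered by that field and would be zero whenever Vm is restricted to V0..V15. The helpers `field`, `reg_field`, and `bit` are hypothetical stand-ins written for illustration, not HotSpot code, and the assumption about `rf` would need to be checked against the assembler source.

```cpp
#include <cstdint>

// Hypothetical stand-in for f(val, msb, lsb): place val in bits msb..lsb.
constexpr uint32_t field(uint32_t val, int msb, int lsb) {
    return (val & ((1u << (msb - lsb + 1)) - 1u)) << lsb;
}

// Hypothetical stand-in for rf(reg, lsb), ASSUMING a 5-bit register number.
constexpr uint32_t reg_field(uint32_t reg, int lsb) {
    return field(reg, lsb + 4, lsb);
}

// Extract a single bit from an encoded instruction word.
constexpr uint32_t bit(uint32_t insn, int n) {
    return (insn >> n) & 1u;
}

// Under the 5-bit assumption, registers V0..V15 leave bit 20 clear ...
static_assert(bit(reg_field(15, 16), 20) == 0, "V15 does not reach bit 20");
// ... while V16..V31 set it, so a separate write of bit 20 would overlap.
static_assert(bit(reg_field(16, 16), 20) == 1, "V16 occupies bit 20");
// A 4-bit register field plus an explicit bit 20 yields the same encoding.
static_assert((field(1, 20, 20) | field(15, 19, 16)) == reg_field(31, 16),
              "split and combined encodings agree");
```

If that assumption holds, the open point is whether the H case is meant to keep bit 20 at zero or to carry the low bit of the element index there.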

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/30941#discussion_r3275515091
