Re: RFR: 8355216: Accelerate P-256 arithmetic on aarch64 [v5]

Ferenc Rakoczi Fri, 22 May 2026 06:15:34 -0700

On Tue, 19 May 2026 08:27:53 GMT, Andrew Haley <[email protected]> wrote:


>> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7758:
>> 
>>> 7756:       __ lsr(tmp, lo, montMulP256Shift2);
>>> 7757:       __ orr(hi, hi, tmp);
>>> 7758:       __ andr(lo, lo, mask);
>> 
>> Suggestion:
>> 
>>       // compute 104-bit (40 + 64) full product
>>       __ umulh(hi, a, b);
>>       __ mul(lo, a, b);
>>       // combine 40 + 12 bits into hi result
>>       __ lsl(hi, hi, montMulP256Shift1);
>>       __ lsr(tmp, lo, montMulP256Shift2);
>>       __ orr(hi, hi, tmp);
>>       // mask off 52 bits of lo result
>>       __ andr(lo, lo, mask);
>
> It might be better and clearer to use `bfm` rather than shifting, masking, 
> and ORing.

Added the comments. As for `bfm`: it would save one instruction, but to me it 
is less intuitive than the shift-and-OR sequence.
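
For readers following the thread, here is a rough C++ model of what the commented sequence computes. It is only a sketch: the constant values (12 and 52) are inferred from the "40 + 12" and "52 bits" comments, not taken from the patch, and the names `kShift1`, `kShift2`, `kMask`, `mul_split` and `mul_split_bfi` are made up for illustration. The second function models one way a bitfield insert (`bfi`, an alias of `bfm`) could replace the shift-left and OR; the thread does not spell out the exact instruction sequence intended. `unsigned __int128` is a GCC/Clang extension.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values, inferred from the comments in the quoted suggestion;
// the real montMulP256Shift1/montMulP256Shift2 are defined in the patch.
constexpr unsigned kLimbBits = 52;
constexpr unsigned kShift1 = 64 - kLimbBits;                 // 12
constexpr unsigned kShift2 = kLimbBits;                      // 52
constexpr uint64_t kMask = (uint64_t{1} << kLimbBits) - 1;   // low 52 bits

struct Limbs {
  uint64_t hi;  // bits 52..103 of the product
  uint64_t lo;  // bits 0..51 of the product
};

// Shift/OR/mask version. a and b are assumed to be 52-bit limbs,
// so the full product fits in 104 bits.
inline Limbs mul_split(uint64_t a, uint64_t b) {
  unsigned __int128 p = static_cast<unsigned __int128>(a) * b;
  uint64_t hi = static_cast<uint64_t>(p >> 64);  // umulh: top 40 bits
  uint64_t lo = static_cast<uint64_t>(p);        // mul: low 64 bits
  hi <<= kShift1;                                // lsl
  uint64_t tmp = lo >> kShift2;                  // lsr: top 12 bits of lo
  hi |= tmp;                                     // orr
  lo &= kMask;                                   // andr
  return {hi, lo};
}

// Hypothetical bitfield-insert variant: the 40-bit high word is inserted
// into tmp at bit 12, leaving tmp's low 12 bits unchanged.
inline Limbs mul_split_bfi(uint64_t a, uint64_t b) {
  unsigned __int128 p = static_cast<unsigned __int128>(a) * b;
  uint64_t hi = static_cast<uint64_t>(p >> 64);
  uint64_t lo = static_cast<uint64_t>(p);
  uint64_t tmp = lo >> kShift2;                           // lsr
  constexpr uint64_t kField = (uint64_t{1} << 40) - 1;    // 40-bit field
  tmp = (tmp & ~(kField << kShift1)) | ((hi & kField) << kShift1);  // bfi
  lo &= kMask;                                            // andr
  return {tmp, lo};
}
```

In this model both functions return the same pair for any 52-bit inputs, with `hi * 2^52 + lo` equal to the full product.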

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/30941#discussion_r3288539803
