On Wed, 13 May 2026 16:11:52 GMT, Ferenc Rakoczi <[email protected]> wrote:
>> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7838:
>>
>>> 7836: __ lsr(tmp, low, shift2);
>>> 7837: __ orr(high, high, tmp);
>>> 7838: __ andr(low, low, limb_mask);
>>
>> This is a recurring pattern:
>>
>> __ umulh(hi, a, b);
>> __ mul(lo, a, b);
>> __ lsl(hi, hi, SHIFT1);
>> __ lsr(tmp, lo, SHIFT2);
>> __ orr(hi, hi, tmp);
>> __ andr(lo, lo, mask);
>>
>> which could be abstracted as a macro method (picking an arbitrary name that
>> you probably want to improve on):
>>
>> p256_partial_mul(Register a, Register b, Register hi, Register lo, Register
>> tmp, Register mask)
>>
>> You can then simplify the code that processes this limb (and likewise
>> each subsequent limb) to make it clearer what is being done to combine the
>> results of these macro computations:
>>
>> __ ldr(a_i, __ post(a, 8));
>>
>> p256_partial_mul(a_i, b_0, high, low, tmp, limb_mask);
>>
>> __ andr(n, low, limb_mask);
>>
>> neon_partial_mult_64(B, b_highs, a_vals, 0);
>>
>> p256_partial_mul(n, mod_0, mod_high, mod_low, tmp, limb_mask);
>>
>> __ add(low, low, mod_low);
>> __ add(high, high, mod_high);
>> __ lsr(c_i, low, shift2);
>> __ add(c_i, c_i, high);
>>
>> Also, note that the function consumes SHIFT1 and SHIFT2, which should be
>> defined as const int constants, preferably at file scope rather than
>> declared and initialized as local variables.
>
> Very good idea! Thanks a lot!
Done.
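
For readers following along, here is a scalar C++ sketch of what the six-instruction pattern quoted above computes. It is a model, not the stub code: the name p256_partial_mul_model, the PartialProduct struct and the values SHIFT2 = 52 and SHIFT1 = 12 are assumptions made for illustration (the review only says that SHIFT1 and SHIFT2 exist), and unsigned __int128 is a GCC/Clang extension.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values, not taken from the patch: 52-bit limbs, SHIFT1 + SHIFT2 == 64.
constexpr int SHIFT2 = 52;
constexpr int SHIFT1 = 64 - SHIFT2;
constexpr uint64_t LIMB_MASK = (uint64_t{1} << SHIFT2) - 1;

// Hypothetical result type: low limb (SHIFT2 bits) and everything above it.
struct PartialProduct {
  uint64_t hi;
  uint64_t lo;
};

// Scalar model of the umulh/mul/lsl/lsr/orr/andr sequence.
inline PartialProduct p256_partial_mul_model(uint64_t a, uint64_t b) {
  unsigned __int128 p = static_cast<unsigned __int128>(a) * b;
  uint64_t hi = static_cast<uint64_t>(p >> 64);  // umulh(hi, a, b)
  uint64_t lo = static_cast<uint64_t>(p);        // mul(lo, a, b)
  // The left shift below loses bits unless the product fits in 64 + SHIFT2 bits.
  assert((hi >> SHIFT2) == 0);
  hi <<= SHIFT1;                                 // lsl(hi, hi, SHIFT1)
  uint64_t tmp = lo >> SHIFT2;                   // lsr(tmp, lo, SHIFT2)
  hi |= tmp;                                     // orr(hi, hi, tmp)
  lo &= LIMB_MASK;                               // andr(lo, lo, mask)
  return {hi, lo};
}
```

Under those assumptions the result satisfies a * b == hi * 2^SHIFT2 + lo, with lo confined to SHIFT2 bits, which is the split that the per-limb code then combines.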
>> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7958:
>>
>>> 7956: __ st1(D[2], __ T2D, __ post(mul_ptr, 16));
>>> 7957: __ st1(A[3], __ T2D, __ post(mul_ptr, 16));
>>> 7958: __ st1(D[3], __ T2D, mul_ptr);
>>
>> You could usefully abstract this as a VSeq template function
>>
>> template<int N>
>> void vs_st1_interleaved(const VSeq<N>& A, const VSeq<N>& B, Register dest) {
>>   for (int i = 0; i < N; i++) {
>>     __ st1(A[i], __ T2D, __ post(dest, 16));
>>     __ st1(B[i], __ T2D, __ post(dest, 16));
>>   }
>> }
>
> Good idea. Thanks!
Done.
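
As a rough illustration of the memory layout the suggested helper produces, here is a plain C++ model that uses arrays in place of vector registers. Vec2D and store_interleaved_model are hypothetical names for this sketch only; they are not HotSpot APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for one 128-bit vector register viewed as two 64-bit lanes (T2D).
using Vec2D = std::array<uint64_t, 2>;

// Models N pairs of post-incrementing 16-byte stores: A[0], B[0], A[1], B[1], ...
// Returns the advanced destination pointer, as post-indexed addressing would leave it.
template <std::size_t N>
uint64_t* store_interleaved_model(const std::array<Vec2D, N>& a,
                                  const std::array<Vec2D, N>& b,
                                  uint64_t* dest) {
  assert(dest != nullptr);
  for (std::size_t i = 0; i < N; i++) {
    dest[0] = a[i][0];  // st1(A[i], T2D, post(dest, 16))
    dest[1] = a[i][1];
    dest += 2;
    dest[0] = b[i][0];  // st1(B[i], T2D, post(dest, 16))
    dest[1] = b[i][1];
    dest += 2;
  }
  return dest;
}
```

One difference from the original sequence: the final st1 at line 7958 does not post-increment mul_ptr, whereas the helper as suggested always does, so the two are interchangeable only if mul_ptr is not read again afterwards.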
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/30941#discussion_r3247453451
PR Review Comment: https://git.openjdk.org/jdk/pull/30941#discussion_r3247452141