On Fri, 22 May 2026 13:04:06 GMT, Ferenc Rakoczi wrote:
>> It might be better and clearer to use `bfm` rather than shifting, masking,
>> and ORing.
>
> Added the comments. As for the clarity of `bfm`: it is one instruction fewer,
> but to me it is not as intuitive as the shift and OR.

@ferakocz you could use the `extr` instruction to do what you want here, i.e.

```cpp
// combine 40 + 12 bits into hi result
__ lsl(hi, hi, montMulP256Shift1);
__ lsr(tmp, lo, montMulP256Shift2);
__ orr(hi, hi, tmp);
// mask off 52 bits of lo result
__ andr(lo, lo, mask);
```
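can be replaced with:

```cpp
// combine 40 + 12 bits into hi result
__ extr(hi, hi, lo, montMulP256Shift2);
// mask off 52 bits of lo result
__ andr(lo, lo, mask);
```

That has the advantage of not requiring you to use `tmp`, and it turns three instructions into one. Note that it relies on `montMulP256Shift1 + montMulP256Shift2 == 64`, because `extr` computes `(hi << (64 - montMulP256Shift2)) | (lo >> montMulP256Shift2)`. The `extr` must also stay ahead of the `andr`, since it needs the unmasked top bits of `lo`.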
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/30941#discussion_r3414104096