On Fri, 15 May 2026 09:52:20 GMT, Ferenc Rakoczi <[email protected]> wrote:
>> An aarch64 implementation of the MontgomeryIntegerPolynomial256.mult() method and IntegerPolynomial.conditionalAssign(). Since 64-bit multiplication is not supported on Neon, and manually performing this operation with 32-bit limbs is slower than with GPRs, a hybrid Neon/GPR approach is used. Neon instructions compute the intermediate values used in the last two iterations of the main "loop", while the GPRs compute the first few iterations. At the method level this improves performance by ~9%, and at the API level by roughly 5%.
>> 
>> ---------
>> - [x] I confirm that I make this contribution in accordance with the [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai).
> 
> Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Accepting more suggestions from Andrew Dinn.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8077:

> 8075: +                       b_0 + b_1 + b_2;
> 8076:         b_0 = b_1 = b_2 = noreg;
> 8077: 

This freeing of registers would be better done at line 8002 (i.e. before processing a3), with register b_3 also freed. That makes it clear to the maintainer that b_0 to b_3 are no longer needed from that point on.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8136:

> 8134:     __ add(high, high, mod_high);
> 8135: 
> 8136:     // Reallocate regs b_3, b_4

Likewise, at this point we should be freeing `b_4`, having already freed `b_3`.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8238:

> 8236: 
> 8237:     // End intrinsic call
> 8238:     __ add(sp, sp, 176);

This constant could be replaced with `C_DATA_SIZE + MUL_DATA_SIZE`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/30941#discussion_r3267784068
PR Review Comment: https://git.openjdk.org/jdk/pull/30941#discussion_r3267795849
PR Review Comment: https://git.openjdk.org/jdk/pull/30941#discussion_r3267814052
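For context on the description's point that 64-bit multiplication with 32-bit limbs is slower than with GPRs: Advanced SIMD has no 64-bit-lane integer multiply, and its widest integer multiply is the widening 32x32->64 UMULL. A full 64x64->128 limb product therefore needs four partial products plus carry handling, whereas a GPR gets the same result from a MUL/UMULH pair. The sketch below is an illustrative C++ model of that cost, not code from the PR. `mul64x64_via_32`, `mul64x64_gpr` and `same_product` are hypothetical names, and `unsigned __int128` (a GCC/Clang extension) stands in for MUL/UMULH.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the PR): full 128-bit product of two 64-bit
// limbs built only from 32x32->64 multiplies, as a Neon UMULL-based path
// would have to do. Returns the low 64 bits and stores the high 64 bits.
uint64_t mul64x64_via_32(uint64_t a, uint64_t b, uint64_t* hi) {
    const uint64_t mask = 0xffffffffULL;
    uint64_t a_lo = a & mask, a_hi = a >> 32;
    uint64_t b_lo = b & mask, b_hi = b >> 32;

    // Four partial products; each fits in 64 bits.
    uint64_t ll = a_lo * b_lo;
    uint64_t lh = a_lo * b_hi;
    uint64_t hl = a_hi * b_lo;
    uint64_t hh = a_hi * b_hi;

    // Sum the middle 32-bit column; its overflow carries into the high word.
    uint64_t mid = (ll >> 32) + (lh & mask) + (hl & mask);
    *hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
    return (mid << 32) | (ll & mask);
}

// Hypothetical GPR path: on aarch64 this typically compiles to MUL + UMULH.
uint64_t mul64x64_gpr(uint64_t a, uint64_t b, uint64_t* hi) {
    unsigned __int128 p = (unsigned __int128)a * b;
    *hi = (uint64_t)(p >> 64);
    return (uint64_t)p;
}

// Cross-check that both paths agree for one pair of inputs.
bool same_product(uint64_t a, uint64_t b) {
    uint64_t hi32 = 0, hi64 = 0;
    uint64_t lo32 = mul64x64_via_32(a, b, &hi32);
    uint64_t lo64 = mul64x64_gpr(a, b, &hi64);
    return lo32 == lo64 && hi32 == hi64;
}
```

The extra multiplies, shifts and adds in the 32-bit path are the overhead that motivates keeping the 64-bit limb products on the GPRs. Neon is then used only for work it can do cheaply alongside them.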
