On Fri, 15 May 2026 09:52:20 GMT, Ferenc Rakoczi <[email protected]> wrote:

>> An aarch64 implementation of the MontgomeryIntegerPolynomial256.mult()
>> method and IntegerPolynomial.conditionalAssign(). Since 64-bit
>> multiplication is not supported on Neon and manually performing this
>> operation with 32-bit limbs is slower than with GPRs, a hybrid neon/gpr
>> approach is used. Neon instructions are used to compute intermediate values
>> used in the last two iterations of the main "loop", while the GPRs compute
>> the first few iterations. At the method level this improves performance by
>> ~9% and at the API level roughly 5%.
>>
>>
>>
>> ---------
>> - [x] I confirm that I make this contribution in accordance with the
>> [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai).
>
> Ferenc Rakoczi has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Accepting more suggestions from Andrew Dinn.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8267:

> 8265: // P521OrderField: 19 = 8 + 8 + 2 + 1
> 8266: // Special Cases 5, 10, 14, 16, 19
> 8267:

You need to insert the standard call `load_archive_data` here and return any stub that is found.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8568:

> 8566: __ mov(r0, zr); // return 0
> 8567: __ ret(lr);
> 8568: return start;

You need to call `store_archive_data` here.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/30941#discussion_r3275486612
PR Review Comment: https://git.openjdk.org/jdk/pull/30941#discussion_r3275489536
