> An aarch64 implementation of the MontgomeryIntegerPolynomial256.mult() method 
> and IntegerPolynomial.conditionalAssign(). Since 64-bit multiplication is not 
> supported on Neon, and manually performing this operation with 32-bit limbs is 
> slower than with GPRs, a hybrid Neon/GPR approach is used. Neon instructions 
> compute intermediate values used in the last two iterations of the main 
> "loop", while the GPRs compute the first few iterations. At the method level 
> this improves performance by ~9%, and at the API level by roughly 5%.
> 
> 
> 
> ---------
> - [x] I confirm that I make this contribution in accordance with the [OpenJDK 
> Interim AI Policy](https://openjdk.org/legal/ai).
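
For readers unfamiliar with the second method named in the description, here is a minimal Java sketch of a branch-free conditional assignment over limb arrays. It only illustrates the masking idea, assuming `set` is 0 or 1 and that both arrays have the same length. The class name `ConditionalAssignSketch` is hypothetical, and this is not the intrinsic code from the pull request.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of the JDK or of this pull request.
public final class ConditionalAssignSketch {

    // Copies b into a when set == 1 and leaves a unchanged when set == 0,
    // without branching on set. Assumes set is 0 or 1 and a.length == b.length.
    public static void conditionalAssign(int set, long[] a, long[] b) {
        long mask = -(long) set;              // 0 -> 0x0, 1 -> all ones
        for (int i = 0; i < a.length; i++) {
            long diff = mask & (a[i] ^ b[i]); // 0 when set == 0, else a[i] ^ b[i]
            a[i] ^= diff;                     // a[i] stays, or becomes b[i]
        }
    }

    public static void main(String[] args) {
        long[] a = {1, 2, 3, 4, 5};
        long[] b = {10, 20, 30, 40, 50};

        conditionalAssign(0, a, b);
        System.out.println(Arrays.toString(a)); // a is unchanged

        conditionalAssign(1, a, b);
        System.out.println(Arrays.toString(a)); // a now holds b's values
    }
}
```

The same work is done per limb whichever value `set` has, which is the reason this pattern is common in elliptic-curve code.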

Ferenc Rakoczi has updated the pull request with a new target base due to a 
merge or a rebase. The pull request now contains 11 commits:

 - Merge branch 'master' into p256-aarch64
 - Added UseIntPolyIntrinsics to aotCodeCache.hpp
 - Added some comments
 - Accepting more suggestions from Andrew Dinn.
 - Addressing more review comments.
 - Accepting more suggestions from Andrew Dinn.
 - Added AOT Code Cache related code + some cosmetic changes
 - Accepting very good advice from Andrew Dinn.
 - Merged master.
 - Removing a jar file.
 - ... and 1 more: https://git.openjdk.org/jdk/compare/39d2d165...2c244066

-------------

Changes: https://git.openjdk.org/jdk/pull/30941/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=30941&range=09
  Stats: 1032 lines in 6 files changed: 1030 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/30941.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/30941/head:pull/30941

PR: https://git.openjdk.org/jdk/pull/30941
