On Fri, 19 Jun 2026 17:48:50 GMT, Sergey Bylokhov <[email protected]> wrote:

>> Thanks @adinn and @theRealAph! Could one of you also sponsor it?
>
> Hi @ferakocz, did you have a chance to run
> test/micro/org/openjdk/bench/javax/crypto/full/PolynomialP256Bench.java on
> this patch?
>
>> .../build/patched/images/jdk/bin/java -jar .../build/patched/images/test/micro/benchmarks.jar org.openjdk.bench.javax.crypto.full.PolynomialP256Bench.benchAssign -p isMontBench=true
>
> I got these numbers on my local laptop macOS on m4:
>
> Patched:
>> PolynomialP256Bench.benchAssign true thrpt 8 10230.113 ± 146.263 ops/s
>
> Baseline:
>> PolynomialP256Bench.benchAssign true thrpt 8 23548.039 ± 1596.303 ops/s

@mrserb I am also seeing a slowdown for this specific micro-benchmark on a
Fedora M2 Mac:

Baseline:
> PolynomialP256Bench.benchAssign true thrpt 8 14774.689 ± 1764.136 ops/s

Patched:
> PolynomialP256Bench.benchAssign true thrpt 8 8171.365 ± 135.887 ops/s

The benchMultiply and benchSquare micro-benchmarks both show an improvement:

Baseline:
> PolynomialP256Bench.benchMultiply true thrpt 8 2624.022 ± 1.985 ops/s
> PolynomialP256Bench.benchSquare true thrpt 8 2629.698 ± 3.645 ops/s

Patched:
> PolynomialP256Bench.benchMultiply true thrpt 8 3200.923 ± 3.748 ops/s
> PolynomialP256Bench.benchSquare true thrpt 8 3203.488 ± 3.074 ops/s

@ferakocz I'm not sure we should automatically trust this benchmark run in
isolation -- it is most important to gauge what effect the use of the multiply
and assign intrinsics has when exercising the P256 API.

The micro-benchmark result does suggest that the intrinsification of
conditionalAssign may not always help on AArch64. However, it might still be of
benefit when employed in combination with the multiply intrinsic, possibly
depending on what hardware we are running on.

Your API/method level testing showed an improvement of 9% at the method level
and 5% at the API level. Have you also run these tests on your M1 machine with
the intrinsic for conditionalAssign omitted?
If so, what was the effect? If not, could you do so and let us know what
difference it makes?

If you provide details of the tests run and how to exercise them, I will
happily check what the effect is on my M2 box when I disable generation of the
conditionalAssign intrinsic. Perhaps @mrserb can do the same on his M4 Mac.
Depending on the outcome, we might also want to check this on other AArch64
CPUs.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/30941#issuecomment-4766910504
