On Wed, 19 Jan 2022 17:38:25 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:
>> Summary of changes:
>> - Intrinsify Math.round(float) and Math.round(double) APIs.
>> - Extend auto-vectorizer to infer vector operations on encountering scalar
>>   IR nodes for above intrinsics.
>> - Test creation using new IR testing framework.
>>
>> Following are the performance numbers of a JMH micro included with the patch.
>>
>> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server)
>>
>> | Benchmark | ARRAYLEN | Baseline AVX2 Score (ops/ms) | WithOpt AVX2 Score (ops/ms) | Gain AVX2 (opt/baseline) | Baseline AVX3 Score (ops/ms) | WithOpt AVX3 Score (ops/ms) | Gain AVX3 (opt/baseline) |
>> | -- | -- | -- | -- | -- | -- | -- | -- |
>> | FpRoundingBenchmark.test_round_double | 1024 | 518.532 | 1364.066 | 2.630630318 | 512.908 | 4292.11 | 8.368186887 |
>> | FpRoundingBenchmark.test_round_double | 2048 | 270.137 | 830.986 | 3.076165057 | 273.159 | 2459.116 | 9.002507697 |
>> | FpRoundingBenchmark.test_round_float | 1024 | 752.436 | 7780.905 | 10.34095259 | 752.49 | 9506.694 | 12.63364829 |
>> | FpRoundingBenchmark.test_round_float | 2048 | 389.499 | 4113.046 | 10.55983712 | 389.63 | 4863.673 | 12.48279907 |
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   8279508: Adding a test for scalar intrinsification.

There are already `RoundFloat`, `RoundDouble`, and `RoundDoubleMode` nodes defined. Though `RoundFloat` and `RoundDouble` are legacy nodes used only on x86-32, `RoundDoubleMode` supports multiple rounding modes and is amenable to auto-vectorization.

What do you think about the following alternative? Reuse `RoundDoubleMode` (with a new rounding mode) and introduce `RoundFloatMode`.

The special rounding rule (ties round toward positive infinity) is not the only peculiarity of `Math.round()`: it also converts the result to an integral type. It can be represented as `ConvF2I (RoundFloatMode f #rmode)` / `ConvD2L (RoundDoubleMode d #rmode)`.
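To make the proposed decomposition concrete, here is a minimal Java-level sketch. It assumes only the documented `Math.round()` contract (closest integer, ties toward positive infinity) and Java's saturating float-to-integer casts (NaN becomes 0, out-of-range values clamp to the type's bounds). The class `RoundDecomposition` and its methods `roundHalfUp`, `round`, and `roundAll` are hypothetical illustrations, not HotSpot code: `roundHalfUp` stands in for the proposed `RoundFloatMode`/`RoundDoubleMode` rounding step, and the cast stands in for `ConvF2I`/`ConvD2L`.

```java
// Hypothetical illustration, not HotSpot code: Math.round() expressed as a
// rounding step (ties toward positive infinity) followed by Java's saturating
// float/double-to-integer conversion, mirroring the proposed
// ConvF2I (RoundFloatMode f #rmode) / ConvD2L (RoundDoubleMode d #rmode) shapes.
public final class RoundDecomposition {

    // Rounding step for float; the result stays floating-point, like a
    // RoundFloatMode node would produce.
    public static float roundHalfUp(float f) {
        double floor = Math.floor(f);          // exact for every float
        // f - floor is the fractional part, computed exactly in double.
        // For NaN/Infinity it is NaN, the comparison is false, and the
        // input passes through unchanged.
        if (f - floor >= 0.5) {
            floor += 1.0;
        }
        return (float) floor;                  // integral values here fit in float
    }

    // Rounding step for double (the RoundDoubleMode counterpart).
    public static double roundHalfUp(double d) {
        double floor = Math.floor(d);
        if (d - floor >= 0.5) {
            floor += 1.0;
        }
        return floor;
    }

    // ConvF2I analogue: the (int) cast maps NaN to 0 and saturates.
    public static int round(float f) {
        return (int) roundHalfUp(f);
    }

    // ConvD2L analogue: the (long) cast maps NaN to 0 and saturates.
    public static long round(double d) {
        return (long) roundHalfUp(d);
    }

    // Per-element loop of the kind the auto-vectorizer would see; under the
    // proposal, the loop body would become VectorCastF2X (RoundFloatModeV ...).
    public static void roundAll(float[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Math.round(src[i]);
        }
    }

    // Checks the decomposition against Math.round on edge cases.
    public static void main(String[] args) {
        float[] fs = {0.5f, -0.5f, 2.5f, -2.5f, 0.49999997f, -0.0f, Float.NaN,
                      Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY, 1e20f, -1e20f};
        for (float f : fs) {
            if (round(f) != Math.round(f)) {
                throw new AssertionError("float mismatch for " + f);
            }
        }
        double[] ds = {0.5, -0.5, 2.5, -2.5, 0.49999999999999994, -0.0, Double.NaN,
                       Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, 1e300, -1e300};
        for (double d : ds) {
            if (round(d) != Math.round(d)) {
                throw new AssertionError("double mismatch for " + d);
            }
        }
        System.out.println("decomposition matches Math.round for all samples");
    }
}
```

Computing the fractional part against `Math.floor(x)` rather than evaluating `Math.floor(x + 0.5)` avoids the classic pitfall where `0.49999997f + 0.5f` rounds up to `1.0f` and produces a wrong result of 1.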
In the scalar case, this shape can be matched as a single AD instruction. The auto-vectorizer can then convert it to `VectorCastF2X (RoundFloatModeV vf #rmode)` / `VectorCastD2X (RoundDoubleModeV vd #rmode)` and match it in a similar manner.

test/hotspot/jtreg/compiler/c2/cr6340864/TestFloatVect.java line 33:

> 31:  * @run main/othervm -Xbatch -XX:CompileCommand=exclude,*::test() -Xmx128m -XX:MaxVectorSize=16 compiler.c2.cr6340864.TestFloatVect
> 32:  * @run main/othervm -Xbatch -XX:CompileCommand=exclude,*::test() -Xmx128m -XX:MaxVectorSize=32 compiler.c2.cr6340864.TestFloatVect
> 33:  * @run main/othervm -Xbatch -XX:CompileCommand=exclude,*::test() -XX:TieredStopAtLevel=2 -Xmx128m -XX:MaxVectorSize=32 compiler.c2.cr6340864.TestFloatVect

What's the purpose of `-XX:TieredStopAtLevel=2` from a testing perspective?

-------------

PR: https://git.openjdk.java.net/jdk/pull/7094