On Sat, 12 Mar 2022 19:58:37 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:
>> Summary of changes: >> - Intrinsify Math.round(float) and Math.round(double) APIs. >> - Extend auto-vectorizer to infer vector operations on encountering scalar >> IR nodes for above intrinsics. >> - Test creation using new IR testing framework. >> >> Following are the performance number of a JMH micro included with the patch >> >> Test System: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake Server) >> >> >> Benchmark | TESTSIZE | Baseline AVX3 (ops/ms) | Withopt AVX3 (ops/ms) | Gain >> ratio | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain ratio >> -- | -- | -- | -- | -- | -- | -- | -- >> FpRoundingBenchmark.test_round_double | 1024.00 | 504.15 | 2209.54 | 4.38 | >> 510.36 | 548.39 | 1.07 >> FpRoundingBenchmark.test_round_double | 2048.00 | 293.64 | 1271.98 | 4.33 | >> 293.48 | 274.01 | 0.93 >> FpRoundingBenchmark.test_round_float | 1024.00 | 825.99 | 4754.66 | 5.76 | >> 751.83 | 2274.13 | 3.02 >> FpRoundingBenchmark.test_round_float | 2048.00 | 412.22 | 2490.09 | 6.04 | >> 388.52 | 1334.18 | 3.43 >> >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional > commit since the last revision: > > 8279508: Creating separate test for round double under feature check. src/hotspot/cpu/x86/assembler_x86.hpp line 1159: > 1157: void cvttsd2siq(Register dst, Address src); > 1158: void cvttsd2siq(Register dst, XMMRegister src); > 1159: void cvtsd2siq(Register dst, XMMRegister src); Hi, some small suggestions only, the instructions are sorted alphabetically so the `cvtsd2si` should come before `scttsd2si`, the same for the below instructions. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4024: > 4022: * the result is equal to the value of Integer.MAX_VALUE. > 4023: */ > 4024: void C2_MacroAssembler::vector_cast_float_special_cases_avx(XMMRegister > dst, XMMRegister src, XMMRegister xtmp1, This special handling is really large, could we use a stub routine for it? src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4161: > 4159: movl(scratch, 1056964608); > 4160: movq(xtmp1, scratch); > 4161: vbroadcastss(xtmp1, xtmp1, vec_enc); An `evpbroadcastd` would reduce this by one instruction I guess? src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4178: > 4176: movl(scratch, 1056964608); > 4177: movq(xtmp1, scratch); > 4178: vbroadcastss(xtmp1, xtmp1, vec_enc); You could put the constant in the constant table and use `vbroadcastss` here also. Thank you very much. src/hotspot/cpu/x86/x86.ad line 7297: > 7295: ins_encode %{ > 7296: int vlen_enc = vector_length_encoding(this); > 7297: InternalAddress new_mxcsr = $constantaddress(0x3F80L); `ldmxcsr` takes a `m32` argument so this constant can be an `int` instead. Also, I would suggest putting the `mxcst_std` in the constant table also. src/hotspot/cpu/x86/x86_64.ad line 10699: > 10697: match(Set dst (ConvD2L src)); > 10698: effect(KILL cr); > 10699: format %{ "round_or_convert_d2l $dst,$src"%} You could revert the changes for `ConvD2L` and `ConvF2I` here ------------- PR: https://git.openjdk.java.net/jdk/pull/7094