On Wed, 11 May 2022 19:45:55 GMT, Paul Sandoz <psan...@openjdk.org> wrote:

> I tried your test code with the patch and logged compilation
> (`-XX:-TieredCompilation -XX:+PrintCompilation -XX:+PrintInlining
> -XX:+PrintIntrinsics -Xbatch`)
>
> For `func` the first call to `VectorSupport::loadMasked` is intrinsic and
> inlined:
>
> ```
> @ 45 jdk.internal.vm.vector.VectorSupport::loadMasked (40 bytes) (intrinsic)
> ```
>
> But the second call (for the last loop iteration) fails to inline:
>
> ```
> @ 45 jdk.internal.vm.vector.VectorSupport::loadMasked (40 bytes) failed to inline (intrinsic)
> ```
>
> Since I am running on a MacBook this looks right and aligns with the
> `-XX:+PrintIntrinsics` output:
>
> ```
> ** Rejected vector op (LoadVectorMasked,int,8) because architecture does not support it
> ** Rejected vector op (LoadVectorMasked,int,8) because architecture does not support it
> ** not supported: op=loadMasked vlen=8 etype=int using_byte_array=0
> ```
>
> ?
>
> I have not looked at the code gen nor measured performance comparing the case
> when never out of bounds and only out of bounds for the last loop iteration.

Yeah, it looks right from the log. Did you check whether the log contains `** missing constant: offsetInRange=Parm` when running with `-XX:+PrintIntrinsics`? Alternatively, you could insert an assertion in `vectorIntrinsics.cpp` like:

--- a/src/hotspot/share/opto/vectorIntrinsics.cpp
+++ b/src/hotspot/share/opto/vectorIntrinsics.cpp
@@ -1236,6 +1236,7 @@ bool LibraryCallKit::inline_vector_mem_masked_operation(bool is_store) {
     } else {
       // Masked vector load with IOOBE always uses the predicated load.
       const TypeInt* offset_in_range = gvn().type(argument(8))->isa_int();
+      assert(offset_in_range->is_con(), "must be a constant");
       if (!offset_in_range->is_con()) {
         if (C->print_intrinsics()) {
           tty->print_cr(" ** missing constant: offsetInRange=%s",

Then run the tests with a debug build, so that the assertion is active. Thanks!

-------------

PR: https://git.openjdk.java.net/jdk/pull/8035