On Mon, 11 Apr 2022 09:04:36 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:
>> The optimization for masked store is recorded to:
>> https://bugs.openjdk.java.net/browse/JDK-8284050
>
>> The blend should be with the intended-to-store vector, so that masked lanes
>> contain the need-to-store elements and unmasked lanes contain the loaded
>> elements, which would be stored back, which results in unchanged values.
>
> It may not work if memory is beyond legal accessible address space of the
> process, a corner case could be a page boundary. Thus re-composing the
> intermediated vector which partially contains actual updates but effectively
> perform full vector write to destination address may not work in all
> scenarios.

Thanks for the comment! So how about adding a check for the valid array
range, as is already done for the masked vector load? The code would look
like:

    public final void intoArray(byte[] a, int offset, VectorMask<Byte> m) {
        if (m.allTrue()) {
            intoArray(a, offset);
        } else {
            ByteSpecies vsp = vspecies();
            if (offset >= 0 && offset <= (a.length - vsp.length())) {
                // The whole vector lies within the array bounds, so a
                // full-width write cannot touch memory outside the array.
                intoArray0(a, offset, m, /* usePred */ false); // can be vectorized with load + blend + store
            } else {
                // Part of the vector is out of range: check that every set
                // lane is in bounds before storing.
                checkMaskFromIndexSize(offset, vsp, m, 1, a.length);
                intoArray0(a, offset, m, /* usePred */ true); // can only be vectorized with a predicated store
            }
        }
    }

-------------

PR: https://git.openjdk.java.net/jdk/pull/8035
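To make the two paths concrete, here is a minimal scalar sketch in plain Java (no Vector API). It assumes a byte[] stands in for the vector and a boolean[] for the mask; the class and method names (MaskedStoreSketch, maskedStore) are hypothetical, and it only models the store semantics, not the code C2 would generate.

```java
import java.util.Arrays;
import java.util.Objects;

/**
 * Hypothetical scalar model of a masked byte-vector store, showing the
 * in-range (load + blend + store) path and the out-of-range (predicated) path.
 * This is not the JDK implementation.
 */
public class MaskedStoreSketch {

    /**
     * Writes v[i] to a[offset + i] for every lane i where m[i] is true.
     * Returns true if the load + blend + store path was taken.
     */
    public static boolean maskedStore(byte[] a, int offset, byte[] v, boolean[] m) {
        int vlen = v.length;
        if (offset >= 0 && offset <= a.length - vlen) {
            // The whole vector lies inside 'a', so a full-width write cannot
            // reach memory outside the array (no page-boundary problem).
            byte[] loaded = Arrays.copyOfRange(a, offset, offset + vlen);
            for (int i = 0; i < vlen; i++) {
                // Set lanes take the new value; unset lanes keep the value just
                // loaded, so writing them back leaves them unchanged.
                loaded[i] = m[i] ? v[i] : loaded[i];
            }
            System.arraycopy(loaded, 0, a, offset, vlen);
            return true;
        }
        // Partially out of range: first check that every set lane is in
        // bounds (throws IndexOutOfBoundsException otherwise), so nothing is
        // written if any set lane is illegal.
        for (int i = 0; i < vlen; i++) {
            if (m[i]) {
                Objects.checkIndex(offset + i, a.length);
            }
        }
        // Then write only the set lanes, which is what a predicated store does.
        for (int i = 0; i < vlen; i++) {
            if (m[i]) {
                a[offset + i] = v[i];
            }
        }
        return false;
    }
}
```

The sketch is single-threaded: it does not model other threads writing to the unset lanes between the load and the full-width store.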