On Mon, 11 Apr 2022 09:04:36 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:

>> The optimization for masked store is recorded to: 
>> https://bugs.openjdk.java.net/browse/JDK-8284050
>
>> The blend should be with the intended-to-store vector, so that masked lanes 
>> contain the need-to-store elements and unmasked lanes contain the loaded 
>> elements, which would be stored back, which results in unchanged values.
> 
> It may not work if the memory is beyond the legally accessible address space 
> of the process; a corner case could be a page boundary.  Thus re-composing 
> the intermediate vector, which only partially contains actual updates but 
> effectively performs a full vector write to the destination address, may not 
> work in all scenarios.

Thanks for the comment! How about adding a check for the valid array range, 
as is already done for the masked vector load?
Something like:

    public final
    void intoArray(byte[] a, int offset,
                   VectorMask<Byte> m) {
        if (m.allTrue()) {
            intoArray(a, offset);
        } else {
            ByteSpecies vsp = vspecies();
            // a full range check
            if (offset >= 0 && offset <= (a.length - vsp.length())) {
                // can be vectorized by load+blend_store
                intoArray0(a, offset, m, /* usePred */ false);
            } else {
                checkMaskFromIndexSize(offset, vsp, m, 1, a.length);
                // can only be vectorized by the predicated store
                intoArray0(a, offset, m, /* usePred */ true);
            }
        }
    }
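
To make the intended behavior concrete, here is a minimal scalar sketch of the 
two paths. It is only a model, not the Vector API implementation: the class 
MaskedStoreSketch, the fixed LANES value, and all helper names are 
hypothetical, and a boolean[] stands in for VectorMask<Byte>. The point is 
that the load+blend+store path touches every lane and is therefore only taken 
when the whole vector fits in the array, while the predicated path touches 
only the set lanes after checking them.

```java
import java.util.Arrays;
import java.util.Objects;

// Scalar model of the two store strategies; all names here are hypothetical.
final class MaskedStoreSketch {
    static final int LANES = 8; // assumed vector length in bytes

    // True when the whole vector [offset, offset + LANES) lies inside the array.
    static boolean fullRangeOk(int length, int offset) {
        return offset >= 0 && offset <= length - LANES;
    }

    // Throws IndexOutOfBoundsException if any set lane falls outside the array.
    static void checkSetLanes(int length, int offset, boolean[] m) {
        for (int i = 0; i < LANES; i++) {
            if (m[i]) {
                Objects.checkIndex(offset + i, length);
            }
        }
    }

    // load + blend + store: reads and writes all LANES elements.
    static void blendStore(byte[] a, int offset, byte[] v, boolean[] m) {
        byte[] tmp = Arrays.copyOfRange(a, offset, offset + LANES); // full load
        for (int i = 0; i < LANES; i++) {
            if (m[i]) {
                tmp[i] = v[i]; // set lanes take the new value
            }
        }
        System.arraycopy(tmp, 0, a, offset, LANES); // full store
    }

    // Predicated store: writes only the set lanes.
    static void predicatedStore(byte[] a, int offset, byte[] v, boolean[] m) {
        for (int i = 0; i < LANES; i++) {
            if (m[i]) {
                a[offset + i] = v[i];
            }
        }
    }

    static void intoArray(byte[] a, int offset, byte[] v, boolean[] m) {
        if (fullRangeOk(a.length, offset)) {
            blendStore(a, offset, v, m);
        } else {
            checkSetLanes(a.length, offset, m);
            predicatedStore(a, offset, v, m);
        }
    }

    public static void main(String[] args) {
        byte[] a = new byte[10];
        byte[] v = {1, 2, 3, 4, 5, 6, 7, 8};
        boolean[] m = new boolean[LANES];
        m[0] = true;
        m[1] = true;
        intoArray(a, 6, v, m); // tail of the array: takes the predicated path
        System.out.println(Arrays.toString(a));
    }
}
```

With offset 6 in a 10-element array the full range check fails, so only the 
two set lanes are written and nothing past the end of the array is accessed.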

-------------

PR: https://git.openjdk.java.net/jdk/pull/8035
