On Fri, 22 Apr 2022 07:08:24 GMT, Xiaohong Gong <[email protected]> wrote:
>> Currently, a masked vector load whose index range falls partly outside the
>> array bounds is implemented in pure Java scalar code to avoid an IOOBE
>> (IndexOutOfBoundsException). This is necessary for architectures that do
>> not support the predicate feature, because there the masked load is
>> implemented as a full vector load followed by a vector blend, and the full
>> vector load would itself cause the IOOBE, which is not valid. However, on
>> architectures that support the predicate feature, such as SVE/AVX-512/RVV,
>> the load can be vectorized with the predicated load instruction as long as
>> the indexes of the set mask lanes are within the bounds of the array. On
>> these architectures, the unset lanes are not loaded and so do not raise an
>> exception.
>>
>> This patch adds vectorization support for the out-of-range (IOOBE) part of
>> the masked load. Please see the original Java implementation (note the
>> "FIXME: optimize" comment):
>>
>>
>>     @ForceInline
>>     public static
>>     ByteVector fromArray(VectorSpecies<Byte> species,
>>                          byte[] a, int offset,
>>                          VectorMask<Byte> m) {
>>         ByteSpecies vsp = (ByteSpecies) species;
>>         if (offset >= 0 && offset <= (a.length - species.length())) {
>>             return vsp.dummyVector().fromArray0(a, offset, m);
>>         }
>>
>>         // FIXME: optimize
>>         checkMaskFromIndexSize(offset, vsp, m, 1, a.length);
>>         return vsp.vOp(m, i -> a[offset + i]);
>>     }
>>
>> Since this path can only be vectorized with the predicated load, HotSpot
>> must check whether the current backend supports it and fall back to the
>> Java scalar version if not. This differs from the normal masked vector
>> load, for which the compiler generates a full vector load and a vector
>> blend if the predicated load is not supported. To let the compiler take the
>> expected action, an additional flag (i.e. `usePred`) is added to the
>> existing "loadMasked" intrinsic, with the value "true" for the IOOBE part
>> and "false" for the normal load. The compiler fails to intrinsify if the
>> flag is "true" and the predicated load is not supported by the backend, in
>> which case the normal Java path is executed.
>>
>> The patch also adds the same vectorization support for the masked versions of:
>> - fromByteArray/fromByteBuffer
>> - fromBooleanArray
>> - fromCharArray
>>
>> The performance of the newly added benchmarks improves by about
>> `1.88x ~ 30.26x` on an x86 AVX-512 system:
>>
>> Benchmark                                            Before     After  Units
>> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE     737.542  1387.069  ops/ms
>> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE   118.366   330.776  ops/ms
>> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE    233.832  6125.026  ops/ms
>> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE      233.816  7075.923  ops/ms
>> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE     119.771   330.587  ops/ms
>> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE    431.961   939.301  ops/ms
>>
>> A similar performance gain can also be observed on a 512-bit SVE system.
>
> Xiaohong Gong has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Rename the "usePred" to "offsetInRange"

Rest of the patch looks good to me.

src/hotspot/share/opto/vectorIntrinsics.cpp line 1232:

> 1230: // out when current case uses the predicate feature.
> 1231: if (!supports_predicate) {
> 1232: bool use_predicate = false;

If we rename this to `needs_predicate` it will be easier to understand.

-------------
PR: https://git.openjdk.java.net/jdk/pull/8035
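
For readers who want the semantics discussed in the quoted description in runnable form, here is a minimal scalar sketch. It is not the JDK implementation and does not use the Vector API: the class `MaskedLoadSketch`, the method `maskedLoad`, and the `boolean[]` mask are hypothetical stand-ins for `ByteVector.fromArray` and `VectorMask<Byte>`. It only models the rule stated above, namely that a masked load is valid as long as every set lane is in bounds, and that unset lanes never touch the array (they are filled with zero here).

```java
import java.util.Arrays;

// Hypothetical scalar model of a masked load; names are illustrative only.
final class MaskedLoadSketch {

    // Loads mask.length "lanes" from a, starting at offset.
    // Only lanes whose mask bit is set read the array; unset lanes become 0.
    static byte[] maskedLoad(byte[] a, int offset, boolean[] mask) {
        int lanes = mask.length;

        // Range check restricted to the set lanes: an unset lane may point
        // outside the array without causing an exception.
        for (int i = 0; i < lanes; i++) {
            int idx = offset + i;
            if (mask[i] && (idx < 0 || idx >= a.length)) {
                throw new IndexOutOfBoundsException(
                        "set lane " + i + " maps to index " + idx
                        + " for array length " + a.length);
            }
        }

        // Lane-by-lane load, the scalar analogue of a predicated load.
        byte[] result = new byte[lanes];
        for (int i = 0; i < lanes; i++) {
            result[i] = mask[i] ? a[offset + i] : 0;
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3, 4, 5, 6};

        // offset 4 with 4 lanes overruns the array, but only lanes 0 and 1 are set.
        boolean[] tailMask = {true, true, false, false};
        System.out.println(Arrays.toString(maskedLoad(a, 4, tailMask))); // prints [5, 6, 0, 0]

        // Setting lane 2 as well makes index 6 active, which is out of bounds.
        boolean[] badMask = {true, true, true, false};
        try {
            maskedLoad(a, 4, badMask);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("IOOBE: " + e.getMessage());
        }
    }
}
```

A backend without predicated loads cannot run the second loop as a single full-width load, since that load would touch indexes 6 and 7 in the first call; that is the case in which the description says intrinsification fails and the Java path runs instead.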