On Tue, 10 May 2022 01:23:55 GMT, Xiaohong Gong <[email protected]> wrote:
> Masked vector memory accesses must check that the indexes of the masked
> lanes fall inside the valid memory boundary. However, this check can be
> skipped when the given offset is inside the vector range, which guarantees
> that no IOOBE (IndexOutOfBoundsException) can happen. The masked load APIs
> already skip the check for such common cases, and this patch applies a
> similar optimization to the masked vector store.
>
> The newly added masked store benchmarks improve by about
> `1.83x ~ 2.62x` on an x86 system:
>
> Benchmark                                     Before      After   Gain  Units
> StoreMaskedBenchmark.byteStoreArrayMask    12757.936  23291.118  1.826  ops/ms
> StoreMaskedBenchmark.doubleStoreArrayMask   1520.932   3921.616  2.578  ops/ms
> StoreMaskedBenchmark.floatStoreArrayMask    2713.031   7122.535  2.625  ops/ms
> StoreMaskedBenchmark.intStoreArrayMask      4113.772   8220.206  1.998  ops/ms
> StoreMaskedBenchmark.longStoreArrayMask     1993.986   4874.148  2.444  ops/ms
> StoreMaskedBenchmark.shortStoreArrayMask    8543.593  17821.086  2.086  ops/ms
>
> A similar performance gain can also be observed on ARM hardware.
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 4086:
> 4084: } else {
> 4085: $Type$Species vsp = vspecies();
> 4086: if (offset < 0 || offset > (a.length - vsp.length())) {
Can we use `VectorIntrinsics.checkFromIndexSize`? e.g.
if (!VectorIntrinsics.checkFromIndexSize(offset, vsp.length(), a.length)) { ...
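To make the condition under discussion concrete, here is a minimal sketch of the range test at line 4086, next to the same test written with the standard `java.util.Objects.checkFromIndexSize`. It deliberately does not use `VectorIntrinsics`; the class and method names (`RangeCheckSketch`, `fitsInArray`, `fitsInArrayViaCheck`) are hypothetical and are not part of the patch. Note that `Objects.checkFromIndexSize` reports failure by throwing rather than by returning a boolean, so the second variant has to wrap it.

```java
import java.util.Objects;

// Hypothetical illustration only; none of these names come from the patch.
final class RangeCheckSketch {

    // Explicit form: true when all vlen lanes starting at offset
    // lie inside an array of the given length.
    static boolean fitsInArray(int offset, int vlen, int length) {
        return offset >= 0 && offset <= length - vlen;
    }

    // Same test via Objects.checkFromIndexSize, which throws
    // IndexOutOfBoundsException instead of returning false.
    static boolean fitsInArrayViaCheck(int offset, int vlen, int length) {
        try {
            Objects.checkFromIndexSize(offset, vlen, length);
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int[] a = new int[10];
        int vlen = 4; // assumed lane count, chosen for the example
        for (int offset : new int[] {-1, 0, 6, 7}) {
            System.out.println(offset + " -> explicit=" + fitsInArray(offset, vlen, a.length)
                    + ", viaCheck=" + fitsInArrayViaCheck(offset, vlen, a.length));
        }
    }
}
```

With an array of length 10 and 4 lanes, offsets 0 through 6 pass both variants, while -1 and 7 fail. Only when this test fails does the store need the per-lane masked bounds check.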
-------------
PR: https://git.openjdk.java.net/jdk/pull/8620