On Fri, 13 May 2022 01:35:40 GMT, Xiaohong Gong <xg...@openjdk.org> wrote:
>> Checking whether the indexes of masked lanes are inside the valid memory
>> boundary is necessary for masked vector memory access. However, this check
>> can be skipped if the given offset is inside the vector range, which
>> guarantees that no IOOBE (IndexOutOfBoundsException) happens. The masked
>> load APIs already skip this kind of check for common cases, and this patch
>> applies a similar optimization to the masked vector store.
>>
>> The performance of the newly added masked store benchmarks improves by about
>> `1.83x ~ 2.62x` on an x86 system:
>>
>> Benchmark                                  Before     After      Gain   Units
>> StoreMaskedBenchmark.byteStoreArrayMask    12757.936  23291.118  1.826  ops/ms
>> StoreMaskedBenchmark.doubleStoreArrayMask   1520.932   3921.616  2.578  ops/ms
>> StoreMaskedBenchmark.floatStoreArrayMask    2713.031   7122.535  2.625  ops/ms
>> StoreMaskedBenchmark.intStoreArrayMask      4113.772   8220.206  1.998  ops/ms
>> StoreMaskedBenchmark.longStoreArrayMask     1993.986   4874.148  2.444  ops/ms
>> StoreMaskedBenchmark.shortStoreArrayMask    8543.593  17821.086  2.086  ops/ms
>>
>> A similar performance gain can also be observed on ARM hardware.
>
> Xiaohong Gong has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Wrap the offset check into a static method

Thanks for the explanation! Yeah, the main problem is that Java doesn't have a
direct unsigned comparison, so we need a function call. Of the two approaches
you provided, I think the second, `Integer.lessThanUnsigned`, looks better. But
I'm not sure whether it would improve the performance much, although the first
check, `a.length - vsp.length() > 0`, can be hoisted out of the loop. It might
also make the code more complex, in my view. Maybe we could first do some
preliminary research to find a better implementation of the unsigned
comparison. What do you think?

-------------

PR: https://git.openjdk.java.net/jdk/pull/8620
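
For reference, the offset check discussed in the reply can be sketched roughly as below. This is only an illustration, not the code from the patch: the class and method names are hypothetical, the inclusive upper bound is an assumption, and it uses `Integer.compareUnsigned` from the standard library as a stand-in for the unsigned comparison. The `limit >= 0` test is the part that could be hoisted out of a loop.

```java
// Hypothetical sketch: class and method names are illustrative, not from the PR.
class UnsignedRangeCheck {

    // Plain signed form: two comparisons per call.
    static boolean offsetInRangeSigned(int offset, int arrayLength, int vectorLength) {
        return offset >= 0 && offset <= arrayLength - vectorLength;
    }

    // Unsigned form: once limit is known to be non-negative, a single unsigned
    // comparison rejects both negative offsets (which look like huge unsigned
    // values) and offsets past the last valid start position.
    static boolean offsetInRange(int offset, int arrayLength, int vectorLength) {
        int limit = arrayLength - vectorLength; // loop-invariant; may be negative
        return limit >= 0 && Integer.compareUnsigned(offset, limit) <= 0;
    }

    public static void main(String[] args) {
        int arrayLength = 16;
        int vectorLength = 8; // assumed lane count, for illustration only
        for (int offset = -2; offset <= 10; offset++) {
            boolean signed = offsetInRangeSigned(offset, arrayLength, vectorLength);
            boolean unsigned = offsetInRange(offset, arrayLength, vectorLength);
            System.out.println(offset + ": signed=" + signed + " unsigned=" + unsigned);
        }
    }
}
```

Both forms should agree for every offset when the array and vector lengths are non-negative; whether the unsigned form is actually faster would need to be measured.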