On Fri, 13 May 2022 01:35:40 GMT, Xiaohong Gong <[email protected]> wrote:
>> Checking whether the indexes of masked lanes are inside of the valid memory
>> boundary is necessary for masked vector memory access. However, this could
>> be saved if the given offset is inside of the vector range that could make
>> sure no IOOBE (IndexOutOfBoundsException) happens. The masked load APIs
>> have saved this kind of check for common cases. And this patch did the
>> similar optimization for the masked vector store.
>>
>> The performance of the newly added masked store benchmarks improves by about
>> `1.83x ~ 2.62x` on an x86 system:
>>
>> Benchmark                                     Before      After   Gain  Units
>> StoreMaskedBenchmark.byteStoreArrayMask    12757.936  23291.118  1.826  ops/ms
>> StoreMaskedBenchmark.doubleStoreArrayMask   1520.932   3921.616  2.578  ops/ms
>> StoreMaskedBenchmark.floatStoreArrayMask    2713.031   7122.535  2.625  ops/ms
>> StoreMaskedBenchmark.intStoreArrayMask      4113.772   8220.206  1.998  ops/ms
>> StoreMaskedBenchmark.longStoreArrayMask     1993.986   4874.148  2.444  ops/ms
>> StoreMaskedBenchmark.shortStoreArrayMask    8543.593  17821.086  2.086  ops/ms
>>
>> Similar performance gain can also be observed on ARM hardware.
>
> Xiaohong Gong has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Wrap the offset check into a static method

However, we seem to lack the ability to do an unsigned comparison reliably. C2
can transform `x + MIN_VALUE <=> y + MIN_VALUE` into `x u<=> y`, but the
transformation fails if `x` or `y` is itself an addition with a constant,
because in that case the constants are merged together. As a result, I think
we need an intrinsic for this. `Integer.compareUnsigned` may fit, but it
materialises the result into an integer register, which may lead to suboptimal
materialisation of flags. Another approach would be a separate method,
`Integer.lessThanUnsigned`, which only returns a `boolean`; C2 would then have
an easier time splitting the boolean comparison through the `IfNode`, which
avoids materialising the `boolean` value. What do you two think?

I.e., after split-if through the merge point, the shape of
`if (Integer.lessThanUnsigned(a, b))` would be transformed from

        a     b
         \   /
          CmpU
           |
          Bool
           |
           If
          /  \
     IfTrue  IfFalse
          \  /
         Region  1  0
              \  | /
               Phi    0
                 \   /
                  CmpI

into

        a     b
         \   /
          CmpU

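
To make the comparison idiom discussed above concrete, here is a minimal Java sketch of the `x + MIN_VALUE < y + MIN_VALUE` pattern next to the existing `Integer.compareUnsigned`. The class name `UnsignedCompareSketch` and the helpers `lessThanUnsigned`, `lessThanUnsignedViaCompare` and `inRange` are hypothetical names used only for illustration. In particular, `lessThanUnsigned` here is a plain Java method, not the proposed intrinsic, and `inRange` is not the offset check from the patch.

```java
final class UnsignedCompareSketch {
    private UnsignedCompareSketch() {}

    // Hypothetical helper (not an existing JDK method): unsigned "x < y".
    // Adding MIN_VALUE flips the sign bit of both operands, which maps
    // unsigned ordering onto signed ordering.
    static boolean lessThanUnsigned(int x, int y) {
        return x + Integer.MIN_VALUE < y + Integer.MIN_VALUE;
    }

    // The same predicate written with the existing JDK method, which
    // produces an int result that is then compared against zero.
    static boolean lessThanUnsignedViaCompare(int x, int y) {
        return Integer.compareUnsigned(x, y) < 0;
    }

    // Illustrative use only: for a non-negative limit, one unsigned compare
    // covers both "offset >= 0" and "offset < limit".
    static boolean inRange(int offset, int limit) {
        return lessThanUnsigned(offset, limit);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 7, Integer.MAX_VALUE, Integer.MIN_VALUE, -1};
        for (int x : samples) {
            for (int y : samples) {
                // Both formulations must agree on every pair.
                if (lessThanUnsigned(x, y) != lessThanUnsignedViaCompare(x, y)) {
                    throw new AssertionError("mismatch for " + x + ", " + y);
                }
            }
        }
        System.out.println(inRange(3, 16));   // in bounds
        System.out.println(inRange(-1, 16));  // negative offset is rejected
    }
}
```

Both formulations are equivalent at the Java level. The question raised above is only about which shape C2 recognises and compiles to a single unsigned compare.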
Thanks.
-------------
PR: https://git.openjdk.java.net/jdk/pull/8620