On Thu, 20 Nov 2025 04:01:09 GMT, Eric Fang <[email protected]> wrote:
>> `VectorMaskCastNode` is used to cast a vector mask from one type to another
>> type. The cast may be generated by calling the vector API `cast` or
>> generated by the compiler. For example, some vector mask operations like
>> `trueCount` require the input mask to be integer types, so for floating
>> point type masks, the compiler will cast the mask to the corresponding
>> integer type mask automatically before doing the mask operation. This kind
>> of cast is very common.
>>
>> If the vector element size is not changed, the `VectorMaskCastNode` don't
>> generate code, otherwise code will be generated to extend or narrow the
>> mask. This IR node is not free no matter it generates code or not because it
>> may block some optimizations. For example:
>> 1. `(VectorStoremask (VectorMaskCast (VectorLoadMask x)))` The middle
>> `VectorMaskCast` prevented the following optimization: `(VectorStoremask
>> (VectorLoadMask x)) => (x)`
>> 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks
>> the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`.
>>
>> In these IR patterns, the value of the input `x` is not changed, so we can
>> safely do the optimization. But if the input value is changed, we can't
>> eliminate the cast.
>>
>> The general idea of this PR is introducing an `uncast_mask` helper function,
>> which can be used to uncast a chain of `VectorMaskCastNode`, like the
>> existing `Node::uncast(bool)` function. The funtion returns the first non
>> `VectorMaskCastNode`.
>>
>> The intended use case is when the IR pattern to be optimized may contain one
>> or more consecutive `VectorMaskCastNode` and this does not affect the
>> correctness of the optimization. Then this function can be called to
>> eliminate the `VectorMaskCastNode` chain.
>>
>> Current optimizations related to `VectorMaskCastNode` include:
>> 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760.
>> 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) =>
>> (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242.
>>
>> This PR does the following optimizations:
>> 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) =>
>> (x)` as `(VectorMaskCast (VectorMaskCast ... (VectorMaskCast x))) => (x)`.
>> Because as long as types of the head and tail `VectorMaskCastNode` are
>> consistent, the optimization is correct.
>> 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ...
>> (VectorLoadMask x))) => (x)`. Since the value before and after the pattern
>> is a boolean vect...
>
> Eric Fang has updated the pull request with a new target base due to a merge
> or a rebase. The incremental webrev excludes the unrelated changes brought in
> by the merge/rebase. The pull request contains three additional commits since
> the last revision:
>
> - Don't read and write the same memory in the JMH benchmarks
> - Merge branch 'master' into JDK-8370863-mask-cast-opt
> - 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns
>
> `VectorMaskCastNode` is used to cast a vector mask from one type to
> another type. The cast may be generated by calling the vector API `cast`
> or generated by the compiler. For example, some vector mask operations
> like `trueCount` require the input mask to be integer types, so for
> floating point type masks, the compiler will cast the mask to the
> corresponding integer type mask automatically before doing the mask
> operation. This kind of cast is very common.
>
> If the vector element size is not changed, the `VectorMaskCastNode`
> don't generate code, otherwise code will be generated to extend or narrow
> the mask. This IR node is not free no matter it generates code or not
> because it may block some optimizations. For example:
> 1. `(VectorStoremask (VectorMaskCast (VectorLoadMask x)))`
> The middle `VectorMaskCast` prevented the following optimization:
> `(VectorStoremask (VectorLoadMask x)) => (x)`
> 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which
> blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`.
>
> In these IR patterns, the value of the input `x` is not changed, so we
> can safely do the optimization. But if the input value is changed, we
> can't eliminate the cast.
>
> The general idea of this PR is introducing an `uncast_mask` helper
> function, which can be used to uncast a chain of `VectorMaskCastNode`,
> like the existing `Node::uncast(bool)` function. The funtion returns
> the first non `VectorMaskCastNode`.
>
> The intended use case is when the IR pattern to be optimized may
> contain one or more consecutive `VectorMaskCastNode` and this does not
> affect the correctness of the optimization. Then this function can be
> called to eliminate the `VectorMaskCastNode` chain.
>
> Current optimizations related to `VectorMaskCastNode` include:
> 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760.
> 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
> => (VectorMaskCast (...
Nice improvement @erifan, just some small comments from me
src/hotspot/share/opto/vectornode.cpp line 1056:
> 1054: // x remains to be a bool vector with no changes.
> 1055: // This function can be used to eliminate the VectorMaskCast in such
> patterns.
> 1056: Node* VectorNode::uncast_mask(Node* n) {
Could this be a static method instead?
test/hotspot/jtreg/compiler/vectorapi/VectorMaskCastIdentityTest.java line 57:
> 55: applyIfCPUFeatureAnd = {"asimd", "true", "sve", "false"})
> 56: public static int testTwoCastToDifferentType() {
> 57: // The types before and after the two casts are not the same, so
> the cast cannot be eliminated.
Outdated comment. Also please expand assertion comments
test/hotspot/jtreg/compiler/vectorapi/VectorMaskCastIdentityTest.java line 79:
> 77: applyIfCPUFeatureAnd = {"avx2", "true", "avx512", "false"})
> 78: public static int testTwoCastToDifferentType2() {
> 79: // The types before and after the two casts are not the same, so
> the cast cannot be eliminated.
Could you expand the documentation on the IR assertions? It's not immediately
clear why with AVX-512 the cast remains but with AVX-2 it's removed. Also, this
comment is outdated.
test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java line 240:
> 238:
> 239: @Test
> 240: @IR(counts = { IRNode.VECTOR_LONG_TO_MASK, "= 0",
Could you add some assertion comments here as well to understand what causes
the differences with different architectures?
test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java line 260:
> 258:
> 259: @Test
> 260: @IR(counts = { IRNode.VECTOR_LONG_TO_MASK, "= 0",
Same here
-------------
Changes requested by galder (Author).
PR Review: https://git.openjdk.org/jdk/pull/28313#pullrequestreview-3518051437
PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2570915091
PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2570925650
PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2570924373
PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2570932229
PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2570932750