On Tue, 9 Jun 2026 13:19:15 GMT, Emanuel Peter <[email protected]> wrote:
>> Eric Fang has updated the pull request with a new target base due to a merge
>> or a rebase. The incremental webrev excludes the unrelated changes brought
>> in by the merge/rebase. The pull request contains five additional commits
>> since the last revision:
>>
>> - Fine-tuning the code
>> - Merge branch 'master' into JDK-8382052-bitwise-blend
>> - Implement bitwise_blend in IGVN
>>
>> The latest changes:
>>
>> 1. Defined a new IR `VectorBitwiseBlendNode`
>> 2. Do the optimization in IGVN:
>> // XorV(a, AndV(sel, XorV(a, b))) => VectorBitwiseBlend(a, b, sel)
>> // XorV(a, AndV(sel, XorV(a, b)), mask) =>
>> // VectorBlend(a, VectorBitwiseBlend(a, b, sel), mask)
>>
>> 3. Adjust the ad file match rules to match `VectorBitwiseBlendNode`.
>> 4. Adjust the JTReg tests to check `VectorBitwiseBlendNode`.
>> - Merge branch 'master' into JDK-8382052-bitwise-blend
>> - 8382052: VectorAPI: AArch64: Optimize the lanewise BITWISE_BLEND
>> operation with BSL
>>
>> Vector API `lanewise BITWISE_BLEND` on AArch64 is currently lowered to a
>> generic vector sequence built from `(XorV(AndV(XorV)))` nodes. AArch64
>> provides a more efficient mapping for this operation through the NEON
>> `BSL` and SVE `BSL` (bitwise select) instructions.
>>
>> This change teaches C2 to recognize the `BITWISE_BLEND` patterns and
>> lower them to the dedicated AArch64 instructions for better performance.
>>
>> The change includes the AArch64 match rules and assembler support,
>> updates the AArch64 asm tests, adds IR framework nodes for the new mach
>> instructions, introduces a new jtreg IR test, and extends the
>> MaskedLogicOpts JMH benchmark to cover the 128-bit long type.
>>
>> JMH results show **11% - 54%** performance improvements for the
>> optimized cases, and all jtreg tests (tier1, tier2 and tier3) pass on
>> SVE2, SVE1, and NEON configurations.
>>
>> On an NVIDIA Grace (Neoverse-V2) machine with 128-bit SVE2:
>> ```
>> Benchmark                     Unit   ARRAYLEN  Before   Error  After    Error  Uplift
>> bitwiseBlendOperationInt128   ops/s  256.00    3787.49  5.29   4277.64  8.89   1.13
>> bitwiseBlendOperationInt128   ops/s  512.00    1888.24  11.02  2143.21  6.32   1.14
>> bitwiseBlendOperationInt128   ops/s  1024.00   938.22   6.24   1053.45  14.68  1.12
>> bitwiseBlendOperationLong128  ops/s  256.00    1895.45  13.68  2140.31  3.68   1.13
>> bitwiseBlendOperationLong128  ops/s  512.00    938.71   5.32   1052.16  14.07  1.12
>> bitwi...
>
> src/hotspot/share/opto/vectornode.cpp line 2798:
>
>> 2796: } else if (in(2)->Opcode() == Op_AndV) {
>> 2797: andv = in(2);
>> 2798: a = in(1);
>
> This could be simplified to an `||` condition with the same body, no?
> At least flip the two lines in the if-branch: in all the other branches
> you assign `andv` first.
Makes sense, done, thanks!
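
For anyone following the thread, the identity behind the
`XorV(a, AndV(sel, XorV(a, b)))` pattern can be checked on scalars. This is
only a minimal sketch, assuming plain `long` values stand in for vector
lanes; the class and method names are hypothetical and not part of the patch:

```java
public class BitwiseBlendSketch {
    // XOR/AND form, the scalar analogue of XorV(a, AndV(sel, XorV(a, b))).
    static long xorAndForm(long a, long b, long sel) {
        return a ^ (sel & (a ^ b));
    }

    // Select form: bits of b where sel is 1, bits of a where sel is 0.
    static long selectForm(long a, long b, long sel) {
        return (a & ~sel) | (b & sel);
    }

    public static void main(String[] args) {
        long a = 0x0F0F0F0F0F0F0F0FL;
        long b = 0x123456789ABCDEF0L;
        long sel = 0xFFFF0000FFFF0000L;
        // Both forms agree for any inputs; this checks one sample.
        System.out.println(xorAndForm(a, b, sel) == selectForm(a, b, sel));
    }
}
```

With `sel` all ones the result is `b`, and with `sel` all zeros it is `a`,
which is the per-bit select behaviour the description attributes to BSL.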
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/31269#discussion_r3385514963