Re: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v12]

Eric Fang Wed, 04 Mar 2026 19:49:54 -0800

On Wed, 4 Mar 2026 10:39:39 GMT, Emanuel Peter <[email protected]> wrote:


>> Hi @eme64 
>> 
>> I tried to address this issue by tweaking the notification, but it turns out 
>> that doesn’t really help. The reason is that `add_users_of_use_to_worklist` 
>> only enqueues users into the IGVN worklist when a node actually changes 
>> during IGVN. For a particular inlining order, `VectorStoreMaskNode` never 
>> gets added to the IGVN worklist. And even if it does, the optimization still 
>> won’t fire if the full IR pattern is not yet in place.
>> 
>> Because of that, I’m leaning toward a more general approach: after 
>> **incremental inlining** and **box/unbox elimination**, the vector nodes are 
>> essentially formed and the node graph is relatively clean. At that point, in 
>> `PhaseVector::do_cleanup()` we can add all vector nodes to the IGVN 
>> worklist, which helps trigger optimizations that might otherwise be missed.
>> 
>> I ran into a very similar issue before when working on the[ “vector compare 
>> not” 
>> optimization](https://github.com/openjdk/jdk/pull/24674/changes#diff-a9c13c0c2dc96fe9239116ac75460fdffbbbebd6d44540ccf7ec5811c8fbf79eR1219),
>>  and related problems are commented in 
>> [PhaseIterGVN::verify_Identity_for](https://github.com/openjdk/jdk/blob/f26b379b97301aca49dda9cb5dfe9a5472af7140/src/hotspot/share/opto/phaseX.cpp#L2043)
>>  . I think this method applies well to this class of issues.
>> 
>> I’ve implemented a prototype locally, and testing shows it fixes the 
>> failures seen in this PR. I also measured compilation times for various 
>> methods in 
>> [test/hotspot/jtreg/compiler/vectorapi/VectorStoreMaskIdentityTest.java](https://github.com/openjdk/jdk/pull/28313/changes#diff-293703ca13dd3f7932313e339faccbab007793d2a97fa4c29739291e134aa467R40)
>>  and they are essentially unchanged. Avg compile time of 12 runs:
>> 
>> <html xmlns:v="urn:schemas-microsoft-com:vml"
>> xmlns:o="urn:schemas-microsoft-com:office:office"
>> xmlns:x="urn:schemas-microsoft-com:office:excel"
>> xmlns="http://www.w3.org/TR/REC-html40";>
>> 
>> <head>
>> 
>> <meta name=ProgId content=Excel.Sheet>
>> <meta name=Generator content="Microsoft Excel 15">
>> <link id=Main-File rel=Main-File
>> href="file:////Users/erfang/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip.htm">
>> <link rel=File-List
>> href="file:////Users/erfang/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_filelist.xml">
>> 
>> </head>
>> 
>> <body link="#467886" vlink="#96607D">
>> 
>> 
>> function | c2_compile_time_without_the   change(ms) | 
>> c2_compile_time_with_the_change(ms) | after/before
>> -- | -- | -- |...
>
>> For a particular inlining order, VectorStoreMaskNode never gets added to the 
>> IGVN worklist.
> 
> That sounds like the real problem to me. We should make sure that they are 
> always added, no?
> 
>> At that point, in PhaseVector::do_cleanup() we can add all vector nodes to 
>> the IGVN worklist, which helps trigger optimizations that might otherwise be 
>> missed.
> 
> I'm not yet sure that is the solution, but I'd have to dig a bit deeper 
> myself. Why can't we just add all the nodes to the IGVN worklist when we do 
> the inlining? How do we do this with "regular" inlining of scalar methods? We 
> must put those nodes on the IGVN worklist as well, I'm thinking.

Hi @eme64,

Thanks for the questions. Here's how I see it:

> That sounds like the real problem to me. 

I think there are two parts: First, due to the uncertainty of incremental 
inlining order, when `VectorStoreMaskNode` is created (via 
`vectorIntrinsics.cpp`), some nodes in the pattern may not have been generated 
yet, or `box/unbox` nodes may still be in the way. So when we call 
`gvn().transform()` during inlining, the IR pattern may not be ready yet. In 
this case, `Ideal/Identity` optimizations cannot fire during the 
`gvn().transform()`. Second, as you said, `VectorStoreMaskNode` may never be 
added to the worklist if no subsequent changes cause this.

> We should make sure that they are always added, no?

Not exactly, generally when a new node is generated, we call 
`gvn().transform()`, if the pattern is ready, the optimization will fire 
normally.

> How do we do this with "regular" inlining of scalar methods?

If I understand correctly, scalar and vector nodes are handled the same during 
inlining - both go through `gvn().transform() (PhaseGVN)`, not `IGVN`. By 
default, **neither adds newly created nodes to the IGVN worklist**.

**Why add vector nodes to the IGVN worklist after box/unbox elimination rather 
than when inlining?**

Some vector nodes are generated in **boxed** form in vectorIntrinsics.cpp 
(e.g., "Wrap it up in VectorBox to keep object type information"). Until 
box/unbox elimination runs, the IR pattern for optimizations like 
`(VectorStoreMask (VectorMaskCast* (VectorLoadMask x))) => (x)` **may** not 
match. Only after both `incremental inlining` and `box/unbox elimination` are 
complete does the tree reach a clean, complete state. Also, loop optimization 
hasn't started yet, so the node count isn't too high.

**Why not add the whole tree to the IGVN worklist:**

I'm not opposed to this, and I believe scalar nodes may have the same issue. I 
was just being conservative - currently there are many places that add nodes to 
the IGVN worklist, but they all add only specific nodes that meet certain 
conditions. Perhaps to avoid increasing compile time significantly? Since 
`PhaseVector::do_cleanup()` only runs once, I don't expect this approach to 
noticeably increase compile time. My measurements so far support that.

Does this align with your understanding? I'm happy to explore adding nodes at 
creation time (e.g., when creating vector nodes) if you think that's a better 
direction. Thanks!

By the way, considering this is a general problem, should we handle it 
separately? For this PR, I can make simple modifications to compensate for the 
missed optimization opportunities and then get the tests to pass.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28313#issuecomment-4001928595

Re: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v12]

Reply via email to