Re: RFR: 8358521: Optimize vector operations by reassociating broadcasted inputs [v6]

Vladimir Ivanov Wed, 11 Mar 2026 20:16:44 -0700

On Tue, 10 Mar 2026 06:37:29 GMT, Jatin Bhateja <[email protected]> wrote:


>> Hi all,
>> 
>> This patch optimizes SIMD kernels that make heavy use of broadcasted inputs 
>> through the following reassociating ideal transformations.
>> 
>> 
>>  VectorOperation (VectorBroadcast INP1,  VectorBroadcast INP2) => 
>>                             VectorBroadcast (ScalarOperation INP1, INP2)
>> 
>>  VectorOperation (VectorBroadcast INP1) (VectorOperation (VectorBroadcast 
>> INP2) INP3) => 
>>                              VectorOperation INP3 (VectorOperation 
>> (VectorBroadcast INP1) (VectorBroadcast INP2))
>> 
>> 
>> The idea is to push broadcasts across the vector operation and replace the 
>> vector operation with an equivalent, cheaper scalar variant.  Currently, the 
>> patch handles the most common vector operations.
>> 
>> The following are the performance numbers for the benchmark included with 
>> this patch on latest-generation x86 targets:
>> 
>> **AMD Turin (2.1GHz)**
>> <img width="1122" height="355" alt="image" 
>> src="https://github.com/user-attachments/assets/3f5087bf-0e14-4c56-b0c2-3d23253bad54"
>>  />
>> 
>> **Intel Granite Rapids (2.1GHz)**
>> <img width="1105" height="325" alt="image" 
>> src="https://github.com/user-attachments/assets/c8481f86-4db2-4c4e-bd65-51542c59fe63"
>>  />
>> 
>> 
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Review comments resolution

src/hotspot/share/opto/vectornode.cpp line 1293:

> 1291: // scalar operation.
> 1292: //
> 1293: // VectorOperation (VectorBroadcast INP1) (VectorOperation 
> (VectorBroadcast INP2) INP3) =>

The comment looks confusing: it mentions `VectorBroadcast` while the 
corresponding node is named `ReplicateNode`.
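
For readers following along, the two rewrites described in the patch summary can be modelled outside HotSpot. The sketch below is a toy model, not C2 code: `Vec`, `broadcast`, `vadd`, and the `pattern*` functions are hypothetical names, lane-wise unsigned addition stands in for any associative and commutative vector operation, and `broadcast` plays the role of `ReplicateNode`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model only: a "vector" is a fixed-size array of lanes. Not HotSpot code.
constexpr std::size_t kLanes = 8;
using Vec = std::array<std::uint32_t, kLanes>;

// Stand-in for a broadcast (ReplicateNode in C2): copy one scalar into every lane.
Vec broadcast(std::uint32_t s) {
  Vec v;
  v.fill(s);
  return v;
}

// Lane-wise add; unsigned arithmetic wraps, so it is exactly associative and commutative.
Vec vadd(const Vec& a, const Vec& b) {
  Vec r{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    r[i] = a[i] + b[i];
  }
  return r;
}

// Pattern 1, before: two broadcasts feed one vector operation.
Vec pattern1_before(std::uint32_t a, std::uint32_t b) {
  return vadd(broadcast(a), broadcast(b));
}

// Pattern 1, after: one scalar operation feeds one broadcast.
Vec pattern1_after(std::uint32_t a, std::uint32_t b) {
  return broadcast(a + b);
}

// Pattern 2, before: the two broadcasts are separated by the non-broadcast input v.
Vec pattern2_before(std::uint32_t a, std::uint32_t b, const Vec& v) {
  return vadd(broadcast(a), vadd(broadcast(b), v));
}

// Pattern 2, after: the broadcasts are grouped, so pattern 1 can fold the inner operation.
Vec pattern2_after(std::uint32_t a, std::uint32_t b, const Vec& v) {
  return vadd(v, vadd(broadcast(a), broadcast(b)));
}
```

In this model the "before" and "after" forms produce identical lanes for any inputs, which is the property the ideal transformation relies on. Operations that are not associative (floating-point addition, for example) would need separate consideration.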

src/hotspot/share/opto/vectornode.hpp line 158:

> 156:   static int opcode(int sopc, BasicType bt);         // scalar_opc -> 
> vector_opc
> 157:   static int scalar_opcode(int vopc, BasicType bt);  // vector_opc -> 
> scalar_opc, 0 if not handled
> 158:   static Node* make_scalar(Compile* c, int sopc, Node* control, Node* 
> in1, Node* in2, Node* in3);

It's a bit weird to see `VectorNode::make_scalar()`. It could either be moved 
to `Node` or accept a vector opcode and do the vector->scalar opcode conversion 
internally.

Also, it would be nice to ensure that `VectorNode::opcode()` and 
`VectorNode::scalar_opcode()` agree. `VectorNode::make_scalar()` could be 
one place to check that (`assert(opcode(scalar_opcode(vopc, bt), bt) == vopc, 
"%s", NodeClassNames[vopc])`).
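
To make the suggested consistency check concrete, here is a minimal standalone sketch of the round trip. Everything in it is a simplified assumption: the `Opcode` and `BasicType` enums are tiny hypothetical stand-ins for HotSpot's, `vector_opcode` plays the role of `VectorNode::opcode()`, and `mappings_agree` is an illustrative helper rather than an existing function.

```cpp
#include <cassert>

// Hypothetical stand-ins; HotSpot's real opcode and BasicType sets are much larger.
enum BasicType { T_INT, T_LONG };
enum Opcode { Op_None = 0, Op_AddI, Op_AddL, Op_MulI, Op_AddVI, Op_AddVL, Op_MulVI };

// scalar_opc -> vector_opc, Op_None if not handled.
int vector_opcode(int sopc, BasicType bt) {
  switch (sopc) {
    case Op_AddI: return bt == T_INT  ? Op_AddVI : Op_None;
    case Op_AddL: return bt == T_LONG ? Op_AddVL : Op_None;
    case Op_MulI: return bt == T_INT  ? Op_MulVI : Op_None;
    default:      return Op_None;
  }
}

// vector_opc -> scalar_opc, Op_None (0) if not handled.
int scalar_opcode(int vopc, BasicType bt) {
  switch (vopc) {
    case Op_AddVI: return bt == T_INT  ? Op_AddI : Op_None;
    case Op_AddVL: return bt == T_LONG ? Op_AddL : Op_None;
    case Op_MulVI: return bt == T_INT  ? Op_MulI : Op_None;
    default:       return Op_None;
  }
}

// True when mapping vector -> scalar -> vector returns the original vector opcode.
bool mappings_agree(int vopc, BasicType bt) {
  int sopc = scalar_opcode(vopc, bt);
  return sopc != Op_None && vector_opcode(sopc, bt) == vopc;
}
```

A factory in the spirit of `make_scalar()` could assert `mappings_agree(vopc, bt)` before building the scalar node, so that a missing or mismatched table entry fails fast in debug builds.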

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2921957722
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2921991340
