On Wed, 25 Feb 2026 11:23:45 GMT, Jatin Bhateja <[email protected]> wrote:

>> Hi all,
>> 
>> This patch optimizes SIMD kernels making heavy use of broadcasted inputs 
>> through following reassociating ideal transformations.
>> 
>> 
>>  VectorOperation (VectorBroadcast INP1,  VectorBroadcast INP2) => 
>>                             VectorBroadcast (ScalarOpration INP1, INP2)
>> 
>>  VectorOperation (VectorBroadcast INP1) (VectorOperation (VectorBroadcast 
>> INP2) INP3) => 
>>                              VectorOperation INP3 (VectorOperation 
>> (VectorBroadcast INP1) (VectorBroadcast INP2))
>> 
>> 
>> The idea is to push broadcasts across the vector operation and replace the 
>> vector with an equivalent, cheaper scalar variant.  Currently, patch handles 
>> most common vector operations.
>> 
>> Following are the performance number of benchmark included with this patch 
>> on latest generation x86 targets:- 
>> 
>> **AMD Turin (2.1GHz)**
>> <img width="1122" height="355" alt="image" 
>> src="https://github.com/user-attachments/assets/3f5087bf-0e14-4c56-b0c2-3d23253bad54";
>>  />
>> 
>> **Intel Granite Rapids (2.1GHz)**
>> <img width="1105" height="325" alt="image" 
>> src="https://github.com/user-attachments/assets/c8481f86-4db2-4c4e-bd65-51542c59fe63";
>>  />
>> 
>> 
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Review comments resolutions

Sorry for the late review. I've just completed some work that involved 
reassociating IR nodes, so it's only now that I have a good enough 
understanding to be able to comment here.

src/hotspot/share/opto/vectornode.cpp line 1204:

> 1202: }
> 1203: 
> 1204: Node* VectorNode::make_scalar(Compile* c, int sopc, Node* control, 
> Node* in1, Node* in2, Node* in3) {

No specific request here. Just pointing out that I had a similar need recently.

While working on an experiment that made 
https://github.com/openjdk/jdk/pull/30070 available to subclasses of `AddNode`, 
I found myself writing a method that was quite similar to this:


static AddNode* build_min_max(int opcode, Node* a, Node* b, PhaseIdealLoop* 
phase) {
  switch (opcode) {
    case Op_AddI:
      return new AddINode(a, b);
    case Op_AddL:
      return new AddLNode(a, b);
    case Op_MinD:
      return new MinDNode(a, b);
    case Op_MaxD:
      return new MaxDNode(a, b);
    case Op_MinF:
      return new MinFNode(a, b);
    case Op_MaxF:
      return new MaxFNode(a, b);
    // case Op_MinHF:
    //   return new MinHFNode(a, b);
    // case Op_MaxHF:
    //   return new MaxHFNode(a, b);
    case Op_MinI:
      return new MinINode(a, b);
    case Op_MaxI:
      return new MaxINode(a, b);
    case Op_MinL:
      return new MinLNode(phase->C, a, b);
    case Op_MaxL:
      return new MaxLNode(phase->C, a, b);
    case Op_OrI:
      return new OrINode(a, b);
    case Op_OrL:
      return new OrLNode(a, b);
    case Op_XorI:
      return new XorINode(a, b);
    case Op_XorL:
      return new XorLNode(a, b);
    default:
      ShouldNotReachHere();
  }
}


I ended up not needing it because I limited the code to long Min/Max, but seems 
like there are situations where given an opcode and varying number of inputs, 
we want to create new nodes out of that.

test/hotspot/jtreg/compiler/vectorapi/TestVectorBroadcastReassociations.java 
line 1087:

> 1085: 
> 1086:     /* =======================
> 1087:      * Reassociation

The IR tests for reassociation seem to focus on ints. Should other types be 
covered?

test/hotspot/jtreg/compiler/vectorapi/TestVectorBroadcastReassociations.java 
line 1104:

> 1102:                    IRNode.REPLICATE_I, IRNode.VECTOR_SIZE_ANY, ">= 1" })
> 1103:     @Warmup(value = 10000)
> 1104:     static void test_reassociation1() {

Also, more descriptive names than `test_reassociation` + number would be 
appreciated.

test/micro/org/openjdk/bench/jdk/incubator/vector/ReassociateVectorBenchmark.java
 line 89:

> 87: 
> 88:     @Benchmark
> 89:     public void reassociateVectorsKernel1() {

I think the benchmark names should be a bit more descriptive, because given 
`reassociateVectorsKernel1` and `reassociateVectorsKernel2` it's not clear what 
the difference both of those is. At a glance it looks like 
`reassociateVectorsKernel1` is for ints and `reassociateVectorsKernel2` for 
longs?

Should other types beyond int and long be covered here too?

-------------

PR Review: https://git.openjdk.org/jdk/pull/25617#pullrequestreview-3895469383
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2889083244
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2889144713
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2889147924
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2889138998

Reply via email to