On Wed, 25 Feb 2026 11:23:45 GMT, Jatin Bhateja <[email protected]> wrote:
>> Hi all, >> >> This patch optimizes SIMD kernels making heavy use of broadcasted inputs >> through following reassociating ideal transformations. >> >> >> VectorOperation (VectorBroadcast INP1, VectorBroadcast INP2) => >> VectorBroadcast (ScalarOpration INP1, INP2) >> >> VectorOperation (VectorBroadcast INP1) (VectorOperation (VectorBroadcast >> INP2) INP3) => >> VectorOperation INP3 (VectorOperation >> (VectorBroadcast INP1) (VectorBroadcast INP2)) >> >> >> The idea is to push broadcasts across the vector operation and replace the >> vector with an equivalent, cheaper scalar variant. Currently, patch handles >> most common vector operations. >> >> Following are the performance number of benchmark included with this patch >> on latest generation x86 targets:- >> >> **AMD Turin (2.1GHz)** >> <img width="1122" height="355" alt="image" >> src="https://github.com/user-attachments/assets/3f5087bf-0e14-4c56-b0c2-3d23253bad54" >> /> >> >> **Intel Granite Rapids (2.1GHz)** >> <img width="1105" height="325" alt="image" >> src="https://github.com/user-attachments/assets/c8481f86-4db2-4c4e-bd65-51542c59fe63" >> /> >> >> >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional > commit since the last revision: > > Review comments resolutions Sorry for the late review. I've just completed some work that involved reassociating IR nodes, so it's only now that I have a good enough understanding to be able to comment here. src/hotspot/share/opto/vectornode.cpp line 1204: > 1202: } > 1203: > 1204: Node* VectorNode::make_scalar(Compile* c, int sopc, Node* control, > Node* in1, Node* in2, Node* in3) { No specific request here. Just pointing out that I had a similar need recently. While working on an experiment that made https://github.com/openjdk/jdk/pull/30070 available to subclasses of `AddNode`, I found myself writing a method that was quite similar to this: static AddNode* build_min_max(int opcode, Node* a, Node* b, PhaseIdealLoop* phase) { switch (opcode) { case Op_AddI: return new AddINode(a, b); case Op_AddL: return new AddLNode(a, b); case Op_MinD: return new MinDNode(a, b); case Op_MaxD: return new MaxDNode(a, b); case Op_MinF: return new MinFNode(a, b); case Op_MaxF: return new MaxFNode(a, b); // case Op_MinHF: // return new MinHFNode(a, b); // case Op_MaxHF: // return new MaxHFNode(a, b); case Op_MinI: return new MinINode(a, b); case Op_MaxI: return new MaxINode(a, b); case Op_MinL: return new MinLNode(phase->C, a, b); case Op_MaxL: return new MaxLNode(phase->C, a, b); case Op_OrI: return new OrINode(a, b); case Op_OrL: return new OrLNode(a, b); case Op_XorI: return new XorINode(a, b); case Op_XorL: return new XorLNode(a, b); default: ShouldNotReachHere(); } } I ended up not needing it because I limited the code to long Min/Max, but seems like there are situations where given an opcode and varying number of inputs, we want to create new nodes out of that. test/hotspot/jtreg/compiler/vectorapi/TestVectorBroadcastReassociations.java line 1087: > 1085: > 1086: /* ======================= > 1087: * Reassociation The IR tests for reassociation seem to focus on ints. Should other types be covered? test/hotspot/jtreg/compiler/vectorapi/TestVectorBroadcastReassociations.java line 1104: > 1102: IRNode.REPLICATE_I, IRNode.VECTOR_SIZE_ANY, ">= 1" }) > 1103: @Warmup(value = 10000) > 1104: static void test_reassociation1() { Also, more descriptive names than `test_reassociation` + number would be appreciated. test/micro/org/openjdk/bench/jdk/incubator/vector/ReassociateVectorBenchmark.java line 89: > 87: > 88: @Benchmark > 89: public void reassociateVectorsKernel1() { I think the benchmark names should be a bit more descriptive, because given `reassociateVectorsKernel1` and `reassociateVectorsKernel2` it's not clear what the difference both of those is. At a glance it looks like `reassociateVectorsKernel1` is for ints and `reassociateVectorsKernel2` for longs? Should other types beyond int and long be covered here too? ------------- PR Review: https://git.openjdk.org/jdk/pull/25617#pullrequestreview-3895469383 PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2889083244 PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2889144713 PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2889147924 PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2889138998
