[llvm-branch-commits] [llvm] [AMDGPU] Support one immediate folding for global load (PR #178608)

Matt Arsenault via llvm-branch-commits Fri, 30 Jan 2026 05:01:28 -0800

================
@@ -2037,13 +2037,36 @@ bool AMDGPUDAGToDAGISel::SelectGlobalSAddr(SDNode *N, SDValue Addr,
     LHS = Addr.getOperand(0);
 
     if (!LHS->isDivergent()) {
-      // add (i64 sgpr), (*_extend (i32 vgpr))
       RHS = Addr.getOperand(1);
-      ScaleOffset = SelectScaleOffset(N, RHS, Subtarget->hasSignedGVSOffset());
+
       if (SDValue ExtRHS = matchExtFromI32orI32(
               RHS, Subtarget->hasSignedGVSOffset(), CurDAG)) {
+        // add (i64 sgpr), (*_extend (scale (i32 vgpr)))
         SAddr = LHS;
         VOffset = ExtRHS;
+        if (NeedIOffset && !ImmOffset &&
+            CurDAG->isBaseWithConstantOffset(ExtRHS)) {
+          // add (i64 sgpr), (*_extend (add (scale (i32 vgpr)), (i32 imm)))
----------------
arsenm wrote:


alive2 will be correct, assuming the IR you wrote actually matches the 
hardware addressing mode. The tricky part is being sure that it matches what 
the hardware actually does. Your proof matches my understanding of what the 
hardware does. So yes, the 32-bit overflow is a problem, and this addressing 
calculation should be reassociated earlier.
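
To make the overflow concern concrete, here is a minimal sketch in plain C++ (not LLVM code). It assumes the zero-extend case and an unsigned immediate, and it assumes the hardware adds the immediate at full 64-bit width after extending the 32-bit offset. The function names addrBeforeFold and addrAfterFold are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Address as written in the DAG before folding: the immediate is added to the
// VGPR offset in 32 bits (wrapping), and the sum is then zero-extended.
//   add (i64 sgpr), (zext (add (i32 vgpr), (i32 imm)))
constexpr uint64_t addrBeforeFold(uint64_t SAddr, uint32_t VOff, uint32_t Imm) {
  return SAddr + static_cast<uint64_t>(static_cast<uint32_t>(VOff + Imm));
}

// ASSUMED hardware model after folding the immediate into the instruction:
// the offset is extended first and the immediate is added in 64 bits.
//   saddr + zext(voffset) + imm
constexpr uint64_t addrAfterFold(uint64_t SAddr, uint32_t VOff, uint32_t Imm) {
  return SAddr + static_cast<uint64_t>(VOff) + static_cast<uint64_t>(Imm);
}

// No 32-bit wrap: both forms produce the same address.
static_assert(addrBeforeFold(0x1000, 0x100, 0x20) ==
              addrAfterFold(0x1000, 0x100, 0x20));

// 32-bit wrap: the two forms differ by 2^32, so the fold is not sound
// unless the 32-bit add is known not to overflow.
static_assert(addrBeforeFold(0x1000, 0xFFFFFFF0u, 0x20) !=
              addrAfterFold(0x1000, 0xFFFFFFF0u, 0x20));
```

Under these assumptions the two forms agree only when the 32-bit add does not wrap, which is why the fold needs a no-overflow guarantee or an earlier reassociation.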

https://github.com/llvm/llvm-project/pull/178608
_______________________________________________
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
