[llvm-branch-commits] [llvm] [AMDGPU] Support one immediate folding for global load (PR #178608)

via llvm-branch-commits Thu, 29 Jan 2026 19:03:17 -0800

================
@@ -2037,13 +2037,36 @@ bool AMDGPUDAGToDAGISel::SelectGlobalSAddr(SDNode *N, SDValue Addr,
     LHS = Addr.getOperand(0);
 
     if (!LHS->isDivergent()) {
-      // add (i64 sgpr), (*_extend (i32 vgpr))
       RHS = Addr.getOperand(1);
-      ScaleOffset = SelectScaleOffset(N, RHS, Subtarget->hasSignedGVSOffset());
+
       if (SDValue ExtRHS = matchExtFromI32orI32(
               RHS, Subtarget->hasSignedGVSOffset(), CurDAG)) {
+        // add (i64 sgpr), (*_extend (scale (i32 vgpr)))
         SAddr = LHS;
         VOffset = ExtRHS;
+        if (NeedIOffset && !ImmOffset &&
+            CurDAG->isBaseWithConstantOffset(ExtRHS)) {
+          // add (i64 sgpr), (*_extend (add (scale (i32 vgpr)), (i32 imm)))
----------------
ruiling wrote:


Thanks! After looking at the hardware documentation: `VOffset` is treated as 
an unsigned offset if `saddr` is not null, so it makes sense that this only 
applies to zext. But even with zext, alive2 still complains that the 
transformation does not verify (see https://alive2.llvm.org/ce/z/3tuqJa). The 
main issue is that the 32-bit addition might overflow, whereas the 64-bit 
addition does not. I am not sure whether the target IR in the alive2 test 
correctly models the hardware behavior.
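
To make the overflow concern concrete, here is a minimal sketch in plain C++. 
It is not the alive2 test and not a model of the hardware; the function names 
`addrBeforeFold` and `addrAfterFold` are hypothetical, and the immediate is 
treated as an unsigned 32-bit value purely for illustration. It only shows 
that zero-extending a wrapped 32-bit sum can differ from adding the same 
operands in 64 bits.

```cpp
#include <cstdint>

// Hypothetical model of the unfolded pattern:
//   add (i64 sgpr), (zext (add (i32 vgpr), (i32 imm)))
// The inner add is done in 32 bits and wraps modulo 2^32 before the zext.
constexpr uint64_t addrBeforeFold(uint64_t SBase, uint32_t VOff, uint32_t Imm) {
  return SBase + static_cast<uint64_t>(static_cast<uint32_t>(VOff + Imm));
}

// Hypothetical model of the folded form, assuming the immediate is added
// after the extension, i.e. the whole sum is computed in 64 bits.
constexpr uint64_t addrAfterFold(uint64_t SBase, uint32_t VOff, uint32_t Imm) {
  return SBase + static_cast<uint64_t>(VOff) + static_cast<uint64_t>(Imm);
}

// No 32-bit wrap: both forms agree.
static_assert(addrBeforeFold(0x1000, 16, 8) == addrAfterFold(0x1000, 16, 8),
              "forms agree when the 32-bit add does not wrap");

// 32-bit wrap: the unfolded form loses the carry, the folded form keeps it.
static_assert(addrBeforeFold(0x1000, 0xFFFFFFFFu, 1) !=
                  addrAfterFold(0x1000, 0xFFFFFFFFu, 1),
              "forms differ when the 32-bit add wraps");
```

Under these assumptions, the two forms are only interchangeable when the 
32-bit add is known not to wrap, or when the hardware itself truncates the 
offset sum to 32 bits, which is the open question above.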

https://github.com/llvm/llvm-project/pull/178608
_______________________________________________
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
