Issue 128716
Summary AMDGPU should try to shrink 64-bit defs to 32-bit when rematerializing
Labels backend:AMDGPU, llvm:regalloc, missed-optimization
Assignees
Reporter arsenm
If we are rematerializing a wide instruction, we should try harder to rewrite it to set the minimal set of required lanes at the use point. In the most basic case, this means folding a use of s_mov_b64:
```
  %0:sreg_64 = S_MOV_B64 0
   
  // Should rematerialize here to undef %0.sub0 = S_MOV_B32 0
  S_NOP 0, implicit %0.sub0
```
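
To make the intended decision concrete, here is a minimal standalone sketch of the lane-narrowing logic. It does not use any LLVM API. The names `LaneSub0`, `LaneSub1`, `NarrowMov` and `shrinkMov64` are hypothetical, the lane bits are illustrative rather than LLVM's real `LaneBitmask` values, and the immediate is modelled as a plain 64-bit integer, which may not match how the real instruction encodes literals.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical lane bits for the two 32-bit halves of a 64-bit register.
// Illustrative only; these are not LLVM's actual LaneBitmask values.
constexpr unsigned LaneSub0 = 1u << 0; // low 32 bits
constexpr unsigned LaneSub1 = 1u << 1; // high 32 bits

// Result of trying to narrow a 64-bit immediate move.
struct NarrowMov {
  bool Narrowed;   // true if a single 32-bit move is enough
  unsigned SubReg; // LaneSub0 or LaneSub1 when Narrowed is true
  uint32_t Imm;    // immediate for the 32-bit move when Narrowed is true
};

// Given the immediate of a 64-bit move and the lanes actually read at the
// use point, decide whether a 32-bit move to one half is sufficient.
NarrowMov shrinkMov64(uint64_t Imm, unsigned UsedLanes) {
  // Assumption of this sketch: at least one lane is read, and only the
  // two known lanes can appear in the mask.
  assert(UsedLanes != 0 && (UsedLanes & ~(LaneSub0 | LaneSub1)) == 0);

  if (UsedLanes == LaneSub0)
    return NarrowMov{true, LaneSub0, static_cast<uint32_t>(Imm)};
  if (UsedLanes == LaneSub1)
    return NarrowMov{true, LaneSub1, static_cast<uint32_t>(Imm >> 32)};

  // Both halves are read, so the full 64-bit move must be kept.
  return NarrowMov{false, 0, 0};
}
```

For the example above, `shrinkMov64(0, LaneSub0)` reports that a 32-bit move of 0 into the low half is enough, which corresponds to `undef %0.sub0 = S_MOV_B32 0`. The hard part in the real backend is obtaining the used-lane mask at the rematerialization point, which this sketch simply takes as an input.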

[0001-WIP-AMDGPU-Fold-64-bit-moves-into-32-bit-when-materi.patch](https://github.com/user-attachments/files/18965526/0001-WIP-AMDGPU-Fold-64-bit-moves-into-32-bit-when-materi.patch)


Attaching a WIP patch to start the investigation. I'm not sure the starting point is useful. We already try something similar for scalar loads, but I don't think the reMaterialize hook has enough context to see the uses here.