| Field | Value |
| --- | --- |
| Issue | 128716 |
| Summary | AMDGPU should try to shrink 64-bit defs to 32-bit when rematerializing |
| Labels | backend:AMDGPU, llvm:regalloc, missed-optimization |
| Assignees | |
| Reporter | arsenm |

If we are rematerializing a wide instruction, we should try harder to rewrite it so that it defines only the minimal set of lanes required at the use point. In the most basic case, this means folding a use of `S_MOV_B64` when only one 32-bit half is read:
```
%0:sreg_64 = S_MOV_B64 0
// Should rematerialize here to undef %0.sub0 = S_MOV_B32 0
S_NOP 0, implicit %0.sub0
```
[0001-WIP-AMDGPU-Fold-64-bit-moves-into-32-bit-when-materi.patch](https://github.com/user-attachments/files/18965526/0001-WIP-AMDGPU-Fold-64-bit-moves-into-32-bit-when-materi.patch)
Attaching a WIP patch to start the investigation. I'm not sure the starting point is useful: we already try something similar for scalar loads, but I don't think the reMaterialize hook has enough context to see the uses here.
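
To make the basic case concrete, here is a minimal standalone sketch of the narrowing decision, modelled outside of LLVM. It is not taken from the attached patch and does not use any LLVM API: `Lane`, `NarrowMov`, and `shrinkMov64` are hypothetical names, the used lanes are assumed to be known up front (which is exactly the context the hook may not have), and immediate encoding rules are ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical lane bits for the two 32-bit halves of a 64-bit register.
enum Lane : unsigned { Sub0 = 1u << 0, Sub1 = 1u << 1 };

// Hypothetical description of the narrowed move: which subregister is
// defined and which 32-bit immediate it receives.
struct NarrowMov {
  std::string subReg; // "sub0" or "sub1"
  uint32_t imm;       // the half of the 64-bit immediate that is used
};

// Given the immediate of a 64-bit move and the lanes read at the use
// point, return the equivalent 32-bit move, or std::nullopt when both
// halves are read and the wide move has to stay.
std::optional<NarrowMov> shrinkMov64(uint64_t imm, unsigned usedLanes) {
  // At least one lane must be used, and only sub0/sub1 are modelled.
  assert(usedLanes != 0 && (usedLanes & ~(Sub0 | Sub1)) == 0);

  if (usedLanes == Sub0)
    return NarrowMov{"sub0", static_cast<uint32_t>(imm)};
  if (usedLanes == Sub1)
    return NarrowMov{"sub1", static_cast<uint32_t>(imm >> 32)};
  return std::nullopt; // both halves are live: nothing to shrink
}
```

For the example above, `shrinkMov64(0, Sub0)` yields a move of `0` into `sub0`, which corresponds to `undef %0.sub0 = S_MOV_B32 0`. The open question in this issue is where in the rematerialization path the used-lane information could come from.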
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs