| Issue | 176406 |
| --- | --- |
| Summary | [AMDGPU] Make better use of v_perm_b32 for transpose-like operations on multiple dwords |
| Labels | enhancement, backend:AMDGPU |
| Assignees | jayfoad |
| Reporter | jayfoad |

[ideal_perm.txt](https://github.com/user-attachments/files/24675043/ideal_perm.txt) shows desired codegen for a transpose-like operation on 16 bytes packed into 4 dwords.
[perm.txt](https://github.com/user-attachments/files/24675044/perm.txt) and [good_shuffle_perm.txt](https://github.com/user-attachments/files/24675045/good_shuffle_perm.txt) are attempts to represent this in IR, first with extractelement/insertelement and then with shufflevector. In both cases the compiler fails to use v_perm_b32 and instead generates a longer sequence of shifts and ORs with SDWA, e.g.:
```
$ llc -mtriple=amdgcn -mcpu=gfx900 perm.ll -o -
...
v_lshlrev_b16_e32 v16, 8, v16
v_or_b32_sdwa v8, v8, v16 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
v_lshlrev_b16_e32 v16, 8, v20
v_or_b32_sdwa v12, v12, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
v_or_b32_sdwa v8, v8, v12 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD
v_lshlrev_b16_e32 v12, 8, v18
v_or_b32_sdwa v10, v10, v12 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
v_lshlrev_b16_e32 v12, 8, v22
v_or_b32_sdwa v12, v14, v12 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
v_or_b32_sdwa v10, v10, v12 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD
v_lshlrev_b16_e32 v12, 8, v17
v_or_b32_sdwa v9, v9, v12 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
v_lshlrev_b16_e32 v12, 8, v21
v_or_b32_sdwa v12, v13, v12 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
v_or_b32_sdwa v9, v9, v12 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD
v_lshlrev_b16_e32 v12, 8, v19
v_or_b32_sdwa v11, v11, v12 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
v_lshlrev_b16_e32 v12, 8, v23
v_or_b32_sdwa v12, v15, v12 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
v_or_b32_sdwa v11, v11, v12 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD
...
```
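
For readers unfamiliar with the instruction, the following is a minimal Python sketch of how a 4x4 byte transpose across 4 dwords could be built from byte permutes. It is only an illustration under stated assumptions: `v_perm_b32` here is a hypothetical software model covering just selector values 0..7 (bytes 0..3 taken from the second source, bytes 4..7 from the first), and `transpose_4x4_bytes` is one possible two-stage decomposition, not necessarily the exact operation or sequence in ideal_perm.txt.

```python
def v_perm_b32(s0, s1, sel):
    """Hypothetical software model of v_perm_b32, plain byte selection only.

    Assumption: the 64-bit input is {s0, s1} with s1 in the low dword, so
    selector values 0..3 pick bytes of s1 and 4..7 pick bytes of s0.
    Selector values above 7 (constants, sign replication) are not modeled.
    """
    combined = ((s0 & 0xFFFFFFFF) << 32) | (s1 & 0xFFFFFFFF)
    result = 0
    for i in range(4):
        idx = (sel >> (8 * i)) & 0xFF
        if idx > 7:
            raise ValueError("selector value not modeled in this sketch")
        result |= ((combined >> (8 * idx)) & 0xFF) << (8 * i)
    return result


def transpose_4x4_bytes(a, b, c, d):
    """Transpose a 4x4 byte matrix stored one row per dword (byte 0 = low byte)."""
    # Stage 1: interleave bytes of each row pair.
    t0 = v_perm_b32(b, a, 0x05010400)  # [a0, b0, a1, b1]
    t1 = v_perm_b32(b, a, 0x07030602)  # [a2, b2, a3, b3]
    u0 = v_perm_b32(d, c, 0x05010400)  # [c0, d0, c1, d1]
    u1 = v_perm_b32(d, c, 0x07030602)  # [c2, d2, c3, d3]
    # Stage 2: combine 16-bit halves of the interleaved pairs.
    out0 = v_perm_b32(u0, t0, 0x05040100)  # [a0, b0, c0, d0]
    out1 = v_perm_b32(u0, t0, 0x07060302)  # [a1, b1, c1, d1]
    out2 = v_perm_b32(u1, t1, 0x05040100)  # [a2, b2, c2, d2]
    out3 = v_perm_b32(u1, t1, 0x07060302)  # [a3, b3, c3, d3]
    return out0, out1, out2, out3


if __name__ == "__main__":
    rows = (0x03020100, 0x13121110, 0x23222120, 0x33323130)
    print([hex(x) for x in transpose_4x4_bytes(*rows)])
```

In this sketch the whole transpose takes 8 permutes, against the 24 shift and OR instructions visible in the excerpt above.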
_______________________________________________
llvm-bugs mailing list
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs