| Field | Value |
| --- | --- |
| Issue | 179013 |
| Summary | [AArch64] vextq_u8 expands into two EXT instructions in some cases |
| Labels | new issue |
| Assignees | |
| Reporter | zeux |

When compiling the attached file ([test.cpp](https://github.com/user-attachments/files/24977466/test.cpp)) with -O2/-O3 for AArch64, LLVM generates a loop that has 3 EXT instructions; the second EXT in the code is expanded into a two-EXT sequence before the store:
```asm
ext v3.16b, v2.16b, v2.16b, #8
ext v1.8b, v1.8b, v3.8b, #7
str d1, [x0], #8
```
This is new as of LLVM 20; LLVM 19 generated one EXT instead:
```asm
ext v2.16b, v1.16b, v1.16b, #7
str d2, [x0], #8
```
I'm not sure to what extent this affects performance in the larger code this repro was extracted from; llvm-mca claims that the loop in test.cpp gets 2 cycles slower (4.2 => 6.2 cycles). The extra EXT instruction appears to be entirely redundant.
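To make the redundancy claim concrete, here is a scalar model of the two listings. It is only a sketch: the register names in the two listings differ, so the model assumes that `v1` and `v2` in the LLVM 20 sequence hold the same 16-byte value (called `x` here). The helper names (`ext16`, `ext8`, `low`, `one_ext`, `two_ext`, `iota16`) are made up for illustration and are not part of test.cpp.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec16 = std::array<std::uint8_t, 16>;
using Vec8 = std::array<std::uint8_t, 8>;

// Model of `ext vd.16b, vn.16b, vm.16b, #imm`: take 16 bytes from the
// concatenation of vn (low half) and vm (high half), starting at byte imm.
Vec16 ext16(const Vec16& n, const Vec16& m, std::size_t imm) {
    assert(imm < 16);
    Vec16 out{};
    for (std::size_t i = 0; i < 16; ++i) {
        std::size_t j = i + imm;
        out[i] = j < 16 ? n[j] : m[j - 16];
    }
    return out;
}

// Model of the 64-bit form `ext vd.8b, vn.8b, vm.8b, #imm`.
Vec8 ext8(const Vec8& n, const Vec8& m, std::size_t imm) {
    assert(imm < 8);
    Vec8 out{};
    for (std::size_t i = 0; i < 8; ++i) {
        std::size_t j = i + imm;
        out[i] = j < 8 ? n[j] : m[j - 8];
    }
    return out;
}

// Low 8 bytes of a 16-byte vector (what `str d, ...` would store).
Vec8 low(const Vec16& v) {
    Vec8 out{};
    for (std::size_t i = 0; i < 8; ++i) out[i] = v[i];
    return out;
}

// LLVM 19 listing: one EXT, then store the low half.
Vec8 one_ext(const Vec16& x) { return low(ext16(x, x, 7)); }

// LLVM 20 listing, ASSUMING v1 and v2 both hold x.
Vec8 two_ext(const Vec16& x) {
    Vec16 v3 = ext16(x, x, 8);        // ext v3.16b, v2.16b, v2.16b, #8
    return ext8(low(x), low(v3), 7);  // ext v1.8b, v1.8b, v3.8b, #7
}

// Test input with x[i] == i.
Vec16 iota16() {
    Vec16 x{};
    for (std::size_t i = 0; i < 16; ++i) x[i] = static_cast<std::uint8_t>(i);
    return x;
}
```

Under that assumption both sequences store bytes 7 through 14 of `x`, which is consistent with the first EXT of the pair being unnecessary.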
Replacing `vst1_u8` in the code with `vst1q_lane_u64` (with appropriate casts) seems to work around the issue, although it generates a differently flavored store so I'm not sure if it has other consequences.
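A minimal sketch of the workaround follows. The function names, the pointer-based signature, and the extract offset of 7 are assumptions chosen for illustration, not the actual contents of test.cpp; only the intrinsics themselves (`vextq_u8`, `vget_low_u8`, `vst1_u8`, `vst1q_lane_u64`, `vreinterpretq_u64_u8`, `vld1q_u8`) are real. The non-AArch64 branch is a plain byte-copy stand-in so the snippet still builds on other hosts.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#if defined(__aarch64__)
#include <arm_neon.h>

// Assumed shape of the original code: extract, take the low half, store 8 bytes.
void store_ext_vst1(std::uint8_t* dst, const std::uint8_t* a, const std::uint8_t* b) {
    assert(dst && a && b);
    uint8x16_t r = vextq_u8(vld1q_u8(a), vld1q_u8(b), 7);
    vst1_u8(dst, vget_low_u8(r));
}

// Workaround: view the 128-bit result as two u64 lanes and store lane 0.
// The pointer cast is the "appropriate cast" on the destination side.
void store_ext_lane(std::uint8_t* dst, const std::uint8_t* a, const std::uint8_t* b) {
    assert(dst && a && b);
    uint8x16_t r = vextq_u8(vld1q_u8(a), vld1q_u8(b), 7);
    vst1q_lane_u64(reinterpret_cast<std::uint64_t*>(dst), vreinterpretq_u64_u8(r), 0);
}

#else

// Portable stand-in: bytes 7..14 of the concatenation a:b.
static void ext7_low8(std::uint8_t* dst, const std::uint8_t* a, const std::uint8_t* b) {
    assert(dst && a && b);
    std::uint8_t cat[32];
    std::memcpy(cat, a, 16);
    std::memcpy(cat + 16, b, 16);
    std::memcpy(dst, cat + 7, 8);
}

void store_ext_vst1(std::uint8_t* dst, const std::uint8_t* a, const std::uint8_t* b) {
    ext7_low8(dst, a, b);
}

void store_ext_lane(std::uint8_t* dst, const std::uint8_t* a, const std::uint8_t* b) {
    ext7_low8(dst, a, b);
}

#endif
```

Both variants write the same 8 bytes; the difference is only in which store the compiler selects, so the result is worth checking in the generated assembly rather than assuming.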
Godbolt link for ease of experimentation: https://gcc.godbolt.org/z/4q8n5qbzK