| Issue | 174688 |
| Summary | vdotq_lane_s32 produces vdup + vsdot |
| Labels | new issue |
| Assignees | |
| Reporter | fbarchard |

Clang with the ARM backend emits a `vdup` instead of the by-lane form of `vsdot` when the lane operand comes from `vget_high_s8`:
```
vacc0x0123 = vdotq_lane_s32(vacc0x0123, vb0123x89AB, vget_high_s8(va_0x16), 0);
```
Expected output:
```
vsdot.s8 q8, q9, d1[0]
```
Actual output:
```
vdup.32 q10, d1[0]
vsdot.s8 q8, q9, q10
```
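For reference, here is a minimal sketch of why the two sequences compute the same result, which is what makes the extra `vdup` pure overhead. The scalar models below follow the documented semantics of the signed dot-product instruction as I understand them; the function names (`dot_high_lane0`, `sdot_lane_model`, `dup_then_sdot_model`, `models_agree`) are hypothetical and not taken from the godbolt reproducer. The intrinsic wrapper is only compiled when the target advertises NEON and the dot-product extension.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) && defined(__ARM_FEATURE_DOTPROD)
#include <arm_neon.h>
/* Hypothetical reproducer with the same call shape as the report. */
int32x4_t dot_high_lane0(int32x4_t acc, int8x16_t b, int8x16_t a) {
    return vdotq_lane_s32(acc, b, vget_high_s8(a), 0);
}
#endif

/* Scalar model of the by-lane form: vsdot.s8 qd, qn, dm[lane].
 * a8 holds the 8 bytes of the 64-bit register (for vget_high_s8 these are
 * the upper 8 bytes of the 128-bit vector); lane picks one 4-byte group. */
void sdot_lane_model(int32_t acc[4], const int8_t b[16],
                     const int8_t a8[8], int lane) {
    for (int i = 0; i < 4; ++i) {
        int32_t sum = 0;
        for (int j = 0; j < 4; ++j)
            sum += (int32_t)b[4 * i + j] * (int32_t)a8[4 * lane + j];
        acc[i] += sum;
    }
}

/* Scalar model of vdup.32 qm, dm[lane] followed by the non-lane vsdot.s8. */
void dup_then_sdot_model(int32_t acc[4], const int8_t b[16],
                         const int8_t a8[8], int lane) {
    int8_t dup[16];
    /* vdup.32: broadcast the selected 4-byte group to all four positions. */
    for (int i = 0; i < 4; ++i)
        memcpy(&dup[4 * i], &a8[4 * lane], 4);
    for (int i = 0; i < 4; ++i) {
        int32_t sum = 0;
        for (int j = 0; j < 4; ++j)
            sum += (int32_t)b[4 * i + j] * (int32_t)dup[4 * i + j];
        acc[i] += sum;
    }
}

/* Returns 1 when both models agree for lanes 0 and 1 on a fixed input. */
int models_agree(void) {
    int8_t b[16], a8[8];
    for (int i = 0; i < 16; ++i) b[i] = (int8_t)(i - 8);
    for (int i = 0; i < 8; ++i) a8[i] = (int8_t)(3 * i - 10);
    for (int lane = 0; lane < 2; ++lane) {
        int32_t x[4] = {1, 2, 3, 4};
        int32_t y[4] = {1, 2, 3, 4};
        sdot_lane_model(x, b, a8, lane);
        dup_then_sdot_model(y, b, a8, lane);
        if (memcmp(x, y, sizeof x) != 0) return 0;
    }
    return 1;
}
```

Since the broadcast only repeats the selected 32-bit group, the by-lane encoding can read it straight from the `d` register, and the temporary `q` register is not needed.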
The same lane form works when written as inline assembly:
```
asm volatile(" vsdot.s8 q4, q2, d15[0] \n" : : : "cc", "memory");
```
The impact is additional register pressure and, on Cortex-A55 (in-order execution), a 4-cycle stall while the `vdup` completes.
Here is a godbolt link to reproduce the issue:
https://godbolt.org/z/G7Y67sz84
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs