Issue: 154324
Summary: Vector constants in NEON intrinsics are needlessly reloaded
Labels: new issue
Assignees: (none)
Reporter: rdoeffinger

When a vector constant used by an intrinsic is needed inside a conditional branch, it gets reloaded in every such branch, even when the load is unavoidable anyway and there are plenty of free registers.
(Almost) minimal reproducer that can be checked with e.g. godbolt.org:
```
#include <arm_neon.h>

uint8x16_t test(unsigned a, unsigned b, unsigned c, unsigned d)
{
    // Vector constants shared by all four computations below.
    uint32x4_t fact = vcombine_u32(vcreate_u32(0x123456789012345ull), vcreate_u32(0x67890123456ull));
    uint8x8_t tbl = vcreate_u8(0x0f0b07030f0b0703ull);

    // Conditional uses of fact and tbl.
    uint8x8_t resa = vdup_n_u8(0);
    if (a) resa = vqtbl1_u8(vreinterpretq_u8_u32(vmulq_u32(vdupq_n_u32(a), fact)), tbl);
    uint8x8_t resb = vdup_n_u8(0);
    if (b) resb = vqtbl1_u8(vreinterpretq_u8_u32(vmulq_u32(vdupq_n_u32(b), fact)), tbl);
    uint8x8_t resc = vdup_n_u8(0);
    if (c) resc = vqtbl1_u8(vreinterpretq_u8_u32(vmulq_u32(vdupq_n_u32(c), fact)), tbl);

    // Unconditional use: fact and tbl are needed on every path.
    uint8x8_t resd = vqtbl1_u8(vreinterpretq_u8_u32(vmulq_u32(vdupq_n_u32(d), fact)), tbl);

    uint8x8_t resab = vreinterpret_u8_u32(vzip1_u32(vreinterpret_u32_u8(resa), vreinterpret_u32_u8(resb)));
    uint8x8_t rescd = vreinterpret_u8_u32(vzip1_u32(vreinterpret_u32_u8(resc), vreinterpret_u32_u8(resd)));
    return vcombine_u8(resab, rescd);
}
```
fact and tbl get loaded again inside each if, although the unconditional computation of resd needs both of them on every path. MSVC does not have this issue (though gcc does), and scalar constants do not seem to suffer from it either.
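For readers without an AArch64 toolchain at hand, the following is a rough portable sketch of what the reproducer appears to compute. It is not part of the original report: the names FACT and test_model are hypothetical, and the lane values assume the little-endian lane layout that vcreate_u32 and vcreate_u8 have on typical AArch64 targets. Under that assumption each table lookup picks bytes 3, 7, 11 and 15 of the product, i.e. the top byte of each 32-bit lane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lanes of `fact`, assuming little-endian lane order:
 * 0x0123456789012345 -> {0x89012345, 0x01234567}
 * 0x0000067890123456 -> {0x90123456, 0x00000678} */
static const uint32_t FACT[4] = {
    0x89012345u, 0x01234567u, 0x90123456u, 0x00000678u
};

/* Hypothetical scalar model of test(): for each input, multiply it with
 * every lane of FACT (mod 2^32) and keep the top byte of each product.
 * The 16 output bytes are ordered a, b, c, d, four bytes each. */
void test_model(unsigned a, unsigned b, unsigned c, unsigned d,
                uint8_t out[16])
{
    const uint32_t in[4] = { a, b, c, d };
    assert(out != NULL);
    for (int k = 0; k < 4; k++) {
        for (int i = 0; i < 4; i++) {
            uint32_t prod = (uint32_t)(in[k] * FACT[i]);
            out[4 * k + i] = (uint8_t)(prod >> 24);
        }
    }
}
```

In this model a zero argument yields four zero bytes, which matches the vdup_n_u8(0) default of the skipped branches, so the model needs no branches at all. The branches in the reproducer exist to trigger the reloads, not to change the result.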
Side note, for which I will not create a ticket for now: constant propagation seems to be unable to handle a 0 input to vtbl, resulting in code like:
```
movi.2d v1, #0000000000000000
tbl.8b v0, { v1 }, v0
```
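To illustrate why such a sequence looks foldable, here is a minimal scalar sketch of a one-register table lookup. It assumes the usual TBL semantics (an index of 16 or more yields 0) and that v1 in the listing above is the table operand; the function name tbl1_model is hypothetical and not taken from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a single-register TBL with 8 result bytes:
 * each output byte is table[idx] if idx is in range, otherwise 0. */
void tbl1_model(const uint8_t table[16], const uint8_t idx[8],
                uint8_t out[8])
{
    assert(table != NULL && idx != NULL && out != NULL);
    for (int i = 0; i < 8; i++)
        out[i] = idx[i] < 16 ? table[idx[i]] : 0;
}
```

With an all-zero table every output byte is 0 regardless of the indices, so under these assumptions the whole lookup could be replaced by a zero vector.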