Issue 154324
Summary Vector constants in NEON intrinsics are needlessly reloaded
Labels new issue
Assignees
Reporter rdoeffinger
When a vector constant used by an intrinsic is needed inside a conditional branch, it gets reloaded in every such branch, even when the load is unavoidable (the constant is also used unconditionally) and there are plenty of free registers.
An (almost) minimal reproducer, which can be checked on e.g. godbolt.org:
```
#include <arm_neon.h>

uint8x16_t test(unsigned a, unsigned b, unsigned c, unsigned d)
{
    uint32x4_t fact = vcombine_u32(vcreate_u32(0x123456789012345ull), vcreate_u32(0x67890123456ull));
    uint8x8_t tbl = vcreate_u8(0x0f0b07030f0b0703ull);
    uint8x8_t resa = vdup_n_u8(0);
    if (a) resa = vqtbl1_u8(vreinterpretq_u8_u32(vmulq_u32(vdupq_n_u32(a), fact)), tbl);
    uint8x8_t resb = vdup_n_u8(0);
    if (b) resb = vqtbl1_u8(vreinterpretq_u8_u32(vmulq_u32(vdupq_n_u32(b), fact)), tbl);
    uint8x8_t resc = vdup_n_u8(0);
    if (c) resc = vqtbl1_u8(vreinterpretq_u8_u32(vmulq_u32(vdupq_n_u32(c), fact)), tbl);
    uint8x8_t resd = vqtbl1_u8(vreinterpretq_u8_u32(vmulq_u32(vdupq_n_u32(d), fact)), tbl);
    uint8x8_t resab = vreinterpret_u8_u32(vzip1_u32(vreinterpret_u32_u8(resa), vreinterpret_u32_u8(resb)));
    uint8x8_t rescd = vreinterpret_u8_u32(vzip1_u32(vreinterpret_u32_u8(resc), vreinterpret_u32_u8(resd)));
    return vcombine_u8(resab, rescd);
}
```
Both fact and tbl get loaded again inside each if. MSVC does not have this issue (though GCC does), and scalar constants do not seem to suffer from it either.
Side note, which I will not create a separate ticket for at this time: constant propagation seems unable to fold vtbl with an all-zero table input (whose result is always zero), resulting in code like:
```
movi.2d v1, #0000000000000000
tbl.8b v0, { v1 }, v0
```
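To illustrate why the sequence above looks foldable, here is a scalar C sketch of the single-table, 8-byte TBL lookup. It is only a model for reasoning about the semantics: `tbl8_model` and `zero_table_gives_zero` are hypothetical names made up here, not intrinsics and not anything from LLVM. Indices 0..15 select a table byte and larger indices yield 0, so with an all-zero table the result is zero whatever the index register holds.

```c
#include <stdint.h>

/* Hypothetical scalar model of what vqtbl1_u8 computes:
   a 16-byte table, 8 index bytes, 8 result bytes. */
static void tbl8_model(const uint8_t table[16], const uint8_t idx[8],
                       uint8_t out[8])
{
    for (int i = 0; i < 8; i++) {
        /* In-range indices pick a table byte; out-of-range ones give 0. */
        out[i] = idx[i] < 16 ? table[idx[i]] : 0;
    }
}

/* Returns 1 if looking up idx in an all-zero table yields all zeros. */
static int zero_table_gives_zero(const uint8_t idx[8])
{
    const uint8_t table[16] = {0};
    uint8_t out[8];

    tbl8_model(table, idx, out);
    for (int i = 0; i < 8; i++) {
        if (out[i] != 0)
            return 0;
    }
    return 1;
}
```

Under this model `zero_table_gives_zero` returns 1 for any index bytes, which is why I would expect the movi + tbl pair to reduce to a single zeroing instruction.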