| Issue | 169034 |
| Summary | [AArch64][SVE] Non-temporal load/store instructions fail to be generated from intrinsics and builtins |
| Labels | backend:AArch64, missed-optimization, llvm:SelectionDAG |
| Assignees | |
| Reporter | ytmukai |

Since LLVM 21, the ACLE intrinsics for SVE non-temporal loads/stores no longer generate the expected non-temporal instructions when called with an all-true predicate.
Code to reproduce:
```c
#include <arm_sve.h>
void f(double* a) {
  svbool_t allone = svptrue_b64();
  svstnt1(allone, a + 1,
          svldnt1(allone, a));
}
```
https://godbolt.org/z/za6Mb1Een
In LLVM 21, an all-true predicate is represented as a constant `splat_vector` in the SelectionDAG. This enables a `DAGCombiner` optimization that converts a `masked_load`/`masked_store` node with an all-true mask into a regular `load`/`store` node. However, the instruction selection patterns for SVE non-temporal instructions are only defined for the masked nodes, so the resulting regular nodes no longer match them.
LLVM 20:
```
Initial selection DAG: %bb.0 'f:entry'
SelectionDAG has 13 nodes:
t0: ch,glue = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %0
*** t5: nxv2i1 = llvm.aarch64.sve.ptrue TargetConstant:i64<1553>, TargetConstant:i32<31>
t9: nxv2f64,ch = llvm.aarch64.sve.ldnt1<(non-temporal load (<vscale x 1 x s128>) from %ir.a, align 8, !tbaa !6)> t0, TargetConstant:i64<1481>, t5, t2
t7: i64 = add nuw t2, Constant:i64<8>
t11: ch = llvm.aarch64.sve.stnt1<(non-temporal store (<vscale x 1 x s128>) into %ir.add.ptr, align 8, !tbaa !6)> t9:1, TargetConstant:i64<1792>, t9, t5, t7
t12: ch = AArch64ISD::RET_GLUE t11
```
LLVM 21:
```
Initial selection DAG: %bb.0 'f:entry'
SelectionDAG has 15 nodes:
t0: ch,glue = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %0
t9: nxv2i1 = insert_vector_elt poison:nxv2i1, Constant:i1<-1>, Constant:i64<0>
***t10: nxv2i1 = splat_vector Constant:i1<-1>***
t11: nxv2f64,ch = llvm.aarch64.sve.ldnt1<(non-temporal load (<vscale x 1 x s128>) from %ir.a, align 8, !tbaa !10)> t0, TargetConstant:i64<1598>, t10, t2
t4: i64 = add nuw t2, Constant:i64<8>
t13: ch = llvm.aarch64.sve.stnt1<(non-temporal store (<vscale x 1 x s128>) into %ir.add.ptr, align 8, !tbaa !10)> t11:1, TargetConstant:i64<1909>, t11, t10, t4
t14: ch = AArch64ISD::RET_GLUE t13
Combining: t13: ch = llvm.aarch64.sve.stnt1<(non-temporal store (<vscale x 1 x s128>) into %ir.add.ptr, align 8, !tbaa !10)> t11:1, TargetConstant:i64<1909>, t11, t10, t4
... into: t17: ch = masked_store<(non-temporal store (<vscale x 1 x s128>) into %ir.add.ptr, align 8, !tbaa !10)> t11:1, t15, t4, undef:i64, t10
Combining: t17: ch = masked_store<(non-temporal store (<vscale x 1 x s128>) into %ir.add.ptr, align 8, !tbaa !10)> t11:1, t15, t4, undef:i64, t10
... into: t18: ch = store<(non-temporal store (<vscale x 1 x s128>) into %ir.add.ptr, align 8, !tbaa !10)> t11:1, t15, t4, undef:i64
```
The transformation from `masked_load/store` to `load/store` is performed here:
https://github.com/llvm/llvm-project/blob/622f72f4bef8b177e1e4f318465260fbdb7711ef/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L12782
The existing patterns are defined here:
https://github.com/llvm/llvm-project/blob/622f72f4bef8b177e1e4f318465260fbdb7711ef/llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td#L3052
A similar issue already existed with `__builtin_nontemporal_load`/`__builtin_nontemporal_store`: these builtins also fail to generate non-temporal instructions, apparently because of the same root cause.
https://godbolt.org/z/rhzYaxjj5
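For readers without access to the link, here is a rough sketch of the kind of code that exercises these builtins. It is a hypothetical example, not the reproducer from the link: the function name `copy_nt` and the scalar loop shape are assumptions, and whether a given compiler vectorizes it for SVE has not been checked here. The `__has_builtin` guard only keeps the sketch buildable on compilers that lack the Clang builtins.

```c
#include <stddef.h>

/* Fallback so the guard below also works on compilers without __has_builtin. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* Hypothetical helper (not from the report): copies n doubles from src to dst,
   marking each access as non-temporal when the Clang builtins are available. */
void copy_nt(double *dst, double *src, size_t n) {
  for (size_t i = 0; i < n; ++i) {
#if __has_builtin(__builtin_nontemporal_load) && \
    __has_builtin(__builtin_nontemporal_store)
    /* Non-temporal hint on both the load and the store. */
    double v = __builtin_nontemporal_load(&src[i]);
    __builtin_nontemporal_store(v, &dst[i]);
#else
    /* Plain copy when the builtins are not supported. */
    dst[i] = src[i];
#endif
  }
}
```

The builtins reach the backend as regular `load`/`store` nodes carrying the non-temporal flag rather than as masked nodes, which is consistent with them hitting the same missing patterns described above.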
To resolve both of these issues, should we add isel patterns for non-temporal instructions that match regular `load` and `store` nodes?