| Field | Value |
| --- | --- |
| Issue | 56089 |
| Summary | [AArch64][SVE] Suboptimal code-gen for saba etc |
| Labels | new issue |
| Assignees | |
| Reporter | stevesuzuki-arm |

https://godbolt.org/z/v5v3bd5jj
```
define <vscale x 16 x i8> @saba_nxv8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c) #0 {
  %1 = call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
  %2 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.sabd.nxv16i8(<vscale x 16 x i1> %1, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c)
  %3 = add <vscale x 16 x i8> %2, %a
  ret <vscale x 16 x i8> %3
}
```
With SVE2, more instructions are generated than with Neon: the Neon version uses a single `saba`, while the SVE2 version emits `sabd` + `add` (plus the `ptrue` that materializes the predicate).
The same applies to other accumulate patterns, such as:
1. `saba`, `uaba`
2. `srsra`, `ursra`
3. `ssra`, `usra`

Options: `-mattr=+sve2 -O3`
```
saba_nxv8:                              // @saba_nxv8
        ptrue   p0.b
        sabd    z1.b, p0/m, z1.b, z2.b
        add     z0.b, z1.b, z0.b
        ret
saba_v8:                                // @saba_v8
        saba    v0.16b, v1.16b, v2.16b
        ret
```
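
For reference, the lane-wise equivalence behind the requested folds can be sketched as a scalar model. This is only an illustration in Python, not LLVM code: the helper names (`wrap_i8`, `sabd`, `saba`, `ssra`, `srshr`, `srsra`) are hypothetical, they model a single signed 8-bit lane, and the unsigned variants are omitted.

```python
# Scalar model of one signed 8-bit lane. All helper names are hypothetical
# and only illustrate the semantics; this is not how LLVM implements them.

def wrap_i8(x: int) -> int:
    """Wrap an integer to the signed 8-bit range [-128, 127]."""
    return (x + 128) % 256 - 128

def sabd(b: int, c: int) -> int:
    """Signed absolute difference, truncated to the lane width."""
    return wrap_i8(abs(b - c))

def saba(a: int, b: int, c: int) -> int:
    """Fused form: accumulate the absolute difference into a."""
    return wrap_i8(a + abs(b - c))

def ssra(a: int, b: int, n: int) -> int:
    """Fused form: arithmetic shift right by n, then accumulate into a."""
    return wrap_i8(a + (b >> n))

def srshr(b: int, n: int) -> int:
    """Rounding arithmetic shift right by n (1 <= n <= 8)."""
    return (b + (1 << (n - 1))) >> n

def srsra(a: int, b: int, n: int) -> int:
    """Fused form: rounding shift right by n, then accumulate into a."""
    return wrap_i8(a + srshr(b, n))

if __name__ == "__main__":
    # Sample the i8 range (includes both -128 and 127).
    lanes = range(-128, 128, 5)
    for a in lanes:
        for b in lanes:
            for c in lanes:
                # add(a, sabd(b, c)) matches the fused saba on every lane.
                assert wrap_i8(a + sabd(b, c)) == saba(a, b, c)
            for n in range(1, 9):
                # add(a, asr(b, n)) and add(a, srshr(b, n)) match the fused forms.
                assert wrap_i8(a + wrap_i8(b >> n)) == ssra(a, b, n)
                assert wrap_i8(a + wrap_i8(srshr(b, n))) == srsra(a, b, n)
    print("unfused and fused forms agree on all sampled lanes")
```

Because the separate operation followed by `add` agrees with the fused form on every lane when the governing predicate is all-true, as in the `ptrue` case above, selecting the single accumulate instruction would not change the result.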