Issue 56089
Summary [AArch64][SVE] Suboptimal code-gen for saba etc
Labels new issue
Assignees
Reporter stevesuzuki-arm
    https://godbolt.org/z/v5v3bd5jj
```
define <vscale x 16 x i8> @saba_nxv8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c) #0 {
  %1 = call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
  %2 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.sabd.nxv16i8(<vscale x 16 x i1> %1, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c)
  %3 = add <vscale x 16 x i8> %2, %a
  ret <vscale x 16 x i8> %3
}
```
SVE2 code generation emits more instructions than Neon: Neon selects a single `saba`, whereas SVE2 emits `sabd` + `add`.
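For comparison, the Neon function behind the `saba_v8` label in the output might look like the sketch below. The body is an assumption reconstructed from the SVE version above, not copied from the Godbolt link; only the function name comes from the assembly listing.

```
; Hypothetical Neon counterpart: absolute difference of %b and %c, accumulated into %a.
define <16 x i8> @saba_v8(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c) {
  %1 = tail call <16 x i8> @llvm.aarch64.neon.sabd.v16i8(<16 x i8> %b, <16 x i8> %c)
  %2 = add <16 x i8> %1, %a
  ret <16 x i8> %2
}

declare <16 x i8> @llvm.aarch64.neon.sabd.v16i8(<16 x i8>, <16 x i8>)
```

Both functions compute `a + |b - c|` per lane, so the SVE2 version should be foldable into `saba` in the same way.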

The same applies to other accumulate patterns, such as:

1. `saba`, `uaba`
2. `srsra`, `ursra`
3. `ssra`, `usra`
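As an illustration of the shift-and-accumulate case, an SVE input for `ssra` could be written roughly as follows. This is a sketch under assumptions: the function name, the shift amount of 1, and the use of the `dup.x` and `asr` intrinsics are chosen here for illustration and do not come from the report.

```
; Hypothetical ssra candidate: arithmetic shift right of %b by 1, accumulated into %a.
define <vscale x 16 x i8> @ssra_nxv16(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) {
  %pg  = call <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32 31)
  %amt = call <vscale x 16 x i8> @llvm.aarch64.sve.dup.x.nxv16i8(i8 1)
  %shr = call <vscale x 16 x i8> @llvm.aarch64.sve.asr.nxv16i8(<vscale x 16 x i1> %pg, <vscale x 16 x i8> %b, <vscale x 16 x i8> %amt)
  %acc = add <vscale x 16 x i8> %shr, %a
  ret <vscale x 16 x i8> %acc
}

declare <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32)
declare <vscale x 16 x i8> @llvm.aarch64.sve.dup.x.nxv16i8(i8)
declare <vscale x 16 x i8> @llvm.aarch64.sve.asr.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
```

With an all-true predicate and a constant shift, the shift and the add could in principle be combined into a single `ssra`, in the same way as the `sabd` + `add` pair above.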


Options: `-mattr=+sve2 -O3`
```
saba_nxv8:                              // @saba_nxv8
        ptrue   p0.b
        sabd    z1.b, p0/m, z1.b, z2.b
        add     z0.b, z1.b, z0.b
        ret
saba_v8:                                // @saba_v8
        saba    v0.16b, v1.16b, v2.16b
        ret
```
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
