Issue | 56088
Summary | [AArch64][SVE] Suboptimal code-gen for fmul(index)
Labels |
Assignees |
Reporter | stevesuzuki-arm
https://godbolt.org/z/6E4Ts6zc7
```
define <vscale x 8 x half> @fmul_index_nxv8(half %a, <vscale x 8 x half> %b) #0 {
  %1 = fsub reassoc nnan ninf nsz contract afn half 0xH3C00, %a
  %2 = insertelement <vscale x 8 x half> undef, half %1, i64 0
  %3 = shufflevector <vscale x 8 x half> %2, <vscale x 8 x half> undef, <vscale x 8 x i32> zeroinitializer
  %4 = fmul reassoc nnan ninf nsz contract afn <vscale x 8 x half> %b, %3
  ret <vscale x 8 x half> %4
}
```
More instructions are generated for SVE2 than for Neon: the Neon code uses fmul(index), whereas the SVE2 code uses mov+fmul.
Options: `-mattr=+sve2 -O3`
```
fmul_index_nxv8:                        // @fmul_index_nxv8
        fmov    h2, #1.00000000
        fsub    h0, h2, h0
        mov     z0.h, h0
        fmul    z0.h, z1.h, z0.h
        ret
fmul_index_v8:                          // @fmul_index_v8
        fmov    h2, #1.00000000
        fsub    h0, h2, h0
        fmul    v0.8h, v1.8h, v0.h[0]
        ret
```
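The output above includes `fmul_index_v8`, but its IR is not shown in this report. For comparison, a fixed-width counterpart might look like the sketch below. It is an assumption, written by replacing `<vscale x 8 x half>` with `<8 x half>` in the scalable function; the IR actually used is behind the Godbolt link.
```
; Assumed fixed-width (Neon) counterpart of @fmul_index_nxv8.
; Reconstructed for illustration only; not taken from the report.
define <8 x half> @fmul_index_v8(half %a, <8 x half> %b) {
  ; 1.0 - %a (0xH3C00 is half-precision 1.0)
  %1 = fsub reassoc nnan ninf nsz contract afn half 0xH3C00, %a
  ; Splat the scalar result across all 8 lanes
  %2 = insertelement <8 x half> undef, half %1, i64 0
  %3 = shufflevector <8 x half> %2, <8 x half> undef, <8 x i32> zeroinitializer
  ; Multiply %b by the splat
  %4 = fmul reassoc nnan ninf nsz contract afn <8 x half> %b, %3
  ret <8 x half> %4
}
```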
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs