https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123260
Bug ID: 123260
Summary: FCMLA recognition does not work for scalars
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: yyc1992 at gmail dot com
Target Milestone: ---
Tested with the following code at -O3 with -ffp-contract=fast and
-march=armv8.3-a, using GCC 15.2 as well as trunk.
```
struct Complex {
    double real;
    double imag;
};
static inline void f(Complex &a, Complex &b, Complex &c)
{
    a = {(a.real + b.real * c.real) - b.imag * c.imag,
         (a.imag + b.real * c.imag) + b.imag * c.real};
}
void g1(Complex *a, Complex *b, Complex *c, int n)
{
    for (int i = 0; i < n; i++) {
        f(a[i], b[i], c[i]);
    }
}
void g2(Complex *a, Complex *b, Complex *c)
{
    f(*a, *b, *c);
}
```
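For reference, `f` is a complex multiply-accumulate, a += b * c. A minimal sketch checking that against std::complex follows; the helpers `f_ref` and `check_f_matches_reference` and the sample values are illustrative and not part of the original test case.

```cpp
#include <cassert>
#include <complex>

struct Complex {
    double real;
    double imag;
};

// Same arithmetic as f() in the test case above.
static inline void f(Complex &a, Complex &b, Complex &c)
{
    a = {(a.real + b.real * c.real) - b.imag * c.imag,
         (a.imag + b.real * c.imag) + b.imag * c.real};
}

// Hypothetical reference: the same update written as a + b * c.
static inline std::complex<double> f_ref(std::complex<double> a,
                                         std::complex<double> b,
                                         std::complex<double> c)
{
    return a + b * c;
}

// Usage check with small integer values, so every intermediate is exact
// regardless of whether the compiler contracts into FMAs.
static void check_f_matches_reference()
{
    Complex a{1.0, 2.0}, b{3.0, 4.0}, c{5.0, 6.0};
    f(a, b, c);
    const std::complex<double> r = f_ref({1.0, 2.0}, {3.0, 4.0}, {5.0, 6.0});
    assert(a.real == r.real() && a.imag == r.imag());  // both give -8 + 40i
}
```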
GCC recognizes the pattern for the loop version (g1) and emits FCMLA
```
.L3:
        ldr     q31, [x2, x4]
        ldr     q30, [x1, x4]
        ldr     q29, [x0, x4]
        fcmla   v29.2d, v30.2d, v31.2d, #0
        fcmla   v29.2d, v30.2d, v31.2d, #90
        str     q29, [x0, x4]
        add     x4, x4, 16
        cmp     x3, x4
        bne     .L3
```
but does not do so for the scalar version (g2), even though both perform
exactly the same operations AFAICT:
```
        ldp     d30, d28, [x2]
        ldp     d31, d29, [x1]
        ldp     d27, d26, [x0]
        fmadd   d27, d31, d30, d27
        fmadd   d26, d31, d28, d26
        fmsub   d27, d29, d28, d27
        fmadd   d26, d30, d29, d26
        stp     d27, d26, [x0]
```
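To make the equivalence of the two listings concrete, here is a minimal sketch that models one (real, imag) lane pair of each FCMLA rotation with std::fma and compares it with the four fused operations of the scalar listing. The helper names `fcmla_rot0`, `fcmla_rot90`, `scalar_g2` and `check_same_result` are hypothetical, and the model is my reading of the per-element behaviour of the instruction, not GCC's internal representation.

```cpp
#include <cassert>
#include <cmath>

struct Complex {
    double real;
    double imag;
};

// Model of `fcmla vd.2d, vn.2d, vm.2d, #0` on one (real, imag) pair:
//   acc.real += n.real * m.real;  acc.imag += n.real * m.imag
static inline Complex fcmla_rot0(Complex acc, Complex n, Complex m)
{
    return {std::fma(n.real, m.real, acc.real),
            std::fma(n.real, m.imag, acc.imag)};
}

// Model of the `#90` rotation:
//   acc.real -= n.imag * m.imag;  acc.imag += n.imag * m.real
static inline Complex fcmla_rot90(Complex acc, Complex n, Complex m)
{
    return {std::fma(-n.imag, m.imag, acc.real),
            std::fma(n.imag, m.real, acc.imag)};
}

// The four fused operations of the scalar g2 listing, in the same order.
static inline Complex scalar_g2(Complex a, Complex b, Complex c)
{
    double re = std::fma(b.real, c.real, a.real);  // fmadd d27, d31, d30, d27
    double im = std::fma(b.real, c.imag, a.imag);  // fmadd d26, d31, d28, d26
    re = std::fma(-b.imag, c.imag, re);            // fmsub d27, d29, d28, d27
    im = std::fma(c.real, b.imag, im);             // fmadd d26, d30, d29, d26
    return {re, im};
}

// Usage check: both sequences perform the same fused operations, so the
// results should match exactly.
static void check_same_result()
{
    const Complex a{1.0, 2.0}, b{3.0, 4.0}, c{5.0, 6.0};
    const Complex v = fcmla_rot90(fcmla_rot0(a, b, c), b, c);
    const Complex s = scalar_g2(a, b, c);
    assert(v.real == s.real && v.imag == s.imag);  // both give -8 + 40i
}
```

In this model each output element goes through the same two fused multiply-adds in either sequence, which is why the scalar case looks like it should be eligible for the same FCMLA pair.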
Possibly related: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121925