https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123260

            Bug ID: 123260
           Summary: FCMLA recognition does not work for scalars
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: yyc1992 at gmail dot com
  Target Milestone: ---

Tested with GCC 15.2 as well as trunk, using `-O3 -ffp-contract=fast -march=armv8.3-a`, on the following code:

```
struct Complex {
    double real;
    double imag;
};

static inline void f(Complex &a, Complex &b, Complex &c)
{
    a = {(a.real + b.real * c.real) - b.imag * c.imag,
        (a.imag + b.real * c.imag) + b.imag * c.real};
}

void g1(Complex *a, Complex *b, Complex *c, int n)
{
    for (int i = 0; i < n; i++) {
        f(a[i], b[i], c[i]);
    }
}

void g2(Complex *a, Complex *b, Complex *c)
{
    f(*a, *b, *c);
}
```

GCC recognizes the pattern perfectly in the loop version (`g1`):

```
.L3:
        ldr     q31, [x2, x4]
        ldr     q30, [x1, x4]
        ldr     q29, [x0, x4]
        fcmla   v29.2d, v30.2d, v31.2d, #0
        fcmla   v29.2d, v30.2d, v31.2d, #90
        str     q29, [x0, x4]
        add     x4, x4, 16
        cmp     x3, x4
        bne     .L3
```

but does not do so for the scalar version (`g2`), even though, AFAICT, both perform exactly the same operations:

```
        ldp     d30, d28, [x2]
        ldp     d31, d29, [x1]
        ldp     d27, d26, [x0]
        fmadd   d27, d31, d30, d27
        fmadd   d26, d31, d28, d26
        fmsub   d27, d29, d28, d27
        fmadd   d26, d30, d29, d26
        stp     d27, d26, [x0]
```
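
To make the "same operations" claim concrete, here is a rough C++ model of both sequences. It assumes the Arm ARM description of FCMLA, where each lane is a single fused multiply-add and rotation #90 multiplies the imaginary part of the first source by the negated imaginary part of the second for the real lane. `std::fma` stands in for the hardware fused operations, and `fcmla_model`, `scalar_model` and `models_agree` are hypothetical helpers written only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstring>

struct Complex {
    double real;
    double imag;
};

// Hypothetical model of one FCMLA on a single (real, imag) lane pair:
// d += n * m with rotation, assuming each lane is one fused multiply-add
// (modelled here with std::fma). In g1: d = a (v29), n = b (v30), m = c (v31).
Complex fcmla_model(Complex d, Complex n, Complex m, int rot)
{
    assert(rot == 0 || rot == 90); // only the two rotations used in g1
    if (rot == 0) {
        d.real = std::fma(n.real, m.real, d.real);  // d.re += n.re * m.re
        d.imag = std::fma(n.real, m.imag, d.imag);  // d.im += n.re * m.im
    } else {
        d.real = std::fma(n.imag, -m.imag, d.real); // d.re += n.im * -m.im
        d.imag = std::fma(n.imag, m.real, d.imag);  // d.im += n.im * m.re
    }
    return d;
}

// Hypothetical model of the fmadd/fmadd/fmsub/fmadd sequence emitted for g2
// (a in d27/d26, b in d31/d29, c in d30/d28).
Complex scalar_model(Complex a, Complex b, Complex c)
{
    double re = std::fma(b.real, c.real, a.real); // fmadd d27, d31, d30, d27
    double im = std::fma(b.real, c.imag, a.imag); // fmadd d26, d31, d28, d26
    re = std::fma(-b.imag, c.imag, re);           // fmsub d27, d29, d28, d27
    im = std::fma(c.real, b.imag, im);            // fmadd d26, d30, d29, d26
    return {re, im};
}

// True if "fcmla #0 then fcmla #90" (as in g1's loop body) and the scalar
// sequence give bit-identical results for the given inputs.
bool models_agree(Complex a, Complex b, Complex c)
{
    Complex v = fcmla_model(fcmla_model(a, b, c, 0), b, c, 90);
    Complex s = scalar_model(a, b, c);
    return std::memcmp(&v, &s, sizeof v) == 0;
}
```

For example, `models_agree({1.0, 2.0}, {3.0, 4.0}, {5.0, 6.0})` holds and both models give `{-8.0, 40.0}`. Under these assumptions each step is one fused multiply-add on the same operands in the same order, so the two sequences should round identically.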

Possibly related: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121925
