https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124083

            Bug ID: 124083
           Summary: [14/15/16 Regression] Early ra causes an extra move
                    from fpr to gpr in some cases
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64

Take:
```
#define vector16 __attribute__((vector_size(16)))
typedef vector16 long v2l;
v2l c;

long f(long a, long b)
{
  c = (v2l){a,b};
  for(int i = 0; i < 10; i++)
    c+=(v2l){1,2};
  return a;
}
```

With -O2 GCC produces:
```
        fmov    d31, x0
        fmov    d30, x1
        adrp    x1, .LC0
        mov     w0, 10
        uzp1    v30.2d, v31.2d, v30.2d
        ldr     q29, [x1, #:lo12:.LC0]
.L2:
        subs    w0, w0, #1
        add     v30.2d, v30.2d, v29.2d
        bne     .L2
        adrp    x0, .LANCHOR0
        str     q30, [x0, #:lo12:.LANCHOR0]
        fmov    x0, d31
        ret
```

Notice the last fmov here (`fmov x0, d31`). It is not needed: GCC could have
kept `a` in a GPR in the first place instead of copying it back from an FPR.

If we use `-O2 -mearly-ra=none`, GCC produces:
```
f:
        adrp    x2, .LC0
        fmov    d31, x0
        ldr     q30, [x2, #:lo12:.LC0]
        ins     v31.d[1], x1
        mov     w1, 10
.L2:
        add     v31.2d, v31.2d, v30.2d
        subs    w1, w1, #1
        bne     .L2
        adrp    x1, .LANCHOR0
        str     q31, [x1, #:lo12:.LANCHOR0]
        ret
```

That is much better code, and there is no extra move either.
And yes, this shows up in real code; this was reduced from lld, from PR 121495.
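
For anyone comparing the two code sequences, a small harness along these lines
might help confirm that they produce the same observable result: `f` returns
`a` and leaves `c` equal to `{a+10, b+20}`. This is only a sketch; `check_f`
is a hypothetical helper and the `noinline` attribute is an addition, neither
is part of the reduced testcase.
```c
#define vector16 __attribute__((vector_size(16)))
typedef vector16 long v2l;
v2l c;

/* The reduced testcase from above. The noinline attribute is an
   addition here so that the call is not inlined into the checker. */
__attribute__((noinline))
long f(long a, long b)
{
  c = (v2l){a,b};
  for(int i = 0; i < 10; i++)
    c+=(v2l){1,2};
  return a;
}

/* Hypothetical helper: returns 1 if f returned a and left
   c[0] == a + 10 and c[1] == b + 20, otherwise 0. */
int check_f(long a, long b)
{
  long r = f(a, b);
  return r == a && c[0] == a + 10 && c[1] == b + 20;
}
```
Calling `check_f` from a `main` built once with `-O2` and once with
`-O2 -mearly-ra=none` should give the same answer in both builds; only the
generated code differs.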
