[Bug target/124083] New: [14/15/16 Regression] Early ra causes an extra move from fpr to gpr in some cases

pinskia at gcc dot gnu.org via Gcc-bugs Thu, 12 Feb 2026 15:21:53 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124083


            Bug ID: 124083
           Summary: [14/15/16 Regression] Early ra causes an extra move
                    from fpr to gpr in some cases
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64

Take:
```
#define vector16 __attribute__((vector_size(16)))
typedef vector16 long v2l;
v2l c;

long f(long a, long b)
{
  c = (v2l){a,b};
  for(int i = 0; i < 10; i++)
    c+=(v2l){1,2};
  return a;
}
```

With -O2 GCC produces:
```
        fmov    d31, x0
        fmov    d30, x1
        adrp    x1, .LC0
        mov     w0, 10
        uzp1    v30.2d, v31.2d, v30.2d
        ldr     q29, [x1, #:lo12:.LC0]
.L2:
        subs    w0, w0, #1
        add     v30.2d, v30.2d, v29.2d
        bne     .L2
        adrp    x0, .LANCHOR0
        str     q30, [x0, #:lo12:.LANCHOR0]
        fmov    x0, d31
        ret
```

Notice the final `fmov x0, d31`. It is NOT needed: `a` arrives in x0 and is
returned unchanged, so GCC could simply have kept it in a GPR instead of
copying it back from d31.
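The following minimal sketch checks what `f` actually computes. It assumes GCC or Clang, because it relies on the GNU `vector_size` extension and on subscripting vector values. The helper `check_f` is hypothetical and is not part of the report. It shows that `f` returns `a` unchanged, while only the global `c` depends on the vector additions. That is why `a` never needs to live in an FP/SIMD register.

```c
#include <assert.h>

/* GNU C vector extension, exactly as in the testcase above. */
#define vector16 __attribute__((vector_size(16)))
typedef vector16 long v2l;

v2l c;

long f(long a, long b)
{
  c = (v2l){a, b};
  for (int i = 0; i < 10; i++)
    c += (v2l){1, 2};
  return a;  /* returned unchanged; never involved in the vector adds */
}

/* Hypothetical sanity check: after ten iterations c == {a + 10, b + 20},
   while the return value is just the incoming a. */
void check_f(long a, long b)
{
  assert(f(a, b) == a);
  assert(c[0] == a + 10);
  assert(c[1] == b + 20);
}
```

For example, `check_f(5, 7)` should pass, and `c` ends up as `{15, 27}`.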

If we use `-O2 -mearly-ra=none` GCC produces:
```
f:
        adrp    x2, .LC0
        fmov    d31, x0
        ldr     q30, [x2, #:lo12:.LC0]
        ins     v31.d[1], x1
        mov     w1, 10
.L2:
        add     v31.2d, v31.2d, v30.2d
        subs    w1, w1, #1
        bne     .L2
        adrp    x1, .LANCHOR0
        str     q31, [x1, #:lo12:.LANCHOR0]
        ret
```

That is much better code, and there is no extra move either.
And yes, this does show up in real code: this testcase was reduced from lld (see PR 121495).
