https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124083
Bug ID: 124083
Summary: [14/15/16 Regression] Early ra causes an extra move from fpr to gpr in some cases
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: pinskia at gcc dot gnu.org
Target Milestone: ---
Target: aarch64
Take:
```
#define vector16 __attribute__((vector_size(16)))
typedef vector16 long v2l;
v2l c;
long f(long a, long b)
{
  c = (v2l){a,b};
  for(int i = 0; i < 10; i++)
    c+=(v2l){1,2};
  return a;
}
```
With -O2 GCC produces:
```
fmov d31, x0
fmov d30, x1
adrp x1, .LC0
mov w0, 10
uzp1 v30.2d, v31.2d, v30.2d
ldr q29, [x1, #:lo12:.LC0]
.L2:
subs w0, w0, #1
add v30.2d, v30.2d, v29.2d
bne .L2
adrp x0, .LANCHOR0
str q30, [x0, #:lo12:.LANCHOR0]
fmov x0, d31
ret
```
Notice the final `fmov x0, d31`. It is not needed: GCC could have kept `a` in a
GPR from the start. Instead x0 is reused as the loop counter, so `a` has to be
copied back from d31 for the return.
If we use `-O2 -mearly-ra=none` GCC produces:
```
f:
adrp x2, .LC0
fmov d31, x0
ldr q30, [x2, #:lo12:.LC0]
ins v31.d[1], x1
mov w1, 10
.L2:
add v31.2d, v31.2d, v30.2d
subs w1, w1, #1
bne .L2
adrp x1, .LANCHOR0
str q31, [x1, #:lo12:.LANCHOR0]
ret
```
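That is much better code: `a` stays in x0 for the return, so there is no extra
move, and the vector is built with a single `ins` instead of a second `fmov`
plus `uzp1`.
And yes, this shows up in real code; it was reduced from lld, from PR 121495.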
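For anyone testing a fix, a runtime check along the following lines may be handy as a guard against wrong code. This is only a sketch: it assumes GCC's vector extensions (including element subscripting), and `check_f` is a hypothetical helper that is not part of the reduced test case. It confirms that `f` returns `a` and leaves `c` equal to `{a+10, b+20}`. It does not detect the extra `fmov`, which still has to be checked in the generated assembly.

```c
#include <assert.h>

#define vector16 __attribute__((vector_size(16)))
typedef vector16 long v2l;

v2l c;

/* Test case from this report, unchanged. */
long f(long a, long b)
{
  c = (v2l){a, b};
  for (int i = 0; i < 10; i++)
    c += (v2l){1, 2};
  return a;
}

/* Hypothetical helper (not in the original report): returns 1 when
   f(a, b) returned a and left c == {a + 10, b + 20}, 0 otherwise.
   Assumes a + 10 and b + 20 do not overflow.  */
int check_f(long a, long b)
{
  long r = f(a, b);
  return r == a && c[0] == a + 10 && c[1] == b + 20;
}
```

Calling `check_f` from a small `main` with a few inputs, built both with `-O2` and with `-O2 -mearly-ra=none`, should give 1 in every case, since the two code sequences differ only in register usage.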