https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125431
Bug ID: 125431
Summary: Generated complex-arithmetic code includes FCMLA
instructions even when -ffp-contract=off
Product: gcc
Version: 15.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: fortran
Assignee: unassigned at gcc dot gnu.org
Reporter: townsend at astro dot wisc.edu
Target Milestone: ---
Working with gfortran 15.2 on Apple Silicon, I'm finding that complex
arithmetic generates assembly with FCMLA instructions even when
-ffp-contract=off is specified. I'm not sure this is a gfortran issue per se,
but I've encountered it in a Fortran context.
Example code:
---
subroutine foo(a,b,c)
   complex :: a(6,6)
   complex :: b(6,6)
   complex :: c(6,6)
   c = MATMUL(a, b)
end subroutine foo
---
Compiling this on godbolt.org (AARCH64 gfortran 15.2.0) with options "-O2
-ffp-contract=off -mcpu=apple-m1" leads to the following assembly:
--
foo_:
        stp     x29, x30, [sp, -32]!
        mov     x29, sp
        stp     x19, x20, [sp, 16]
        mov     x20, x1
        mov     x19, x0
        mov     w1, 0
        mov     x0, x2
        mov     x2, 288
        bl      memset
        mov     x2, x0
        mov     x1, x20
        add     x6, x20, 288
        add     x5, x19, 288
.L4:
        mov     x3, x19
        mov     x4, x1
.L3:
        ldr     d30, [x4]
        mov     x0, 0
        uzp1    v30.2d, v30.2d, v30.2d
.L2:
        ldr     q28, [x3, x0]
        movi    v27.4s, 0
        ldr     q29, [x2, x0]
        fcmla   v27.4s, v30.4s, v28.4s, #0
        fcmla   v27.4s, v30.4s, v28.4s, #90
        fadd    v27.4s, v27.4s, v29.4s
        str     q27, [x2, x0]
        add     x0, x0, 16
        cmp     x0, 48
        bne     .L2
        add     x3, x3, 48
        add     x4, x4, 8
        cmp     x3, x5
        bne     .L3
        add     x1, x1, 48
        add     x2, x2, 48
        cmp     x6, x1
        bne     .L4
        ldp     x19, x20, [sp, 16]
        ldp     x29, x30, [sp], 32
        ret
--
The generated code also includes FCMLA instructions with -march=armv8.3-a, but
not with -march=armv8.2-a or earlier (I'm guessing this is because FCMLA was
introduced with Armv8.3-A).
If I rewrite the code using real variables, then FMLA instructions are not
generated with -ffp-contract=off (but they *are* without this flag, as
expected).
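For reference, the real-valued counterpart would look something like the
following. This is only a sketch: the name foo_real is illustrative, and it
assumes the sole change from the complex example above is the declared type.

```fortran
! Real-valued counterpart of foo (hypothetical name foo_real).
! Assumption: identical to the complex example except for the type.
subroutine foo_real(a,b,c)
   real :: a(6,6)
   real :: b(6,6)
   real :: c(6,6)
   c = MATMUL(a, b)
end subroutine foo_real
```

Compiling this with the same options as above ("-O2 -ffp-contract=off
-mcpu=apple-m1") and again without -ffp-contract=off should make it possible
to compare the presence or absence of FMLA in the two outputs.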
This behavior strikes me as inconsistent: -ffp-contract is respected for real
arithmetic, but not for complex arithmetic. I don't know whether it's
intentional, but it does create ulp-level differences in calculation results,
which isn't ideal.