https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122408
Bug ID: 122408
Summary: [Aarch64] Wrong auto-vectorization for complex AXPY with conjugation (CONJG)
Product: gcc
Version: 15.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: fortran
Assignee: unassigned at gcc dot gnu.org
Reporter: gcc-bugzilla at slmaertens dot dev
Target Milestone: ---
Created attachment 62626
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=62626&action=edit
Reproducer for auto-vectorization bug.
Starting with gfortran 15.2.0, the compiler auto-vectorizes the following loop
using the fcmla instruction:
      DO I = 1, LASTC
         C( 1, I ) = C( 1, I ) - TAU * DCONJG( WORK( I ) )
      END DO
Without parentheses around the DCONJG call (the loop as written above), the
generated code is vectorized:
L20:
        movi    v6.4s, 0
        ldr     q7, [x2], 16
        ldr     q16, [x19]
        fcmla   v6.2d, v18.2d, v7.2d, #0
        fcmla   v6.2d, v18.2d, v7.2d, #90
        fsub    v6.2d, v16.2d, v6.2d
        str     q6, [x19]
        add     x19, x19, x8
        cmp     x2, x5
        bne     L20
With extra parentheses around the DCONJG call, the loop is compiled to scalar
code that applies the conjugation:
L20:
        ldp     d6, d5, [x0], 16
        ldp     d16, d7, [x19, 16]
        fmul    d4, d17, d6
        fnmul   d3, d5, d17
        fmsub   d4, d5, d18, d4
        fnmsub  d3, d18, d6, d3
        fsub    d4, d7, d4
        fsub    d3, d16, d3
        stp     d3, d4, [x19, 16]
        add     x19, x19, x8
        cmp     x0, x5
        bne     L20
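The auto-vectorized code omits the complex conjugation: the fcmla #0/#90 pair
accumulates the plain product TAU * WORK( I ) instead of
TAU * DCONJG( WORK( I ) ).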
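
To make the arithmetic concrete, here is a small scalar model of the two fcmla
steps, written as a free-form Fortran sketch (save as e.g. model.f90). It is
not the attached reproducer. It assumes that v18 holds TAU and v7 holds
WORK( I ), which is what the listing suggests but is not stated in it; the
program name, variable names, and sample values are made up for illustration.

```fortran
! Scalar model of the vectorized loop body (illustration only).
! Assumption: v18 = TAU (first source), v7 = WORK(I) (second source).
program fcmla_model
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  complex(dp) :: tau, w, acc, wanted

  ! Arbitrary sample values with non-zero imaginary parts.
  tau = cmplx(1.0_dp, 2.0_dp, kind=dp)
  w   = cmplx(3.0_dp, 4.0_dp, kind=dp)

  ! movi v6.4s, 0 : clear the accumulator
  acc = (0.0_dp, 0.0_dp)

  ! fcmla ..., #0 : re += Re(tau)*Re(w), im += Re(tau)*Im(w)
  acc = acc + cmplx(real(tau, dp) * real(w, dp), &
                    real(tau, dp) * aimag(w), kind=dp)

  ! fcmla ..., #90 : re -= Im(tau)*Im(w), im += Im(tau)*Re(w)
  acc = acc + cmplx(-aimag(tau) * aimag(w), &
                     aimag(tau) * real(w, dp), kind=dp)

  ! What the source expression asks for.
  wanted = tau * conjg(w)

  print *, 'fcmla #0 + #90    :', acc
  print *, 'TAU * WORK        :', tau * w
  print *, 'TAU * CONJG(WORK) :', wanted
end program fcmla_model
```

For these sample values the modelled fcmla pair yields (-5, 10), identical to
TAU * WORK, whereas TAU * CONJG(WORK) is (11, 2). The final fsub then subtracts
the wrong product from C( 1, I ).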
The cause has already been worked out in this comment:
https://github.com/iains/gcc-darwin-arm64/issues/148#issuecomment-3438953784
This should affect all aarch64 targets, but so far I can reproduce it only on
macOS. The reproducer is attached (compile with -O2).