https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123258
Bug ID: 123258
Summary: Multiplication by -1 not hoisted
Product: gcc
Version: 15.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: fortran
Assignee: unassigned at gcc dot gnu.org
Reporter: mjr19 at cam dot ac.uk
Target Milestone: ---
subroutine foo(a,b,n)
  double precision::a(*),b(*)
  integer::i,n
  do i=1,n,2
     a(i)=a(i)*b(1)
     a(i+1)=a(i+1)*b(2)
  end do
end subroutine foo
Compiled with gfortran-15 -Ofast -march=core-avx2, this produces what
appears to be optimal code. One ymm register is filled with two copies
of the pair b(1), b(2), and a single packed ymm multiplication advances
two iterations of the loop.
If one complicates the loop a little:
do i=1,n,2
   a(i)=a(i)*2.3*b(1)
   a(i+1)=a(i+1)*3.4*b(2)
end do
the optimisation is unaffected. But if one of the constants is -1, whether written as
a(i+1)=a(i+1)*(-1.0)*b(2)
or
a(i+1)=a(i+1)*(-b(2))
far from optimal code is produced. The compiler multiplies by -1 using
an xor *within the loop body*, and uses an impressive number of vperm,
vunpck and vshuf instructions to do so. This special-case
"optimisation" for -1 makes the code run at about half the speed of
the version without the -1.
If both compile-time constants are -1:
do i=1,n,2
   a(i)=a(i)*(-b(1))
   a(i+1)=a(i+1)*(-b(2))
end do
the multiplication by -1 is still done with an xor within the loop
body, although in this case no vperms etc. are needed, so the
performance penalty is small.
I'd suggest that multiplication by -1 of a loop-invariant operand (here
the negation of b(1) or b(2)) should be considered for hoisting out of
the loop.
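For comparison, a minimal sketch of the mixed-sign loop with the negation hoisted by hand, i.e. the transformation suggested above done at source level. The subroutine name foo_hoisted and the temporary nb2 are hypothetical, and whether GCC keeps the negation outside the loop (rather than folding it back into the multiplication) has not been checked here.

```fortran
! Hypothetical sketch: the a(i+1)*(-b(2)) variant with -b(2) computed
! once before the loop instead of on every iteration.
subroutine foo_hoisted(a,b,n)
  double precision::a(*),b(*)
  integer::i,n
  double precision::nb2
  ! Negation is exact, so using nb2 gives the same values as
  ! a(i+1)*(-b(2)) evaluated inside the loop.
  nb2=-b(2)
  do i=1,n,2
     a(i)=a(i)*b(1)
     a(i+1)=a(i+1)*nb2
  end do
end subroutine foo_hoisted
```

If the compiler keeps nb2 as a loop invariant, the loop body should again be a single packed multiplication, as in the original foo.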