https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123258

            Bug ID: 123258
           Summary: Multiplication by -1 not hoisted
           Product: gcc
           Version: 15.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mjr19 at cam dot ac.uk
  Target Milestone: ---

subroutine foo(a,b,n)
  double precision::a(*),b(*)
  integer::i,n

  do i=1,n,2
     a(i)=a(i)*b(1)
     a(i+1)=a(i+1)*b(2)
  end do
end subroutine foo

compiled with gfortran-15 -Ofast -march=core-avx2 produces what appears to
be optimal code. One ymm register is filled with two copies of the pair
b(1), b(2), and a single packed ymm multiplication advances two iterations
of the loop.

If one tries to complicate the loop a little

  do i=1,n,2
     a(i)=a(i)*2.3*b(1)
     a(i+1)=a(i+1)*3.4*b(2)
  end do

the optimisation is unaffected. But if one constant is -1, whether

 a(i+1)=a(i+1)*(-1.0)*b(2)

or

 a(i+1)=a(i+1)*(-b(2))

something far from optimal is produced. The compiler decides to
multiply by -1 using an xor *within the loop body*, and uses an
impressive number of vperm, vunpck and vshuf instructions to do so. This
special-case "optimisation" for -1 makes the code run at about half
speed.

If both compile-time constants are -1

  do i=1,n,2
     a(i)=a(i)*(-b(1))
     a(i+1)=a(i+1)*(-b(2))
  end do

the multiplication by -1 is still done with an xor within the loop
body, although in this case it is free from vperms etc. so the
performance penalty is small.

I'd suggest that multiplications by -1 should be considered for hoisting.
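
For illustration, a minimal sketch of the hoisting meant here, written out
by hand. The names foo_neg, foo_hoisted, nb2 and neg_hoist_demo are
hypothetical and not part of the original test case. I have not checked
whether this source-level rewrite changes the generated code, since the
compiler may well fold it back into the same form. It only shows that the
negation is loop-invariant and that the two versions agree.

```fortran
! Sketch only: foo_neg mirrors the mixed-constant loop from this report,
! foo_hoisted is a hypothetical hand-hoisted variant of it.
module neg_hoist_demo
  implicit none
contains
  subroutine foo_neg(a,b,n)
    double precision::a(*),b(*)
    integer::i,n

    do i=1,n,2
       a(i)=a(i)*b(1)
       a(i+1)=a(i+1)*(-b(2))
    end do
  end subroutine foo_neg

  subroutine foo_hoisted(a,b,n)
    double precision::a(*),b(*)
    double precision::nb2
    integer::i,n

    nb2=-b(2)   ! negate once, outside the loop
    do i=1,n,2
       a(i)=a(i)*b(1)
       a(i+1)=a(i+1)*nb2
    end do
  end subroutine foo_hoisted
end module neg_hoist_demo

program check
  use neg_hoist_demo
  implicit none
  integer,parameter::n=8
  double precision::a1(n),a2(n),b(2)
  integer::i

  b=(/1.5d0,2.5d0/)
  do i=1,n
     a1(i)=dble(i)
  end do
  a2=a1
  call foo_neg(a1,b,n)
  call foo_hoisted(a2,b,n)
  ! Both variants perform the same multiplications, so they must match.
  if (any(a1/=a2)) error stop "results differ"
  print *,"results agree"
end program check
```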
