[Bug fortran/123198] New: Unnecessary vperms when vectorising with stride of -1

mjr19 at cam dot ac.uk via Gcc-bugs Thu, 18 Dec 2025 04:56:39 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123198


            Bug ID: 123198
           Summary: Unnecessary vperms when vectorising with stride of -1
           Product: gcc
           Version: 15.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mjr19 at cam dot ac.uk
  Target Milestone: ---

subroutine foo(a,b,n)
  real(kind(1d0))::a(*),b(*)
  integer::i,n

  do i=n,1,-1
     a(i)=a(i)+b(i)
  end do
end subroutine foo

compiles with gfortran-15 -O3 -march=core-avx2 to a main loop of

.L5:
        vpermpd $27, (%r8,%rax), %ymm0
        vpermpd $27, (%r9,%rax), %ymm1
        vaddpd  %ymm1, %ymm0, %ymm0
        vpermpd $27, %ymm0, %ymm0
        vmovupd %ymm0, (%r8,%rax)
        subq    $32, %rax
        cmpq    %rax, %rsi
        jne     .L5

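The three vpermpd instructions seem unnecessary. The first two reverse
the order of the elements in each vector register before the addition,
and the third restores the original order afterwards. Since the addition
is elementwise, the order of the lanes does not affect the result. Should
the first two not be plain vmovupd loads, and the third be eliminated?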
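To illustrate why the permutes cancel, here is a minimal sketch (not part of the original report; the program and variable names are hypothetical). It first decodes the immediate 27, which holds a two-bit source-lane selector for each of the four destination lanes, and then checks that reversing both operands, adding, and reversing the sum gives the same values as adding directly.

```fortran
program reverse_cancels
  implicit none
  integer, parameter :: dp = kind(1d0)
  real(dp) :: a(4), b(4), direct(4), tmp(4), via_reverse(4)
  integer :: i, imm, sel(4)

  ! Decode imm8 = 27 (binary 00 01 10 11): bits 2*(i-1) and 2*(i-1)+1
  ! give the source lane for destination lane i-1.
  imm = 27
  do i = 1, 4
     sel(i) = iand(ishft(imm, -2*(i-1)), 3)
  end do
  ! Source lanes 3,2,1,0 for destination lanes 0,1,2,3: a full reversal.
  if (any(sel /= [3, 2, 1, 0])) error stop "imm8 27 is not a reversal"

  a = [1.0_dp, 2.0_dp, 3.0_dp, 4.0_dp]
  b = [10.0_dp, 20.0_dp, 30.0_dp, 40.0_dp]

  ! Plain elementwise addition.
  direct = a + b

  ! Reverse both operands, add, then reverse the sum again,
  ! mirroring the three vpermpd instructions in the loop.
  tmp = a(4:1:-1) + b(4:1:-1)
  via_reverse = tmp(4:1:-1)

  if (any(direct /= via_reverse)) error stop "results differ"
  print *, "identical"
end program reverse_cancels
```

The program runs to completion without reaching either error stop, since each lane performs the same single addition in both versions.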

The same issue appears when comparing

  do concurrent (i=1:n)

vs

  do concurrent (i=n:1:-1)

for the same loop body, in which it is clearer that order does not matter.
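The report gives only the two loop headers. A complete test case might look like the following sketch, in which the module and subroutine names (dc_variants, foo_fwd, foo_rev) and the driver program are assumptions made for illustration, with the loop body taken from foo above.

```fortran
module dc_variants
  implicit none
  integer, parameter :: dp = kind(1d0)
contains
  ! Ascending iteration space.
  subroutine foo_fwd(a, b, n)
    real(dp) :: a(*), b(*)
    integer :: i, n
    do concurrent (i=1:n)
       a(i) = a(i) + b(i)
    end do
  end subroutine foo_fwd

  ! Descending iteration space, same loop body.
  subroutine foo_rev(a, b, n)
    real(dp) :: a(*), b(*)
    integer :: i, n
    do concurrent (i=n:1:-1)
       a(i) = a(i) + b(i)
    end do
  end subroutine foo_rev
end module dc_variants

program compare
  use dc_variants
  implicit none
  integer, parameter :: n = 1000
  real(dp) :: a1(n), a2(n), b(n)
  integer :: i

  do i = 1, n
     a1(i) = real(i, dp)
     b(i) = 0.5_dp * real(i, dp)
  end do
  a2 = a1

  call foo_fwd(a1, b, n)
  call foo_rev(a2, b, n)

  ! Both variants must produce the same values.
  if (any(a1 /= a2)) error stop "variants disagree"
end program compare
```

Compiling this with -S in addition to the options used above (-O3 -march=core-avx2) writes the assembly to a file, so the main loops of the two subroutines can be compared directly.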
