https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85057

            Bug ID: 85057
           Summary: GCC fails to vectorize code unless dummy loop is added
           Product: gcc
           Version: 7.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mokreutzer at gmail dot com
  Target Milestone: ---

I have a class which represents short vectors (1D, 2D, 3D) and does numeric
computations using the expression template engine PETE[1]. The attached example
is stripped down to support only 1D vectors, which is the simplest case but
still demonstrates the issue. In my application, vector computations are
executed in a loop which is subject to vectorization, as in:

 int const N = 100000;
 Vector<1, double> a[N];
 // initialize a 
 for (int i=0; i<N; i++)
   a[i] = 0.5*a[i];

The PETE machinery causes each loop iteration to evaluate an expression in a
function evaluate(), which (for 1D vectors) looks like this:

 template <int N, typename T, typename Op, typename RHS>
 inline void evaluate(Vector<N,T> &lhs, Op const &op,
                      Expression<RHS> const &rhs)
 {
     op(lhs(0), forEach(rhs, EvalVectorLeaf<N>(0), OpCombine()));
 }

The issue is that GCC does not vectorize the loop above: the loop body compiles
to the scalar instruction "vmulsd  xmm0, xmm1, QWORD PTR [rax]". The crux is
that GCC does vectorize the loop ("vmulpd  ymm0, ymm1, YMMWORD PTR [rax]") if I
add a seemingly meaningless dummy loop to the function body, as in:

 template <int N, typename T, typename Op, typename RHS>
 inline void evaluate(Vector<N,T> &lhs, Op const &op,
                      Expression<RHS> const &rhs)
 {
   for (int i=0; i<1; i++)
     op(lhs(i), forEach(rhs, EvalVectorLeaf<N>(i), OpCombine()));
 }

Attached is the code which does not vectorize. A vectorizing version can easily
be constructed by adding the loop as shown above.
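
For readers without access to the attachment, here is a minimal self-contained
sketch of the two variants side by side. It is not the attached code: PETE is
replaced by a hand-written expression node, and the names ScaledVector, Assign,
evaluate_direct, evaluate_dummy_loop, scale_direct, scale_dummy_loop and
variants_agree are hypothetical stand-ins introduced only for illustration.
Whether this reduced form shows the same vectorization difference has not been
verified; it only shows that both variants compute the same result.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the short-vector class from the report.
template <int N, typename T>
struct Vector {
    T data[N];
    T &operator()(int i) { return data[i]; }
    T const &operator()(int i) const { return data[i]; }
};

// Hypothetical expression node for "scalar * vector". PETE would instead
// build a general expression tree and traverse it with forEach().
template <int N, typename T>
struct ScaledVector {
    T factor;
    Vector<N, T> const &operand;
    T eval(int i) const { return factor * operand(i); }
};

// Hypothetical assignment functor playing the role of Op in the report.
struct Assign {
    template <typename T>
    void operator()(T &lhs, T const &rhs) const { lhs = rhs; }
};

// Variant 1: component 0 is evaluated directly (no loop in the body).
template <int N, typename T, typename Op, typename RHS>
inline void evaluate_direct(Vector<N, T> &lhs, Op const &op, RHS const &rhs)
{
    op(lhs(0), rhs.eval(0));
}

// Variant 2: the same work wrapped in a single-iteration dummy loop.
template <int N, typename T, typename Op, typename RHS>
inline void evaluate_dummy_loop(Vector<N, T> &lhs, Op const &op, RHS const &rhs)
{
    for (int i = 0; i < 1; i++)
        op(lhs(i), rhs.eval(i));
}

// Outer loops corresponding to "a[i] = factor * a[i]" in the report.
inline void scale_direct(Vector<1, double> *a, int n, double factor)
{
    for (int i = 0; i < n; i++) {
        ScaledVector<1, double> expr{factor, a[i]};
        evaluate_direct(a[i], Assign(), expr);
    }
}

inline void scale_dummy_loop(Vector<1, double> *a, int n, double factor)
{
    for (int i = 0; i < n; i++) {
        ScaledVector<1, double> expr{factor, a[i]};
        evaluate_dummy_loop(a[i], Assign(), expr);
    }
}

// Runs both variants on identical input and reports whether the results match.
inline bool variants_agree(int n)
{
    assert(n > 0);
    std::vector<Vector<1, double>> x(n), y(n);
    for (int i = 0; i < n; i++) {
        x[i](0) = i;
        y[i](0) = i;
    }
    scale_direct(x.data(), n, 0.5);
    scale_dummy_loop(y.data(), n, 0.5);
    for (int i = 0; i < n; i++)
        if (x[i](0) != y[i](0))
            return false;
    return true;
}
```

To see which of the two outer loops GCC vectorizes, one option is to add
-fopt-info-vec-optimized to the command line given below and compare the
reported source lines, or to inspect the generated assembly for vmulsd versus
vmulpd as described above.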


g++ command line: g++ -O3 -mavx
System type: x86_64-pc-linux-gnu


[1]: The official website of PETE seems to be gone, but a mirror can be found
here: https://github.com/erdc/daetk/tree/master/pete/pete-2.1.0
