------- Comment #4 from rguenther at suse dot de 2009-11-19 17:30 -------
Subject: Re: [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

On Thu, 19 Nov 2009, sfilippone at uniroma2 dot it wrote:
> ------- Comment #3 from sfilippone at uniroma2 dot it 2009-11-19 17:17 -------
> (In reply to comment #2)
> > -ftree-vectorizer-verbose=2 tells you:
> >
> > eval.f90:35: note: not vectorized: relevant stmt not supported: D.1684_73 = ((D.1683_72));
> >
> > eval.f90:32: note: not vectorized: relevant stmt not supported: D.1684_58 = ((D.1683_57));
> >
> > PAREN_EXPRs are new in 4.4 and I believe they cannot be turned off
> > right now.
> >
> > The loops are
> >
> >   do i=1,nnd
> >     x(i) = 1.d0 + (1.d0*i)/nnd
> >   end do
> >   do i=1,n
> >     foo4(i) = 1.d0 + (1.d0*i)/n
> >   end do
> >
> > where the vectorizer doesn't know how to ensure evaluation order is
> > preserved when trying to vectorize (1.d0*i)/n. Writing them as
> > 1.d0*i/n vectorizes the function.
> >
> > Still the performance is lower by a factor of two compared to 4.3
> > (even with -ffast-math).
> >
> > Probably the bug should be split.
> >
>
> Well, the performance drop I am looking at is in the subroutine. The
> initialization loops are (to me) irrelevant, I had posted a previous version
> to the mailing list where the initialization was done with random_number and
> the situation was the same.
> A run with profiling shows that more than 99% of the time is spent in eval_

Heh, with -fwhole-program GCC optimizes the test away and I get 0.0s
runtime.

Well, within eval there's nothing really obvious to me. The
innermost loop is exactly the same:

.L39:
        movsd   (%r15), %xmm0
        addq    %rsi, %r15
        subsd   (%rdx), %xmm0
        addq    %rsi, %rdx
        subl    $1, %eax
        mulsd   %xmm0, %xmm0
        addsd   %xmm0, %xmm1
        jne     .L39

The next outer loop has somewhat fewer loads in 4.5, but it also uses
different induction variables. So, nothing obvious to me.
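
For anyone who would rather not decode the assembly, the .L39 loop above
can be read roughly as the C sketch below. This is only my reading of the
instructions, not the Fortran source of eval: the function name and
signature are hypothetical, %rsi is assumed to hold a byte stride, and
%xmm1 is assumed to start at zero (its initial value is not visible in
the excerpt).

```c
#include <stddef.h>

/* Hypothetical helper, not taken from the testcase: accumulates the sum of
   squared differences of two strided arrays of doubles.
   stride_bytes plays the role of %rsi, n the role of %eax. */
double sum_sq_diff(const char *a, const char *b, ptrdiff_t stride_bytes, int n)
{
    double acc = 0.0;                     /* %xmm1 (assumed to start at 0) */

    /* The assembly tests the counter at the bottom, so n >= 1 is assumed. */
    do {
        double d = *(const double *)a     /* movsd (%r15), %xmm0 */
                 - *(const double *)b;    /* subsd (%rdx), %xmm0 */
        a += stride_bytes;                /* addq  %rsi, %r15    */
        b += stride_bytes;                /* addq  %rsi, %rdx    */
        acc += d * d;                     /* mulsd, then addsd   */
    } while (--n != 0);                   /* subl $1, %eax; jne  */

    return acc;
}
```

The loop is a scalar, strided reduction (movsd/subsd/mulsd/addsd on one
double at a time), which matches the observation that this part of the
code is the same in both compilers.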
Richard.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108