--- Comment #9 from Pat Haugen <pthaugen at gcc dot> ---
(In reply to Martin Jambor from comment #7)
> Do I understand it correctly that you suspect that the new IPA-CP
> clone that is created from r256888 on is harmful?  In that case, you
> want to test that by trying higher values of ipa-cp-eval-threshold,
> something like --param ipa-cp-eval-threshold 610 (i.e. bigger than
> 606).  Of course, if there are other clones with evaluations between
> 500 and 610, it would affect them too.

Building with --param ipa-cp-eval-threshold=610 prevented the creation of the
.resid_.constprop.1 clone and eliminated the performance degradation.

Looking at the profile in more depth, I saw that most of the time in
resid_.constprop was spent in the main vectorized loop. I tried both revisions
with -fno-tree-vectorize to see whether vectorization in the clone is the real
problem on powerpc, but ran into an output miscompare (pr83497, which I'm
still investigating). Ignoring the miscompare and just timing the two
versions built with -fno-tree-vectorize, I see that the performance is
similar. So this is possibly a powerpc vector cost issue.

> You may also want to try both fast and slow revisions with
> -fno-ipa-cp-clone as the first step, actually.

Doing this showed r256888 about 4% slower, so not nearly as bad.
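
For anyone wanting to repeat the comparison, the builds discussed in this
comment amount to roughly the following sketch. The file name bench.f, the
-O3 baseline, and the use of gfortran (guessed from the trailing underscore
in resid_) are assumptions for illustration, not the actual benchmark build
line; each variant would be built and timed at both revisions.

```shell
# Hypothetical source file and baseline flags; substitute the real build line.
SRC=bench.f
BASE="-O3"

# Default build: IPA-CP cloning and vectorization both enabled.
gfortran $BASE -c "$SRC" -o default.o

# Raise the IPA-CP evaluation threshold above 606 so the new clone
# is not created.
gfortran $BASE --param ipa-cp-eval-threshold=610 -c "$SRC" -o thresh610.o

# Turn off tree vectorization to check whether the vectorized loop
# in the clone is what costs the time.
gfortran $BASE -fno-tree-vectorize -c "$SRC" -o novect.o

# Turn off IPA-CP cloning entirely.
gfortran $BASE -fno-ipa-cp-clone -c "$SRC" -o noclone.o
```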
