https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794

--- Comment #3 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 28 May 2021, linkw at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794
> 
> --- Comment #2 from Kewen Lin <linkw at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #1)
> 
> Thanks for the comments!
> 
> > There's predictive commoning which can do similar transforms and runs after
> > vectorization.  It might be it doesn't handle these "simple" cases or that
> > loop dependence info is not up to the task there.
> > 
> 
> pcom does fix this problem, but it's only enabled by default at -O3. Could
> running it at -O2 be considered? Or enabling it at -O2 under some conditions,
> such as only for a loop which skips the loop-carried optimization and isn't
> vectorized further?

I think pcom should be enabled whenever vectorization is, because of the
interaction with PRE.  It could be toned down (it can do peeling/unrolling,
which is why it is at -O3) based on the active vectorizer cost model
when it is only implicitly enabled ...   Things will get a bit messy, I guess.
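
For illustration only, here is a hand-written, source-level sketch of the
kind of reuse pcom aims for.  It is not GCC output; the function names are
made up for this example.  The second function carries the value loaded as
a[i + 1] into the next iteration, where it is used as a[i]:

```c
#include <stddef.h>

/* Baseline: each iteration loads both a[i] and a[i + 1].  */
void
sum_pairs_plain (const double *a, double *out, size_t n)
{
  for (size_t i = 0; i + 1 < n; i++)
    out[i] = a[i] + a[i + 1];
}

/* Hand-written equivalent of the commoned form (hypothetical example):
   one load per iteration, with the previous value kept in a scalar.  */
void
sum_pairs_commoned (const double *a, double *out, size_t n)
{
  if (n < 2)
    return;

  double prev = a[0];
  for (size_t i = 0; i + 1 < n; i++)
    {
      double next = a[i + 1];
      out[i] = prev + next;
      prev = next;  /* Copy that unrolling by two could eliminate.  */
    }
}
```

The `prev = next' copy is the sort of thing the unrolling in pcom is meant
to get rid of, which is where the code-size cost comes from.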

> > Another option is to avoid the PRE guard with the (very) cheap cost model
> > at the expense of not vectorizing affected loops.
> > 
> 
> OK, I will benchmark this to see its impact. The particular loops in
> 554.roms_r can be vectorized with the cheap cost model, and this benchmark
> improved (though only a bit) with the cheap cost model on both Power8 and
> Power9. So I will just test the impact on the very-cheap cost model.

So another thing to benchmark would be to enable pcom but make sure

  /* Determine the unroll factor, and if the loop should be unrolled, ensure
     that its number of iterations is divisible by the factor.  */
  unroll_factor = determine_unroll_factor (chains);
  scev_reset ();
  unroll = (unroll_factor > 1
            && can_unroll_loop_p (loop, unroll_factor, &desc));

is false for the cheap and very-cheap cost models unless
flag_predictive_commoning is active.
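
As a rough, self-contained model of that condition (the enum, the function
and its parameters are hypothetical stand-ins, not the real GCC
declarations; `explicit_pcom' stands in for flag_predictive_commoning being
active rather than pcom being enabled implicitly with vectorization):

```c
#include <stdbool.h>

/* Hypothetical stand-in for the vectorizer cost model setting.  */
enum cost_model
{
  COST_VERY_CHEAP,
  COST_CHEAP,
  COST_DYNAMIC,
  COST_UNLIMITED
};

/* Return true if pcom would be allowed to unroll under the proposed rule.
   CAN_UNROLL models the result of can_unroll_loop_p.  */
bool
pcom_may_unroll (unsigned unroll_factor, bool can_unroll,
                 enum cost_model model, bool explicit_pcom)
{
  bool restricted = (model == COST_CHEAP || model == COST_VERY_CHEAP);

  /* Implicitly enabled pcom never unrolls under the cheap models.  */
  if (restricted && !explicit_pcom)
    return false;

  return unroll_factor > 1 && can_unroll;
}
```

With this rule, pcom would still do its reuse without unrolling under the
cheap and very-cheap models; only the unrolling part is suppressed.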

It's probably also a good idea to investigate whether the
update_ssa calls in pcom can be delayed until after all transforms
have been done.
