[Bug tree-optimization/100794] suboptimal code due to missing pre2 when vectorization fails

linkw at gcc dot gnu.org via Gcc-bugs Fri, 28 May 2021 04:30:56 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794


--- Comment #4 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to rguent...@suse.de from comment #3)
> On Fri, 28 May 2021, linkw at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794
> > 
> > --- Comment #2 from Kewen Lin <linkw at gcc dot gnu.org> ---
> > (In reply to Richard Biener from comment #1)
> > 
> > Thanks for the comments!
> > 
> > > There's predictive commoning which can do similar transforms and runs
> > > after vectorization.  It might be it doesn't handle these "simple" cases
> > > or that loop dependence info is not up to the task there.
> > > 
> > 
> > pcom does fix this problem, but it's only enabled by default at -O3. Could
> > running it at -O2 be considered? Or enabling it at -O2 under some
> > conditions, e.g. only for a loop that skipped the loop-carried optimization
> > and then wasn't vectorized?
> 
> I think pcom should be enabled whenever vectorization is, due to the
> interaction with PRE.  It could be tamed down (it can do peeling/unrolling,
> which is why it is at -O3) based on the active vectorizer cost model
> if only implicitly enabled ...   Things will get a bit messy I guess.
> 

Good point! I prefer this idea to guarding on the cost model in the sccvn code.

> > > Another option is to avoid the PRE guard with the (very) cheap cost model
> > > at the expense of not vectorizing affected loops.
> > > 
> > 
> > OK, I will benchmark this to see its impact. The particular loops in
> > 554.roms_r can be vectorized with the cheap cost model, and this benchmark
> > improved (only slightly, though) with the cheap cost model on both Power8
> > and Power9. So I will just test the impact on the very-cheap cost model.
> 
> So another thing to benchmark would be to enable pcom but make sure
> 
>   /* Determine the unroll factor, and if the loop should be unrolled, ensure
>      that its number of iterations is divisible by the factor.  */
>   unroll_factor = determine_unroll_factor (chains);
>   scev_reset ();
>   unroll = (unroll_factor > 1
>             && can_unroll_loop_p (loop, unroll_factor, &desc));
> 
> is false for the cheap and very-cheap cost models unless
> flag_predictive_commoning is active.
> 

Thanks for the hints! One question: could we just enable the non-unrolling
version of pcom when it is enabled implicitly by flag_tree_loop_vectorize,
without considering the vect cost model? Although the very-cheap and cheap cost
models are most likely associated with -O2, users can still try the dynamic or
unlimited cost model at -O2, or the very-cheap/cheap cost model at -O3, so
mapping the cost model directly onto the unroll decision doesn't seem right.
Or maybe we could check the optimization level instead, such as:

  virtual bool
  gate (function *)
  {
    if (flag_predictive_commoning != 0)
      return true;
    if (flag_tree_loop_vectorize)
      {
        allow_unroll_p = optimize > 2;
        return true;
      }
    return false;
  }
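
To make the difference between the two policies concrete, here is a rough
standalone model of them. It is not GCC code: Options, CostModel and the
three functions are hypothetical stand-ins for the real flags, and it assumes
that allow_unroll_p stays true whenever pcom is requested on its own. The
checks are done at compile time with static_assert.

```cpp
// Standalone model of the two unroll policies discussed above; NOT GCC code.
// Every name here is a hypothetical stand-in for the real flags.
enum class CostModel { Unlimited, Dynamic, Cheap, VeryCheap };

struct Options
{
  bool pcom_explicit;   // models flag_predictive_commoning != 0
  bool loop_vectorize;  // models flag_tree_loop_vectorize
  int optimize;         // models the -O level
  CostModel model;      // models the vect cost model
};

// Whether pcom runs at all, following the gate sketched above.
constexpr bool
pcom_runs (const Options &o)
{
  return o.pcom_explicit || o.loop_vectorize;
}

// Policy A (comment #3): no unrolling for the cheap and very-cheap cost
// models unless pcom was requested on its own.
constexpr bool
allow_unroll_by_cost_model (const Options &o)
{
  return o.pcom_explicit
         || (o.model != CostModel::Cheap && o.model != CostModel::VeryCheap);
}

// Policy B (the gate above): ignore the cost model, look at the -O level.
// Assumption: unrolling stays allowed when pcom was requested on its own.
constexpr bool
allow_unroll_by_opt_level (const Options &o)
{
  return o.pcom_explicit || o.optimize > 2;
}

// -O2 with loop vectorization on and the dynamic cost model selected.
constexpr Options o2_dynamic = { false, true, 2, CostModel::Dynamic };
// -O2 with loop vectorization on and the very-cheap cost model.
constexpr Options o2_very_cheap = { false, true, 2, CostModel::VeryCheap };

// pcom runs in both configurations.
static_assert (pcom_runs (o2_dynamic) && pcom_runs (o2_very_cheap), "");
// The policies agree for the very-cheap model at -O2: no unrolling.
static_assert (!allow_unroll_by_cost_model (o2_very_cheap), "");
static_assert (!allow_unroll_by_opt_level (o2_very_cheap), "");
// They disagree for the dynamic model at -O2.
static_assert (allow_unroll_by_cost_model (o2_dynamic), "");
static_assert (!allow_unroll_by_opt_level (o2_dynamic), "");
```

The only configuration here where the two disagree is -O2 with a non-cheap
cost model: the cost-model policy would let pcom unroll there, while the
optimization-level policy would not.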


> It's probably also a good idea to investigate whether the
> update_ssa calls in pcom can be delayed until after all transforms
> have been done.

OK, will check this later.
