https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794

--- Comment #5 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 28 May 2021, linkw at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794
> 
> --- Comment #4 from Kewen Lin <linkw at gcc dot gnu.org> ---
> (In reply to rguent...@suse.de from comment #3)
> > On Fri, 28 May 2021, linkw at gcc dot gnu.org wrote:
> > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794
> > > 
> > > --- Comment #2 from Kewen Lin <linkw at gcc dot gnu.org> ---
> > > (In reply to Richard Biener from comment #1)
> > > 
> > > Thanks for the comments!
> > > 
> > > > There's predictive commoning which can do similar transforms and
> > > > runs after vectorization.  It might be it doesn't handle these
> > > > "simple" cases or that loop dependence info is not up to the task
> > > > there.
> > > > 
> > > 
> > > pcom does fix this problem, but it's only enabled by default at -O3.
> > > Could it be considered for running at -O2?  Or enabled at -O2 under
> > > some conditions, such as: only for a loop which skips loop-carried
> > > optimization and isn't vectorized further?
> > 
> > I think pcom should be enabled when vectorization is, due to the
> > interaction with PRE.  It could be tamed down (it can do
> > peeling/unrolling, which is why it is -O3) based on the active
> > vectorizer cost model if it is only implicitly enabled ...  Things
> > will get a bit messy, I guess.
> > 
> 
> Good point! I prefer this idea to the one that guards on the cost model in
> the sccvn code.
> 
> > > > Another option is to avoid the PRE guard with the (very) cheap cost
> > > > model at the expense of not vectorizing affected loops.
> > > > 
> > > 
> > > OK, I will benchmark this to see its impact.  The particular loops in
> > > 554.roms_r can be vectorized with the cheap cost model, and this
> > > benchmark improved (a bit) with the cheap cost model on both Power8
> > > and Power9.  So I will just test the impact with the very-cheap cost
> > > model.
> > 
> > So another thing to benchmark would be to enable pcom but make sure
> > 
> >   /* Determine the unroll factor, and if the loop should be unrolled, ensure
> >      that its number of iterations is divisible by the factor.  */
> >   unroll_factor = determine_unroll_factor (chains);
> >   scev_reset ();
> >   unroll = (unroll_factor > 1
> >             && can_unroll_loop_p (loop, unroll_factor, &desc));
> > 
> > is false for the cheap and very-cheap cost models unless
> > flag_predictive_commoning is active.
> > 
> 
> Thanks for the hints! One question: could we just enable the non-unrolling
> version of pcom if it's implicitly enabled by flag_tree_loop_vectorize,
> without considering the vect cost model? Although the very-cheap and cheap
> cost models are very likely associated with -O2, users can still try the
> dynamic or unlimited cost model at -O2, or the very-cheap/cheap cost model
> at -O3, so it seems not good to map the cost model onto the unroll decision
> here directly. Or maybe we check the optimization level? Such as:
> 
>   virtual bool
>   gate (function *)
>   {
>     if (flag_predictive_commoning != 0)
>       return true;
>     if (flag_tree_loop_vectorize)
>       {
>         allow_unroll_p = optimize > 2;

But what about -O2 -ftree-vectorize -fno-predictive-commoning?  IMHO
we want to check global_options_set: for "implicit" pcom do not allow
unrolling, and for explicitly disabled pcom do not do any pcom at all.
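
Something along these lines, as a rough sketch only (assuming
allow_unroll_p stays a member flag of the pass, as in your snippet, and
querying global_options_set directly via its x_ field):

  virtual bool
  gate (function *)
  {
    /* Explicit -fno-predictive-commoning: do not run pcom at all.  */
    if (global_options_set.x_flag_predictive_commoning
        && !flag_predictive_commoning)
      return false;
    /* Explicit -fpredictive-commoning (or the -O3 default): full pcom,
       including unrolling.  */
    if (flag_predictive_commoning)
      {
        allow_unroll_p = true;
        return true;
      }
    /* Only implied by -ftree-loop-vectorize: run pcom, but do not
       allow unrolling.  */
    if (flag_tree_loop_vectorize)
      {
        allow_unroll_p = false;
        return true;
      }
    return false;
  }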

Richard.
