https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794
--- Comment #5 from rguenther at suse dot de <rguenther at suse dot de> --- On Fri, 28 May 2021, linkw at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794 > > --- Comment #4 from Kewen Lin <linkw at gcc dot gnu.org> --- > (In reply to rguent...@suse.de from comment #3) > > On Fri, 28 May 2021, linkw at gcc dot gnu.org wrote: > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794 > > > > > > --- Comment #2 from Kewen Lin <linkw at gcc dot gnu.org> --- > > > (In reply to Richard Biener from comment #1) > > > > > > Thanks for the comments! > > > > > > > There's predictive commoning which can do similar transforms and runs > > > > after > > > > vectorization. It might be it doesn't handle these "simple" cases or > > > > that > > > > loop dependence info is not up to the task there. > > > > > > > > > > pcom does fix this problem, but it's enabled by default at -O3. Could it > > > be > > > considered to be run at O2? Or enabled at O2 at some conditions such as: > > > only > > > for one loop which skips loop carried optimization and isn't vectorized > > > further? > > > > I think pcom should be enabled when vectorization is due to the > > interaction with PRE. It could be tamed down (it can do peeling/unrolling > > which is why it is -O3) based on the vectorizer cost model active > > if only implicitely enabled ... Things will get a bit messy I guess. > > > > Good point! I prefer this idea to the one guarding cost model in sccvn code. > > > > > Another option is to avoid the PRE guard with the (very) cheap cost > > > > model > > > > at the expense of not vectorizing affected loops. > > > > > > > > > > OK, I will benchmark this to see its impact. For the particular loops in > > > 554.roms_r, they can be vectorized at cheap cost model, this bmk got > > > improved > > > at cheap cost model on both Power8 and Power9 (a bit though). So I will > > > just > > > test the impact on very cheap cost model. > > > > So another thing to benchmark would be to enable pcom but make sure > > > > /* Determine the unroll factor, and if the loop should be unrolled, > > ensure > > that its number of iterations is divisible by the factor. */ > > unroll_factor = determine_unroll_factor (chains); > > scev_reset (); > > unroll = (unroll_factor > 1 > > && can_unroll_loop_p (loop, unroll_factor, &desc)); > > > > is false for the cheap and very-cheap cost models unless > > flag_predictive_commoning is active. > > > > Thanks for the hints! One question: could we just enable non-unroll > version of pcom if it's enabled by flag_tree_loop_vectorize implicitly > without considering vect cost model? Although the very-cheap and cheap > cost model are very likely associated to O2, users still can try dynamic > or unlimited cost model at O2, or very-cheap/cheap cost model at O3, it > seems not good to map cost model onto unroll decision here directly. Or > maybe we check the optimization level? such as: > > virtual bool > gate (function *) > { > if (flag_predictive_commoning != 0) > return true; > if (flag_tree_loop_vectorize) > { > allow_unroll_p = optimize > 2; But what about -O2 -ftree-vectorize -fno-predictive-commoning? IMHO we want to check global_options_set and for "implicit" pcom do not allow unrolling and for explicitely disabled pcom do not do any pcom at all. Richard.