https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794
--- Comment #4 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to rguent...@suse.de from comment #3)
> On Fri, 28 May 2021, linkw at gcc dot gnu.org wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100794
> >
> > --- Comment #2 from Kewen Lin <linkw at gcc dot gnu.org> ---
> > (In reply to Richard Biener from comment #1)
> >
> > Thanks for the comments!
> >
> > > There's predictive commoning which can do similar transforms and runs
> > > after vectorization.  It might be it doesn't handle these "simple"
> > > cases or that loop dependence info is not up to the task there.
> >
> > pcom does fix this problem, but it's enabled by default at -O3.  Could it
> > be considered to be run at O2?  Or enabled at O2 at some conditions such
> > as: only for one loop which skips loop carried optimization and isn't
> > vectorized further?
>
> I think pcom should be enabled when vectorization is due to the
> interaction with PRE.  It could be tamed down (it can do peeling/unrolling
> which is why it is -O3) based on the vectorizer cost model active
> if only implicitly enabled ...  Things will get a bit messy I guess.

Good point!  I prefer this idea to the one guarding cost model in sccvn code.

> > > Another option is to avoid the PRE guard with the (very) cheap cost
> > > model at the expense of not vectorizing affected loops.
> >
> > OK, I will benchmark this to see its impact.  For the particular loops in
> > 554.roms_r, they can be vectorized at cheap cost model, this bmk got
> > improved at cheap cost model on both Power8 and Power9 (a bit though).
> > So I will just test the impact on very cheap cost model.
>
> So another thing to benchmark would be to enable pcom but make sure
>
>   /* Determine the unroll factor, and if the loop should be unrolled, ensure
>      that its number of iterations is divisible by the factor.  */
>   unroll_factor = determine_unroll_factor (chains);
>   scev_reset ();
>   unroll = (unroll_factor > 1
>             && can_unroll_loop_p (loop, unroll_factor, &desc));
>
> is false for the cheap and very-cheap cost models unless
> flag_predictive_commoning is active.

Thanks for the hints!  One question: could we just enable a non-unrolling
version of pcom when it is enabled implicitly by flag_tree_loop_vectorize,
without considering the vect cost model?  Although the very-cheap and cheap
cost models are most likely associated with O2, users can still try the
dynamic or unlimited cost model at O2, or the very-cheap/cheap cost model at
O3, so it doesn't seem good to map the cost model directly onto the unroll
decision here.  Or maybe we could check the optimization level instead, such
as:

  virtual bool
  gate (function *)
  {
    if (flag_predictive_commoning != 0)
      return true;

    if (flag_tree_loop_vectorize)
      {
        allow_unroll_p = optimize > 2;
        return true;
      }

    return false;
  }

> It's probably also a good idea to investigate whether the
> update_ssa calls in pcom can be delayed to until after all transforms
> have been done.

OK, will check this later.
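
To make the proposed gate easier to reason about, here is a standalone model of it (C++14 or later). It is only a sketch and not GCC source: the names PcomOptions, PcomDecision, pcom_gate and should_unroll are hypothetical. It also assumes that unrolling stays allowed when flag_predictive_commoning is set explicitly, and that allow_unroll_p would simply be and-ed into the existing unroll condition; the gate above does not spell out either point.

```cpp
// Standalone model of the gate proposed in this comment; NOT GCC source.
// All type and function names here are hypothetical.

struct PcomOptions {
  int flag_predictive_commoning;  // nonzero when pcom is requested explicitly
  int flag_tree_loop_vectorize;   // nonzero when loop vectorization is on
  int optimize;                   // optimization level, e.g. 2 for -O2
};

struct PcomDecision {
  bool run_pass;      // what gate () would return
  bool allow_unroll;  // models the proposed allow_unroll_p
};

constexpr PcomDecision pcom_gate(const PcomOptions &o) {
  // Explicitly requested pcom: run it. Keeping unrolling allowed here is an
  // assumption; the proposed gate leaves allow_unroll_p untouched on this path.
  if (o.flag_predictive_commoning != 0)
    return {true, true};

  // Implicitly enabled through the vectorizer: run it, but only allow
  // unrolling above -O2.
  if (o.flag_tree_loop_vectorize)
    return {true, o.optimize > 2};

  return {false, false};
}

// Assumed combination with the existing check
//   unroll = (unroll_factor > 1 && can_unroll_loop_p (...));
// where can_unroll stands in for the result of can_unroll_loop_p.
constexpr bool should_unroll(const PcomDecision &d, unsigned unroll_factor,
                             bool can_unroll) {
  return d.run_pass && d.allow_unroll && unroll_factor > 1 && can_unroll;
}
```

With this model, -O2 with vectorization on and pcom not requested (PcomOptions{0, 1, 2}) runs the pass but never unrolls, whichever vect cost model the user picks, while the same flags at -O3 (PcomOptions{0, 1, 3}) keep unrolling available.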