> -----Original Message-----
> From: Richard Biener <rguent...@suse.de>
> Sent: Tuesday, May 13, 2025 12:44 PM
> To: Eric Botcazou <botca...@adacore.com>
> Cc: Tamar Christina <tamar.christ...@arm.com>; gcc-patches@gcc.gnu.org; nd
> <n...@arm.com>
> Subject: Re: [PATCH 1/4]middle-end: document pragma unroll n
> <requested|preferred> [PR116140]
>
> On Tue, 13 May 2025, Eric Botcazou wrote:
>
> > > In PR116140 it was brought up that adding pragma GCC unroll in std::find
> > > makes it so that you can't use a larger unroll factor if you wanted to.
> > > This is because the value can't be overridden by the other unrolling flags
> > > such as -funroll-loops.
> >
> > What about letting -funroll-loops either augment or use a multiple of the
> > specified factor?
>
> I'm adding my general comment here.  While I think it's reasonable
> to honor a #pragma unroll during vectorization by trying to adjust
> the vectorization factor to the suggested unroll factor, adjusting
> the "remaining" (forced) unroll is probably not always desired,
> expected or good.
I guess you're referring to the other patch.  (That's a separate change that
I think should be debated there, because whatever the vectorizer does is
independent of the scalar unroller.)

I can't think of a case where not adjusting the remaining forced unrolling is
desirable.  In my opinion the pragma refers to unrolling of the scalar code,
not the vector code.  And if the vectorizer has already unrolled the loop,
doing additional unrolling of the vector code is always going to be slow.
The larger the unroll factor, the more preheader statements GCC generates.

If you have e.g. pragma unroll 16 on an SI loop, the vectorizer already
unrolls it to 4x V4SI.  For the RTL unroller to then unroll this loop 16 times
more means the VF requirement goes from 4x V4SI to 64x V4SI for each loop
entry.  Surely the user could not have meant that.

>
> In absence of #pragma unroll the loop unroller has heuristics that
> might want to incorporate whether a loop was already unrolled
> from original scalar, but the heuristics should work independent
> of that.  This is especially true in the context of complete
> unrolling in cunroll, not so much about the RTL unroller which
> lacks any good heuristics.
>

This isn't true: the RTL unroller has a target hook for costing.  Some targets
already have heuristics to unroll small loops, and I'm planning on doing the
same for AArch64 based on the throughput of the loop.

> The current #pragma unroll is a force thing originally invented
> to guide the RTL unroller when it is disabled (as it is by default).
> That it is effectively a "force exact value" is a side-effect of
> the lack of any different behavior there (before the #pragma it
> would unroll by 8, always).
>
> IMO there's not enough reason to complicate the tunable, much
> less by "weak" attributes like requested vs. preferred.
> I'd rather allow
>
> #pragma GCC unroll
>
> without a specific unroll factor to suggest GCC should enable
> unrolling for this loop, but according to heuristics, rather
> than to a fixed amount (that would be your "preferred" I guess).

The reason for the extra keyword is to *still* get the requested unrolling
when -funroll-loops is not specified.  With your suggestion the user could
never specify a default unroll factor for a loop when -funroll-loops is not
used.

i.e.

#pragma GCC unroll

and

#pragma GCC unroll 4 preferred

are not the same without -funroll-loops, and that's the difference this
change is trying to realize.

Thanks,
Tamar

>
> Richard.
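
To make the arithmetic in the V4SI example above concrete, here is a minimal
sketch in C.  The helper names are hypothetical and only for illustration;
the only inputs taken from the discussion are the SI element type, the V4SI
vector mode and the unroll factor of 16.  It assumes the vectorizer turns a
requested scalar factor of 16 into 4x V4SI, and that the RTL unroller would
then apply the same factor again to the vectorized loop.

```c
#include <assert.h>
#include <stddef.h>

/* A V4SI vector holds four 32-bit (SI) elements.  */
enum { V4SI_LANES = 4 };

/* V4SI vectors per loop iteration after the vectorizer has honored an
   unroll factor expressed in scalar iterations (assumption: the factor
   is a multiple of the lane count).  */
unsigned vectors_after_vectorizer(unsigned scalar_unroll)
{
    assert(scalar_unroll % V4SI_LANES == 0);
    return scalar_unroll / V4SI_LANES;
}

/* V4SI vectors per loop iteration if the RTL unroller then applies the
   same factor a second time to the already vectorized loop.  */
unsigned vectors_after_rtl_unroll(unsigned scalar_unroll)
{
    return vectors_after_vectorizer(scalar_unroll) * scalar_unroll;
}

/* Scalar elements consumed per iteration for a given vector count.  */
unsigned elements_per_iteration(unsigned vectors)
{
    return vectors * V4SI_LANES;
}

/* The kind of SI loop being discussed, annotated with the existing
   form of the pragma.  The "preferred" keyword from the patch is
   proposed syntax and is deliberately not used here.  */
int sum_si(const int *a, size_t n)
{
    int sum = 0;
#pragma GCC unroll 16
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```

With a factor of 16 this gives 4 vectors (16 elements) per iteration after
vectorization and 64 vectors (256 elements) if the factor is applied again,
which is the 4x V4SI to 64x V4SI jump described above.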