RE: [PATCH 1/4]middle-end: document pragma unroll n [PR116140]

Tamar Christina Tue, 13 May 2025 05:13:16 -0700

> -----Original Message-----
> From: Richard Biener <rguent...@suse.de>
> Sent: Tuesday, May 13, 2025 12:44 PM
> To: Eric Botcazou <botca...@adacore.com>
> Cc: Tamar Christina <tamar.christ...@arm.com>; gcc-patches@gcc.gnu.org; nd
> <n...@arm.com>
> Subject: Re: [PATCH 1/4]middle-end: document pragma unroll n
> <requested|preferred> [PR116140]
> 
> On Tue, 13 May 2025, Eric Botcazou wrote:
> 
> > > In PR116140 it was brought up that adding pragma GCC unroll in std::find
> > > makes it so that you can't use a larger unroll factor if you wanted to.
> > > This is because the value can't be overridden by the other unrolling flags
> > > such as -funroll-loops.
> >
> > What about letting -funroll-loops either augment or use a multiple of the
> > specified factor?
> 
> I'm adding my general comment here.  While I think it's reasonable
> to honor a #pragma unroll during vectorization by trying to adjust
> the vectorization factor to the suggested unroll factor, adjusting
> the "remaining" (forced) unroll is probably not always desired,
> expected or good.


I guess you're referring to the other patch. That's a separate change which I
think should be debated there, because whatever the vectorizer does is
independent of the scalar unroller.  I can't think of a case where
not adjusting the remaining forced unrolling is desirable.

In my opinion the pragma refers to unrolling of the scalar code, not the
vector code.  And if the vectorizer has already unrolled the loop, additionally
unrolling the vector code is always going to be slow.

The larger the unroll factor, the more preheader statements GCC generates.
If you have e.g. pragma unroll 16 on an SI loop, the vectorizer already unrolls
it to 4x V4SI.  For the RTL unroller to then unroll this loop 16 more times means
your VF requirement grows from 4x V4SI to 64x V4SI for each loop entry.  Surely
the user could not have meant that.
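To make the arithmetic above concrete, here is a small C sketch.  It assumes a
32-bit `int` (SImode) and 128-bit vectors holding four lanes (V4SI), and it
models the compounding as plain multiplication.  `add_arrays`,
`vector_copies_for_scalar_unroll` and `vector_copies_after_rtl_unroll` are
hypothetical helpers for illustration only, not GCC internals.

```c
#include <assert.h>

/* Assumption: a 128-bit vector holds 4 x 32-bit SImode lanes (V4SI).  */
enum { V4SI_LANES = 4 };

/* A scalar SImode loop carrying the pragma: the factor 16 is meant to
   refer to scalar iterations of this loop.  */
void add_arrays (int *restrict a, const int *restrict b, int n)
{
#pragma GCC unroll 16
  for (int i = 0; i < n; i++)
    a[i] += b[i];
}

/* Hypothetical model: V4SI copies per vector iteration needed to cover
   UNROLL scalar iterations.  */
unsigned vector_copies_for_scalar_unroll (unsigned unroll)
{
  assert (unroll % V4SI_LANES == 0);
  return unroll / V4SI_LANES;
}

/* Hypothetical model: V4SI copies per iteration if the RTL unroller then
   unrolls the already-vectorized loop by the same factor again.  */
unsigned vector_copies_after_rtl_unroll (unsigned unroll)
{
  return vector_copies_for_scalar_unroll (unroll) * unroll;
}
```

With a factor of 16 the first helper gives 4 (16 scalar elements per
iteration) and the second gives 64 (256 scalar elements per iteration),
i.e. the jump from 4x V4SI to 64x V4SI described above.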

> 
> In absence of #pragma unroll the loop unroller has heuristics that
> might want to incorporate whether a loop was already unrolled
> from original scalar, but the heuristics should work independent
> of that.  This is especially true in the context of complete
> unrolling in cunroll, not so much about the RTL unroller which
> lacks any good heuristics.
> 

This isn't true: the RTL unroller has a target hook for costing.  Some targets
already have heuristics to unroll small loops, and I'm planning on
doing the same for AArch64 based on the throughput of the loop.

> The current #pragma unroll is a force thing originally invented
> to guide the RTL unroller when it is disabled (as it is by default).
> That it is effectively a "force exact value" is a side-effect of
> the lack of any different behavior there (before the #pragma it
> would unroll by 8, always).
> 
> IMO there's not enough reason to complicate the tunable, much
> less by "weak" attributes like requested vs. preferred.  I'd
> rather allow
> 
> #pragma GCC unroll
> 
> without a specific unroll factor to suggest GCC should enable
> unrolling for this loop, but according to heuristics, rather
> than to a fixed amount (that would be your "preferred" I guess).

The reason for the extra keyword is to *still* get the requested unrolling
when -funroll-loops is not specified.

With your suggestion the user could never specify a default unroll factor
for a loop to use when `-funroll-loops` is not given.

i.e.

#pragma GCC unroll

and

#pragma GCC unroll 4 preferred

are not the same without -funroll-loops, and that's the difference this change
is trying to realize.
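One way to read the semantics being argued over, written as a toy C model.
The enum values, `effective_unroll` and its `heuristic` parameter are
hypothetical; the model only encodes the RTL-unroller behaviour as described
in this thread (taking `preferred` to mean "use N unless -funroll-loops lets
the heuristics choose"), not how GCC implements or would implement it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the pragma forms discussed in this thread.  */
enum unroll_pragma_kind
{
  PRAGMA_NONE,       /* no #pragma GCC unroll on the loop */
  PRAGMA_HEURISTIC,  /* proposed bare "#pragma GCC unroll" */
  PRAGMA_REQUESTED,  /* "#pragma GCC unroll N": today's exact factor */
  PRAGMA_PREFERRED   /* proposed "#pragma GCC unroll N preferred" */
};

/* Return the unroll factor the loop would get in this model.  HEURISTIC
   stands for whatever the unroller's own cost model would pick; a result
   of 1 means the loop is not unrolled.  */
unsigned
effective_unroll (enum unroll_pragma_kind kind, unsigned n,
                  bool funroll_loops, unsigned heuristic)
{
  assert (heuristic >= 1);
  switch (kind)
    {
    case PRAGMA_REQUESTED:
      return n;                              /* exact value, flags ignored */
    case PRAGMA_PREFERRED:
      return funroll_loops ? heuristic : n;  /* N is only the default */
    case PRAGMA_HEURISTIC:
      return heuristic;                      /* unroll, amount left to heuristics */
    case PRAGMA_NONE:
    default:
      return funroll_loops ? heuristic : 1;  /* RTL unroller off by default */
    }
}
```

In this model, with `heuristic` = 8 and no -funroll-loops, the bare pragma
yields 8 while `4 preferred` yields 4, which is the distinction above.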

Thanks,
Tamar

> 
> Richard.
