> -----Original Message-----
> From: Richard Biener <rguent...@suse.de>
> Sent: Tuesday, May 13, 2025 1:36 PM
> To: Jakub Jelinek <ja...@redhat.com>
> Cc: Tamar Christina <tamar.christ...@arm.com>; Jonathan Wakely
> <jwak...@redhat.com>; gcc-patches@gcc.gnu.org; nd <n...@arm.com>
> Subject: Re: [PATCH 1/4]middle-end: document pragma unroll n
> <requested|preferred> [PR116140]
>
> On Tue, 13 May 2025, Jakub Jelinek wrote:
>
> > On Tue, May 13, 2025 at 10:40:16AM +0000, Tamar Christina wrote:
> > > That's true. The names are already optional, I can just drop the
> > > "requested" altogether.
> > >
> > > I'll give it a few to give others a chance to commit and I'll respin
> > > dropping "requested"
> >
> > Is the intended behavior of the "weak" version that the compiler can
> > increase or decrease it based on command line options etc., or that it
> > must unroll at least N times but with command line options etc. it could
> > be something higher than that?
> >
> > Perhaps
> > #pragma GCC unroll 16
> > vs.
> > #pragma GCC unroll >= 16
> > or
> > #pragma GCC unroll 16+
> > ?
> > As for keywords, I was worried about macros, but seems GCC unroll pragma
> > doesn't have macro expansion in the name nor arguments part, so when one
> > wants to macro expand the count, one needs to use _Pragma and create the
> > right expression as string literal.
>
> I think the intent for the given case is that GCC unrolls the loop,
> but not as much as with -funroll-loops (factor 8 IIRC). But when
> vectorizing then the unroll request is satisfied already (given
> vectorization effectively unrolls).
>
> IMO it should be possible to just use
>
> #pragma GCC unroll
>
> for this. That doesn't do the limiting to 4 times unrolling, but leaves
> it to the (non-existent) cost modeling of the RTL unroller.
>
> I think we should avoid overengineering this for PR116140
> which is just a case where we do _not_ want further unrolling
> after vectorization.
This particular patch is a case where the user may want more scalar unrolling
(it has no bearing on the vector patch). The comment was that before, with the
hand-unrolled loop, -funroll-loops could be used to override this.

Unrolling by larger amounts is not free. The pre-header becomes more expensive,
and unrolling more only makes sense *if* your micro-architecture can actually
do better on it. This would be bad on e.g. in-order cores.

That's presumably why std::find was unrolled only 4x by default, as it made
more sense, especially if used within a loop.

Without this patch, we can't have a good default while still allowing users to
override it. Again, this has nothing to do with vector at all.

Thanks,
Tamar

>
> Richard.
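For reference, a minimal sketch of the two spellings mentioned in the thread: a literal count on `#pragma GCC unroll`, and the `_Pragma` route for when the count comes from a macro. The function names, the helper macros `DO_PRAGMA` and `UNROLL`, and the constant `FIND_UNROLL` are made up for illustration. The loop is a simplified stand-in for a find-style search, not the libstdc++ implementation, and the factor 4 only mirrors the default discussed above.

```cpp
#include <cstddef>

// Hypothetical helper macros: the pragma text is built as a string literal
// so that a macro can supply the unroll count, as described in the thread.
#define DO_PRAGMA(x) _Pragma(#x)
#define UNROLL(n) DO_PRAGMA(GCC unroll n)

// Hypothetical project-wide default; 4 mirrors the factor discussed above.
#define FIND_UNROLL 4

// Literal count: asks GCC to unroll the loop that follows by 4.
inline std::size_t find_index(const int* data, std::size_t n, int value) {
#pragma GCC unroll 4
  for (std::size_t i = 0; i < n; ++i)
    if (data[i] == value)
      return i;
  return n;  // not found
}

// Macro-supplied count: FIND_UNROLL is expanded before stringification,
// so this becomes _Pragma("GCC unroll 4").
inline std::size_t find_index_macro(const int* data, std::size_t n, int value) {
  UNROLL(FIND_UNROLL)
  for (std::size_t i = 0; i < n; ++i)
    if (data[i] == value)
      return i;
  return n;  // not found
}
```

Both functions return the index of the first match, or `n` when the value is absent. The pragma affects only code generation, not the result.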