https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256

--- Comment #58 from amker at gcc dot gnu.org ---
(In reply to Bill Schmidt from comment #56)
> (In reply to Bill Schmidt from comment #53)
> > I'm not a fan of a tree-level unroller.  It's impossible to make good
> > decisions about unroll factors that early.  But your second approach sounds
> > quite promising to me.
> 
> I would be willing to soften this statement.  I think that an early unroller
> might well be a profitable approach for most systems with large caches and
> so forth, where if the unrolling heuristics are not completely accurate we
> are still likely to make a reasonably good decision.  However, I would
> expect to see ports with limited caches/memory to want more accurate control
> over unrolling decisions.  So I could see allowing ports to select between a
> GIMPLE unroller and an RTL unroller (I doubt anybody would want both).

Thanks for the comments.
As David suggested, we can try to implement a relatively conservative unroller
and make sure it's a win in most unrolled cases, even if some opportunities
are missed.  Then we can enable it at the -O3/-Ofast level, which I think would
be welcome since we currently don't have a general unroller enabled by default.

> 
> In general it seems like PowerPC could benefit from more aggressive
> unrolling much of the time, provided we can also solve the related IVOPTS
> problems that cause too much register spill.
> 
> I may have an interest in working on a GIMPLE unroller, depending on how
> quickly I can complete or shed some other projects...

(In reply to rguent...@suse.de from comment #57)
> On Tue, 11 Aug 2015, wschmidt at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
> > 
> > --- Comment #56 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
> > (In reply to Bill Schmidt from comment #53)
> > > I'm not a fan of a tree-level unroller.  It's impossible to make good
> > > > decisions about unroll factors that early.  But your second approach
> > > > sounds quite promising to me.
> > 
> > > I would be willing to soften this statement.  I think that an early
> > > unroller might well be a profitable approach for most systems with large
> > > caches and so forth, where if the unrolling heuristics are not completely
> > > accurate we are still likely to make a reasonably good decision.  However,
> > > I would expect to see ports with limited caches/memory to want more
> > > accurate control over unrolling decisions.  So I could see allowing ports
> > > to select between a GIMPLE unroller and an RTL unroller (I doubt anybody
> > > would want both).
> > 
> > > In general it seems like PowerPC could benefit from more aggressive
> > > unrolling much of the time, provided we can also solve the related IVOPTS
> > > problems that cause too much register spill.
> > 
> > I may have an interest in working on a GIMPLE unroller, depending on how
> > quickly I can complete or shed some other projects...
> 
> I think that a separate unrolling on GIMPLE would be a hard sell
> due to the lack of a good cost model.  _But_ doing unrolling as part
> of another transform like we are doing now makes sense.  So does
> eventually moving parts of an RTL pass involving unrolling to
> GIMPLE, like modulo scheduling or SMS (leaving the scheduling part
> to RTL).

About the cost model: is it possible to introduce a cache information model in
GCC?  I don't think it's a difficult problem, and it could be a starting point
for cache-sensitive optimizations in the future.  Another general question:
what kinds of cost do we need in a fine unroller, besides cache and branch
costs?
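
To make the question concrete, here is a rough sketch of how a conservative
unroller might consult such a model.  Everything in it is hypothetical: the
struct names, the fields, the "quarter of the instruction cache" budget and
the assumption that register pressure grows linearly with the unroll factor
are made up for illustration and are not existing GCC interfaces.

```c
/* Hypothetical per-target parameters; not an existing GCC interface.  */
struct unroll_costs
{
  unsigned icache_bytes;   /* L1 instruction cache size */
  unsigned avail_regs;     /* registers usable inside the loop */
  unsigned max_factor;     /* upper bound on the unroll factor */
};

/* Hypothetical per-loop estimates.  */
struct loop_estimate
{
  unsigned body_bytes;     /* estimated code size of one iteration */
  unsigned live_regs;      /* registers live across one iteration */
};

/* Return the largest power-of-two factor whose unrolled body stays within
   a quarter of the instruction cache (an arbitrary budget) and within the
   register budget.  Return 1, meaning "do not unroll", if no larger factor
   qualifies.  Scaling live_regs linearly with the factor is a deliberately
   pessimistic assumption.  */
static unsigned
choose_unroll_factor (const struct unroll_costs *t,
                      const struct loop_estimate *l)
{
  unsigned best = 1;

  for (unsigned f = 2; f <= t->max_factor; f *= 2)
    {
      if (f * l->body_bytes > t->icache_bytes / 4)
        break;
      if (f * l->live_regs > t->avail_regs)
        break;
      best = f;
    }
  return best;
}
```

With a 96-byte body and 4 live registers, a target described as
{ 32768, 32, 8 } would get factor 8, while one described as { 1024, 16, 8 }
would be limited to factor 2 by the code size check.  A port with limited
cache gets a smaller factor from the same heuristic, which is the kind of
control Bill asked for.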

> 
> Note that the RTL unroller is not enabled by default by any optimization
> level and note that unfortunately the RTL unroller shares flags with
> the GIMPLE level complete peeling (where it mainly controls cost 
> modeling).  Oh, but it's enabled with -fprofile-use.
> 
> It's been a long time since I've done SPEC measuring with/without
> -funroll-loops (or/and -fpeel-loops).  Note that these flags have
> secondary effects as well:
> 
> toplev.c:    flag_web = flag_unroll_loops || flag_peel_loops;
> toplev.c:    flag_rename_registers = flag_unroll_loops || flag_peel_loops;
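
The secondary effect quoted above can be modelled in isolation.  The sketch
below is a stand-alone illustration, not the toplev.c code: the struct, the
function name and the FLAG_UNSET sentinel (standing for "not given on the
command line") are assumptions added so that an explicit user setting is
left alone.  Only the two `flag_unroll_loops || flag_peel_loops` assignments
come from the quoted lines.

```c
/* Sentinel meaning "not set on the command line"; an assumption of this
   sketch, not taken from the quoted lines.  */
enum { FLAG_UNSET = -1 };

/* Hypothetical container for the four flags involved.  */
struct opt_flags
{
  int unroll_loops;      /* -funroll-loops */
  int peel_loops;        /* -fpeel-loops */
  int web;               /* -fweb */
  int rename_registers;  /* -frename-registers */
};

/* Derive the defaults of web and rename_registers from the unroll/peel
   flags, leaving values that were set explicitly untouched.  */
static struct opt_flags
finish_flags (struct opt_flags f)
{
  int implied = f.unroll_loops || f.peel_loops;

  if (f.web == FLAG_UNSET)
    f.web = implied;
  if (f.rename_registers == FLAG_UNSET)
    f.rename_registers = implied;
  return f;
}
```

So a measurement with and without -funroll-loops also compares code with and
without web and register renaming, unless those two are pinned explicitly.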
