Re: [PATCH 0/4 GCC11] IVOPTs consider step cost for different forms when unrolling

Roman Zhuykov Tue, 11 Feb 2020 04:46:18 -0800

11.02.2020 11:01, Richard Biener wrote:
> On Tue, 11 Feb 2020, Segher Boessenkool wrote:
>
>> On Tue, Feb 11, 2020 at 08:34:15AM +0100, Richard Biener wrote:
>>> On Mon, 10 Feb 2020, Segher Boessenkool wrote:
>>>> Yes, we should decide how often we want to unroll things somewhere before
>>>> ivopts already, and just use that info here.
>>>>
>>>> Or are there advantages to doing it *in* ivopts?  It sounds like doing
>>>> it there is probably expensive, but maybe not, and we need to do similar
>>>> analysis there anyway.
>>> Well, if the only benefit of doing the unrolling is that IVs get
>>> cheaper then yes, IVOPTs should drive it.
>> We need to know much earlier in the pass pipeline how often a loop will
>> be unrolled.  We don't have to *do* it early.
>>
>> If we want to know it before ivopts, then obviously it has to be done
>> earlier.  Otherwise, maybe it is a good idea to do it in ivopts itself.
>> Or maybe not.  It's just an idea :-)
>>
>> We know we do not want it *later*, ivopts needs to know this to make
>> good decisions of its own.
>>
>>> But usually unrolling exposes redundancies (caught by predictive
>>> commoning which drives some unrolling) or it enables better use
>>> of CPU resources via scheduling (only caught later in RTL).
>>> For scheduling we have the additional complication that the RTL
>>> side doesn't have as much of a fancy data dependence analysis
>>> framework as on the GIMPLE side.  So I'd put my bet on trying to
>>> move something like SMS to GIMPLE and combine it with unrolling
>>> (IIRC SMS at most interleaves 1 1/2 loop iterations).
To clarify, without specifying -fmodulo-sched-allow-regmoves it only
interleaves 2 iterations.  With register moves enabled more iterations
can be considered.
> SMS on RTL always was quite disappointing...
Hmm, when I tried to move it just a few passes earlier many years ago,
I got a different opinion:
https://gcc.gnu.org/ml/gcc-patches/2011-10/msg01526.html
Although without such a move we still have annoying issues which RTL
folks can't solve, see e.g.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93264#c2
> It originally came with "data dependence export from GIMPLE to RTL"
> that never materialized so I'm not surprised ;)  It also relies
> on doloop detection.
My current attempt to drop the doloop dependency is still WIP; hopefully
I'll create a branch in refs/users/ in a month or so.  But older (gcc-7
and earlier) versions are available, see
https://gcc.gnu.org/ml/gcc-patches/2017-02/msg01647.html
Doloops are still supported for some kind of backward compatibility, but
many more loops (those which loop-iv can analyze) are considered in the new SMS.
>> Do you expect it will be more useful on Gimple?  Moving it there is a 
>> good idea in any case ;-)
>>
>> I don't quite see the synergy between SMS and loop unrolling, but maybe
>> I need to look harder.
> As said elsewhere I don't believe in actual unrolling doing much good
> but in removing data dependences in the CPU pipeline.  SMS rotates
> the loop, peeling N iterations (and somehow I think for N > 1 that
> should better mean unrolling the loop body).
Yes, this is what theory tells us.
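As a rough illustration of that "rotation" (a hand-written sketch, not
output of the SMS pass; the function names scale_plain and scale_pipelined
are made up for this example), here is a loop and a manually
software-pipelined variant that overlaps two iterations.  One iteration's
load is peeled into a prologue, and the kernel mixes the load of iteration
i+1 with the multiply/store of iteration i:

```c
#include <stddef.h>

/* Plain loop: each iteration loads, multiplies, then stores. */
void scale_plain(const int *a, int *b, size_t n, int k)
{
    for (size_t i = 0; i < n; i++)
        b[i] = a[i] * k;
}

/* Hand-pipelined variant (illustrative only): two iterations overlap. */
void scale_pipelined(const int *a, int *b, size_t n, int k)
{
    if (n == 0)
        return;

    int x = a[0];                 /* prologue: load of iteration 0 */

    for (size_t i = 0; i + 1 < n; i++) {
        int next = a[i + 1];      /* stage 1 of iteration i+1 */
        b[i] = x * k;             /* stage 2 of iteration i */
        x = next;                 /* copy carrying the value into the next kernel pass */
    }

    b[n - 1] = x * k;             /* epilogue: finish the last iteration */
}
```

The `x = next` copy is only needed because the kernel body is not
unrolled; with deeper overlap (N > 1) one would either need more such
copies or have to unroll the kernel and rename, which I assume is the
point about N > 1 above.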
> Of course doing "scheduling" on GIMPLE is "interesting" in its own
> but OTOH our pipeline DFAs are imprecise enough that one could even
> devise some basic GIMPLE <-> "RTL" mapping to make use of it.  But
> then scheduling without IVs or register pressure in mind is somewhat
> pointless as well.
Unfortunately, even with -fmodulo-sched-allow-regmoves it doesn't
interact much with register pressure.
> That said - if I had enough time I'd still think that investigating
> "scheduling on GIMPLE" as replacement for sched1 is an interesting
> thing to do.
Sounds good, but IMHO the modulo scheduler is not the best choice for the
first step in implementing such a concept.


Roman
