On Fri, Dec 02, 2016 at 09:47:00AM +0100, Richard Biener wrote:
> >> STC tries to make as large as possible consecutive "traces", mainly to
> >> help with instruction cache utilization and hit rate etc. It cannot do
> >> a very good job if it isn't allowed to copy blocks.
> >>
> >> "simple" tries to (dynamically) have as many fall-throughs as possible,
> >> i.e. as few jumps as possible.
>
> I hope it special-cases backedges, that is, not make those fallthru.
It doesn't, and that is a good thing.
Firstly, what is classified as backedge doesn't make too much sense in
some cases; quite often the cfg isn't reducible.
Secondly, for simple loops it does not matter: the backedge has lower
frequency than the forward edges, so everything works out as you want.
Thirdly, consider this loop:
|
S<----
| |
A |
/ \ |
B C |
\ / |
D |
| |
E-----
|
F
where the backedge now has higher frequency than A->B and A->C.
With simple this ends up as:
S: ...; goto A
D: ...
E: ...; if (not again) goto F
S: ...
A: ...; if () goto C
B: ...; goto D
C: ...; goto D
and this is cheaper than the alternative:
S: ...
A: ...; if () goto C
B: ...
D: ...
E: ...; if (again) goto A
F:
C: ...; goto D
If a fraction f of the loop iterations does C (so 1-f does B), the
alternative has 1+2f jumps per iteration; the code simple makes has 1+f.
> > I think we've probably discussed this more than is really necessary. We
> > just need to pick an alternative and go with it, I think either alternative
> > is reasonable (avoid copying when STC has no length information or fall back
> > to simple when there is no length information).
>
> The description of both algorithms doesn't make it an obvious pick. So lets
> leave the decision of the algorithm to the target and make STC beave sane
> with no length information (whether that means disallowing any copying or
> substituting a "default" length is another question -- but obviously it's the
> targets fault to not provide lenght information).
I agree.
Segher