> > I added a cunrolle pass that differs from cunrolli by not allowing code size
> > growth even at -O3 (because we do not know what loops are hot yet).
> > We currently unroll tiny loops with 2 calls, which I think needs to be tamed
> > down; I can do that if the patch seems to make sense.
> 
> Please.

OK, will do that.
> 
> > I tried several options and ended up adding cunrolle before FRE and
> > reordering FRE and SRA: SRA needs constant propagation to happen after
> > unrolling to work, and I think value numbering works pretty well on
> > non-SRAed data structures.
> > I also added DCE just before unrolling. This increases the number of unrolls
> > by about 60% on both tramp3d and eon. (Basically we want to have DCE and
> > cprop done to make the unroller metrics work reasonably well.)
> 
> I've re-ordered SRA and ealias for GCC5 because ealias benefits from SRA
> while SRA doesn't need PTA.  You undo that improvement.

OK, I see. I assumed that the PTA solutions are updated when SRA introduces new
scalars.
We could simply do limited CCP when unrolling happened, lifting the need for FRE
in between cunrolle and SRA.  I dimly remember we even have code for that?
> 
> We should improve the unroller instead of requiring DCE before it.

If I get a loop with dead code in it, because of einline or gimple production
or whatever, what should the unroller do short of doing its own DCE pass on the
whole function body (well, the mark phase, ignoring sweep)?

Honza
> 
> I think that adding more passes to the early pipeline is bad.  Sure, it
> will help weird C++ code-bases.  But it will slow down the compile for
> all the rest.

I am not 100% convinced about this, especially with LTO, where we need to pickle
all the garbage we did not eliminate.
> 
> As we want early unrolling to get the secondary effects by performing
> better FRE/DCE/DSE I think trying to do a better job in value-numbering
> for the kind of loops we are talking about would be better.  For tramp3d
> we are talking about cases like
> 
>  for (i=0; i<3; ++i)
>   dim[i] = i;
> 
> or similar code which ends up storing to a single array.  There is
> the vn_reference_lookup_3 function in tree-ssa-sccvn.c which is
> the canonical place for value-numbering "tricks".  It "simply"
> needs to be taught how to "look through loops".
> 
> I'd like to not get into the game of huge pass order dependences in
> early-opts - that just shows our scalar cleanups are too weak.
> Ideally we'd just have a single pass in early-opts...
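
To make the tramp3d-style case above concrete, here is a minimal C sketch of the
loop next to its hand-unrolled equivalent. The function names are hypothetical
and this is not GCC code; it only illustrates why a later load from dim[] is easy
to fold once the stores are explicit, and what a value-numbering "look through
loops" would have to prove for the rolled form.

```c
/* Rolled form: the stores to dim[] happen inside a loop, so the later
   loads of dim[1] and dim[2] fold to constants only if the optimizer
   can reason about what the loop stored.  */
int
dim_sum_rolled (void)
{
  int dim[3];
  for (int i = 0; i < 3; ++i)
    dim[i] = i;
  return dim[1] + dim[2];
}

/* Hand-unrolled form (hypothetical result of complete unrolling):
   every store is a constant store to a known element, so the loads
   below can be replaced by the stored constants.  */
int
dim_sum_unrolled (void)
{
  int dim[3];
  dim[0] = 0;
  dim[1] = 1;
  dim[2] = 2;
  return dim[1] + dim[2];
}
```

Both functions return 3; the difference is only in how much work the scalar
passes need to do to discover that.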

Another motivation for getting rid of loops is to make control and data flow
more explicit for IPA analysers.
If you have code on vectors of size 3 that does a series of operations, it is
very hard to reorder/fuse/merge loops as needed for the scalar optimizations...
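
As an illustration of the size-3 vector case, a minimal C sketch (hypothetical
function names, not taken from tramp3d or GCC) with two consecutive loops and the
form one would hope to reach after complete unrolling. In the rolled version the
intermediate vector tmp[] sits in memory between the loops; in the unrolled one
it becomes three scalars and the data flow is explicit.

```c
/* Rolled form: two loops communicate through the temporary array tmp[].
   Merging them requires loop fusion or reasoning about tmp[] in memory.  */
void
add_scale_rolled (const double a[3], const double b[3], double s,
                  double out[3])
{
  double tmp[3];
  for (int i = 0; i < 3; ++i)
    tmp[i] = a[i] + b[i];
  for (int i = 0; i < 3; ++i)
    out[i] = s * tmp[i];
}

/* Hand-unrolled form: tmp[] is replaced by scalars, so plain scalar
   optimizations see the whole computation.  All inputs are read before
   any output is written, matching the rolled version even if out
   overlaps a or b.  */
void
add_scale_unrolled (const double a[3], const double b[3], double s,
                    double out[3])
{
  double t0 = a[0] + b[0];
  double t1 = a[1] + b[1];
  double t2 = a[2] + b[2];
  out[0] = s * t0;
  out[1] = s * t1;
  out[2] = s * t2;
}
```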

Honza
