On 06/24/2015 01:59 AM, Richard Biener wrote:
Redundant, basically two IVs with the same initial value and same step.
IVOPTs can deal with this if the initial values and the step are already
same "enough" - the vectorizer can end up generating redundant huge
expressions for both.
Ah, so yes, this is a totally different issue than Alan and I are discussing.

RTL CSE is bloody expensive and so many times I wanted the ability to know a
bit about what the loop optimizer had done (or not done) so that I could
conditionally skip the second CSE pass.  We never built that, but it's
something I've wanted for decades.

Hmm, ok.  We can abuse pass properties for this but I don't think
they are a scalable fit.  Not sure if we'd like to go the full way of
adding something like PROP_want_ccp, PROP_want_copyprop, PROP_want_cse,
etc. (any others?).  And whether FRE would then catch a
PROP_want_copyprop because it also can do copy propagation.
And that's why we haven't pushed hard on this issue -- it doesn't scale, and making it scale requires rethinking the basics of the pass manager.
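
To make the property idea concrete, here is a minimal sketch of cleanup passes gated on "want" bits. Everything in it is hypothetical: the PROP_want_* bits are only the names floated above and do not exist in GCC, and Pass and run_pipeline are toy stand-ins, not the pass manager. A real pass would report its wants only when it actually changed something; the sketch uses a fixed mask per pass.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical "cleanup wanted" bits, named after the ones floated above.
// None of these exist in GCC; they are for illustration only.
enum : std::uint32_t {
  PROP_want_ccp      = 1u << 0,
  PROP_want_copyprop = 1u << 1,
  PROP_want_cse      = 1u << 2,
};

// Toy pass description (an assumption, not GCC's pass structure).
struct Pass {
  std::string name;
  std::uint32_t satisfies;  // wants this pass is able to clean up
  std::uint32_t requests;   // wants this pass leaves behind after running
};

// Walk the pipeline with a set of pending wants.  A cleanup pass is
// skipped when none of the wants it can satisfy are pending.
std::vector<std::string> run_pipeline(const std::vector<Pass> &passes,
                                      std::uint32_t pending) {
  std::vector<std::string> executed;
  for (const Pass &p : passes) {
    const bool is_cleanup = p.satisfies != 0;
    if (is_cleanup && (p.satisfies & pending) == 0)
      continue;                 // nothing for this pass to do: skip it
    executed.push_back(p.name);
    pending &= ~p.satisfies;    // these wants are now taken care of
    pending |= p.requests;      // and these are newly raised
  }
  return executed;
}
```

With a toy FRE that satisfies both PROP_want_cse and PROP_want_copyprop, a later copyprop pass gets skipped, which is exactly the overlap question raised above. It also shows the scaling problem: every new kind of cleanup needs a new bit and every pass needs to know which bits it covers.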


Going a bit further here, esp. in the loop context, would be to
have the basic cleanups be region-based.  Because given a big
function with many loops and just one vectorized it would be
enough to clean up the vectorized loop (yes, and in theory
all downstream effects, but that's probably secondary and not
so important).  It's not too difficult to make FRE run on
a MEME region, the interesting part, engineering-wise, is to
really make it O(size of MEME region) - that is, eliminate
things like O(num_ssa_names) or O(n_basic_blocks) setup cost.
I had a long talk with some of the SGI compiler guys many years ago about region-based optimizations. It was something they had been trying to bring into their compiler for years, but never got it working to a point where they were happy with it. While they didn't show me the code, they indicated the changes were highly invasive -- and all the code had been #ifdef'd out because it just didn't work. Naturally it was all bitrotting.
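
As an illustration of the O(size of region) point, here is a minimal sketch of value numbering restricted to one region. It is not FRE and uses no GCC data structures: RegionStmt, RegionExpr and value_number_region are made-up names, the region is assumed to be handed over as a flat list of statements in dominance order, and control flow is ignored. The only point is that every table is keyed lazily, so nothing is sized by the function-wide number of SSA names or basic blocks.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <vector>

// Toy statement: result = lhs <op> rhs, operands are SSA name ids.
struct RegionStmt { int result; char op; int lhs; int rhs; };

// Hash-table key for an already value-numbered expression.
struct RegionExpr {
  char op; int lhs; int rhs;
  bool operator==(const RegionExpr &o) const {
    return op == o.op && lhs == o.lhs && rhs == o.rhs;
  }
};
struct RegionExprHash {
  std::size_t operator()(const RegionExpr &e) const {
    std::size_t h = std::hash<int>()(e.lhs);
    h = h * 31 + std::hash<int>()(e.rhs);
    return h * 31 + std::hash<char>()(e.op);
  }
};

// Value-number only the statements of one region.  Returns a map from
// each redundant result name to the earlier name computing the same value.
std::unordered_map<int, int>
value_number_region(const std::vector<RegionStmt> &region) {
  std::unordered_map<int, int> leader;  // name -> representative name
  std::unordered_map<RegionExpr, int, RegionExprHash> table;
  std::unordered_map<int, int> redundant;

  // Names defined outside the region are never entered into `leader`;
  // they simply valueize to themselves.
  auto valueize = [&leader](int name) {
    auto it = leader.find(name);
    return it == leader.end() ? name : it->second;
  };

  for (const RegionStmt &s : region) {
    RegionExpr key{s.op, valueize(s.lhs), valueize(s.rhs)};
    auto res = table.emplace(key, s.result);
    if (!res.second) {                  // expression was seen before
      leader[s.result] = res.first->second;
      redundant[s.result] = res.first->second;
    }
  }
  return redundant;
}
```

SSA name ids can be arbitrarily large here without any cost, since no array is indexed by them. The hard engineering part mentioned above, getting an existing pass to stop assuming whole-function tables, is of course not captured by a toy like this.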

And then there is the possibility of making passes generate less
need for cleanups after them - like in the present case with the
redundant IVs, make them more apparently redundant by CSEing the
initial value and step during vectorizer code generation.
I'm playing with the idea of adding a simple CSE machinery to
the gimple_build () interface (aka match-and-simplify).  It
eventually invokes (well, not currently, but that can be fixed)
maybe_push_res_to_seq which is a good place to maintain a
table of already generated expressions.  That of course only
works if you either always append to the same sequence or at least
insert at the same place.
As you know we've gone back and forth on this in the past. It's always a trade-off. I still ponder from time to time putting the simple CSE and cprop bits back into the SSA rewriting phase to avoid generating all kinds of garbage that just needs to be cleaned up later -- particularly for incremental SSA updates.
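
For what a table of already generated expressions could look like, here is a minimal sketch. It is not the gimple_build () or maybe_push_res_to_seq interface; SeqBuilder, BuiltStmt and build are hypothetical names, and SSA names are plain integers. It only covers the simple case described above, where everything is appended to the same sequence, so an earlier result is always available at the point of reuse.

```cpp
#include <map>
#include <tuple>
#include <vector>

// Toy statement: result = lhs <op> rhs, operands are SSA name ids.
struct BuiltStmt { int result; char op; int lhs; int rhs; };

// Hypothetical builder that appends statements to a single sequence and
// remembers what it has already generated.
struct SeqBuilder {
  std::vector<BuiltStmt> seq;                           // the one sequence
  std::map<std::tuple<char, int, int>, int> generated;  // expr -> result
  int next_name;                                        // next fresh name

  explicit SeqBuilder(int first_free_name) : next_name(first_free_name) {}

  // Return a name holding lhs <op> rhs, emitting a new statement only if
  // the same expression has not been built on this sequence before.
  int build(char op, int lhs, int rhs) {
    const auto key = std::make_tuple(op, lhs, rhs);
    auto it = generated.find(key);
    if (it != generated.end())
      return it->second;            // reuse the earlier result
    const int name = next_name++;
    seq.push_back({name, op, lhs, rhs});
    generated.emplace(key, name);
    return name;
  }
};
```

Building the initial value and the step of two IVs through such a builder would hand back the same names for both, so the IVs look redundant without a later CSE pass. Once statements can be inserted at different places the table is no longer safe to reuse as is, which is the restriction noted above.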



Jeff
