Re: [PATCH][RFC] Add FRE in pass_vectorize

Jeff Law Wed, 24 Jun 2015 20:31:14 -0700

On 06/24/2015 01:59 AM, Richard Biener wrote:

Redundant, basically two IVs with the same initial value and same step.
IVOPTs can deal with this if the initial values and the step are already
same "enough" - the vectorizer can end up generating redundant huge
expressions for both.

Ah, so yes, this is a totally different issue than Alan and I are discussing.

RTL CSE is bloody expensive and so many times I wanted the ability to know a
bit about what the loop optimizer had done (or not done) so that I could
conditionally skip the second CSE pass.   We never built that, but it's
something I've wanted for decades.


Hmm, ok.  We can abuse pass properties for this but I don't think
they are a scalable fit.  Not sure if we'd like to go full way
adding sth like PROP_want_ccp PROP_want_copyprop PROP_want_cse, etc.
(any others?).  And whether FRE would then catch a PROP_want_copyprop
because it also can do copy propagation.

And that's why we haven't pushed hard on this issue -- it doesn't scale
and to make it scale requires rethinking the basics of the pass manager.
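
To make the scaling concern concrete, here is a minimal sketch in Python (a toy model, not GCC code). The `Want` flags mirror the PROP_want_* names floated in the quoted text; the `PROVIDES` table and `run_pipeline` are hypothetical names invented for this illustration. It shows a cleanup pass being skipped unless an earlier pass asked for it, and how a pass like FRE that satisfies more than one request makes a later copyprop redundant.

```python
from enum import IntFlag


class Want(IntFlag):
    # Hypothetical "please clean up after me" requests, one bit each.
    NONE = 0
    CCP = 1
    COPYPROP = 2
    CSE = 4


# Assumption for illustration: which requests each cleanup pass satisfies.
# FRE is listed as satisfying both CSE and copy propagation.
PROVIDES = {
    "ccp": Want.CCP,
    "copyprop": Want.COPYPROP,
    "fre": Want.CSE | Want.COPYPROP,
}


def run_pipeline(passes, pending=Want.NONE):
    """passes is a list of (name, requests_raised_by_this_pass).

    Returns the names of the passes that actually ran."""
    ran = []
    for name, requests in passes:
        provides = PROVIDES.get(name)
        if provides is not None:
            if not (pending & provides):
                continue  # nobody asked for this cleanup, skip it
            pending &= ~provides  # the requests it covers are now satisfied
        ran.append(name)
        pending |= requests
    return ran
```

With a vectorizer that raises `Want.CSE | Want.COPYPROP`, followed by "fre" and "copyprop", only the vectorizer and "fre" run. Every new kind of cleanup needs another flag and another row in the table, which is the part that does not scale.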

Going a bit further here, esp. in the loop context, would be to
have the basic cleanups be region-based.  Because given a big
function with many loops and just one vectorized it would be
enough to cleanup the vectorized loop (yes, and in theory
all downstream effects, but that's probably secondary and not
so important).  It's not too difficult to make FRE run on
a MEME region, the interesting part, engineering-wise, is to
really make it O(size of MEME region) - that is, eliminate
things like O(num_ssa_names) or O(n_basic_blocks) setup cost.

I had a long talk with some of the SGI compiler guys many years ago
about region-based optimizations.  It was something they had been trying
to bring into their compiler for years, but never got it working to a
point where they were happy with it.  While they didn't show me the
code, they indicated the changes were highly invasive -- and all the
code had been #ifdef'd out because it just didn't work.  Naturally it
was all bitrotting.
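
The "O(size of region)" requirement in the quoted text can be illustrated with a toy value numbering in Python. This is not GCC's FRE: the statement format `(lhs, op, operands)` and the function name are assumptions made for the example, and it assumes the region's blocks are passed in an order where each block dominates the ones after it. The point is only that all state lives in dictionaries keyed by names seen in the region, so there is no setup proportional to the number of SSA names or basic blocks in the whole function.

```python
def value_number_region(region_blocks):
    """Toy value numbering restricted to one region.

    region_blocks: list of blocks, each a list of (lhs, op, operands)
    tuples, given in an order where earlier blocks dominate later ones.
    Returns a dict mapping each redundant name to the name it duplicates."""
    leader = {}   # (op, operands) -> first name that computed it
    replace = {}  # redundant name -> its leader
    for block in region_blocks:
        for lhs, op, operands in block:
            # Canonicalize operands through replacements found so far.
            operands = tuple(replace.get(o, o) for o in operands)
            key = (op, operands)
            if key in leader:
                replace[lhs] = leader[key]
            else:
                leader[key] = lhs
    return replace
```

Work and memory grow with the number of statements in `region_blocks` only; an implementation that first allocates an array indexed by every SSA name in the function would lose that property.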


And then there is the possibility of making passes generate less
need for cleanups after them - like in the present case
with the redundant IVs, make them more apparently redundant by
CSEing the initial value and step during vectorizer code generation.
I'm playing with the idea of adding a simple CSE machinery to
the gimple_build () interface (aka match-and-simplify).  It
eventually invokes (well, not currently, but that can be fixed)
maybe_push_res_to_seq which is a good place to maintain a
table of already generated expressions.  That of course only
works if you either always append to the same sequence or at least
insert at the same place.

As you know we've gone back and forth on this in the past.  It's always
a trade-off.  I still ponder from time to time putting the simple CSE
and cprop bits back into the SSA rewriting phase to avoid generating all
kinds of garbage that just needs to be cleaned up later -- particularly
for incremental SSA updates.
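
The idea of keeping a table of already generated expressions while building code can be sketched as follows. This is a Python toy, not the gimple_build () interface: `SeqBuilder`, `build` and `build_iv_init` are hypothetical names, and the expression shapes are made up. It shows two IVs with the same initial value ending up sharing one computation at generation time, so no later cleanup is needed for them.

```python
class SeqBuilder:
    """Appends statements to one sequence and remembers what it built."""

    def __init__(self):
        self.stmts = []    # emitted (name, op, operands), in order
        self._table = {}   # (op, operands) -> name of the existing result

    def build(self, op, *operands):
        key = (op, operands)
        hit = self._table.get(key)
        if hit is not None:
            return hit     # reuse instead of emitting a duplicate
        name = f"_{len(self.stmts) + 1}"
        self.stmts.append((name, op, operands))
        self._table[key] = name
        return name


def build_iv_init(builder, base, offset, scale):
    # Hypothetical initial-value expression: base + offset * scale.
    scaled = builder.build("mult", offset, scale)
    return builder.build("plus", base, scaled)
```

Calling `build_iv_init` twice on the same builder with the same arguments returns the same name and emits two statements rather than four. The table is per builder, which matches the caveat in the quoted text: reuse is only valid when everything is appended to the same sequence or inserted at the same place.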




Jeff
