On 03/09/15 03:42, Richard Biener wrote:
On Fri, 6 Mar 2015, Jeff Law wrote:

On 03/06/15 06:16, Richard Biener wrote:

This fixes PR63155 and reduces the memory usage at -O0 from reported
10GB (couldn't verify/update on my small box) to 350MB (still worse
compared to 4.8 which needs only 50MB).

It fixes this by no longer computing live info or building a conflict
graph for coalescing of SSA names flowing over abnormal edges
(which needs to succeed).

Of course this also removes the verification that this coalescing is valid
(computing that has quadratic cost).  With this change, breakage that
previously showed up as ICEs now shows up as miscompiles.

We could restrict verifying that we can perform abnormal coalescing
to ENABLE_CHECKING (and I've long wanted a verifier pass that can check
this at multiple points, to catch whatever breaks it rather than having
to work back from out-of-SSA ICEing...).

So any opinion on this patch welcome.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Ok for trunk? ;)

Thanks,
Richard.

2015-03-06  Richard Biener  <rguent...@suse.de>

        PR middle-end/63155
        * tree-ssa-coalesce.c (attempt_coalesce): Handle graph being NULL.
        (coalesce_partitions): Split out abnormal coalescing to ...
        (perform_abnormal_coalescing): ... this function.
        (coalesce_ssa_name): Perform abnormal coalescing without computing
        live/conflict.
I'd personally like to keep the checking when ENABLE_CHECKING.

I haven't followed this discussion really closely, but I wonder if some kind of
blocking approach would work without blowing up the memory consumption.
There's no inherent reason why we have to coalesce everything at the same
time.  We can use a blocking factor and do coalescing on some number N of
SSA_NAMEs at a time.

Yes, that's possible, at (quite?) some compile-time cost.  Note that we
can't really guarantee that the resulting live/conflict problems shrink
significantly unless we sort the coalesces differently (not by the
importance of the coalesces but by their basevars).
Yea, it's a classic time/space tradeoff.  I guess it comes down to how much compile-time pain we'll accept in exchange for reducing memory usage.

It may also be the case that some blocking factors are actually faster than doing everything at once, even for more common input graph sizes.

I actually ran into this when looking at the liveness computations for into-ssa eons ago.  We were computing liveness in parallel, but a blocking factor of 1 object actually turned out best:

https://gcc.gnu.org/ml/gcc-patches/2003-10/msg01301.html



Jeff
