Re: [RFA][PATCH][PR tree-optimization/64058] Improve and stabilize sorting of coalesce pairs

Richard Biener Tue, 15 Mar 2016 07:24:04 -0700

On Mon, Mar 14, 2016 at 11:32 PM, Jeff Law <l...@redhat.com> wrote:
> On 03/11/2016 03:02 AM, Richard Biener wrote:
>>
>>
>>
>> For the other part I noticed a few things
>>   1) having a bitmap_count_ior_bits () would be an improvement
>
> Yea, I almost built one.  That's easy enough to add.
>
>>   2) you might end up with redundant work(?) as you are iterating
>>   over SSA name coalesce candidates but look at partition conflicts
>
> We'd have redundant work if the elements mapped back to SSA_NAMEs which in
> turn mapped to partitions which appeared as a coalescing pair already.  But
> there's no way to know that in advance.
>
> This is mitigated somewhat in the next version which computes the conflict
> sizes lazily when the qsort comparison function is given two conflict pairs
> with an equal cost.


That sounds good.

>>   3) having this extra heuristic might be best guarded by
>> flag_expensive_optimizations
>
> Perhaps.  I don't see this tie breaker as being all that expensive.  But I
> won't object to guarding with flag_expensive_optimizations.

Yeah, we should first address the quadraticness of the live compute which is
usually what hits us first when hitting a bottleneck in coalescing / out-of-SSA.

>>   as it is a quite expensive "tie breaker" - maybe even improve things
>> by first sorting
>>   after cost and then only doing the tie breaking when necessary,
>> re-sorting the
>>   sub-sequence with same original cost.  It may also be enough to only
>> perform
>>   this for "important" candidates, say within the first 100 of the
>> function or so
>>   or with cost > X.
>
> The problem with this is qsort's interface into the comparison function has
> a terribly narrow API and I don't think we want to rely on qsort_r.  In fact
> that's the whole reason why I didn't do lazy evaluation on the conflict
> sizes initially.
>
> To work around the narrow API in the comparison function we have to either
> store additional data in each node or have them available in globals.  The
> former would be horribly wasteful, the latter is just ugly.  I choose the
> latter in the lazy evaluation of the conflicts version.

Works for me.

>>
>> And finally - if we really think that looking at the conflict size
>> increase is the way to go
>> it would maybe be better to use a fibheap updating keys in
>> attempt_coalesce
>> when we merge the conflicts.  That would also mean to work on a list
>> (fibheap)
>> of coalesces of partitions rather than SSA names.
>
> I really doubt it's worth this effort.  The literature I've been looking at
> in this space essentially says that given a reasonable coalescer,
> improvements, while measurable, are very very small in terms of the
> efficiency of the final code.
>
> Thus I rejected conservative coalescing + iteration, biased coalescing, &
> trial coalescing as too expensive given the trivial benefit. Similarly I
> rejected trying to update the costs as we coalesce partitions.  A single
> successful coalesce could have a significant ripple effect.  Perhaps that
> could be mitigated by realizing that many updates wouldn't be needed, but
> it's just a level of complexity that's not needed here.

Ok.

> And note we work on partitions, not SSA_NAMEs.  It just happens that we
> start with each SSA_NAME in its own partition.  Most SSA_NAMEs actually
> don't participate in coalescing as they're not used in a copy instruction or
> as a phi arg/result.   That's why we compact the partitions after we've
> scanned the IL for names that are going to participate in coalescing.
>
>
>
>
>
>>
>> I think the patch is reasonable enough for GCC 6 if we can bring
>> compile-time
>> cost down a bit (it can be quadratic in the number of SSA names if we have
>> a lot of coalesce candidates and nearly full conflict bitmaps - of course
>> that's
>> not a case we handle very well right now but still).  I would have hoped
>> the
>> index part of the patch fixed the regression (by luck)...
>
> I'd hoped it'd fix the regression by luck as well, but that was not the case
> :(
>
>
>>
>> As far as a testcase goes we want to scan the dumps for the actual
>> coalesces
>> being done.  Might be a bit fragile though...
>
> I suspect that's going to be quite fragile and may have more target
> dependencies than we'd like (due to branch costing and such).

Yes.

Otherwise -ENOPATCH.

Thanks,
Richard.

>
>
> Jeff

Re: [RFA][PATCH][PR tree-optimization/64058] Improve and stabilize sorting of coalesce pairs

Reply via email to