Re: [patch] Use a smaller, dynamic worklist in compute_global_livein

Richard Guenther Tue, 07 Aug 2012 02:22:48 -0700

On Tue, Aug 7, 2012 at 11:03 AM, Steven Bosscher <stevenb....@gmail.com> wrote:
> On Tue, Aug 7, 2012 at 10:31 AM, Richard Guenther
> <richard.guent...@gmail.com> wrote:
>> On Tue, Aug 7, 2012 at 8:24 AM, Steven Bosscher <stevenb....@gmail.com> 
>> wrote:
>>> Hello,
>>>
>>> In the test case for PR54146, compute_global_livein allocates/frees a
>>> worklist for >400,000 basic blocks on each invocation. And it's called
>>> a lot, for rewrite_into_loop_closed_ssa. But the maximum number of
>>> basic blocks ever on the work list was only ~6500. So the work list
>>> can be much smaller most of the time.
>>>
>>> Bootstrapped&tested on x86_64-unknown-linux-gnu. OK for trunk?
>>
>> Index: tree-ssa-loop-manip.c
>> ===================================================================
>> --- tree-ssa-loop-manip.c       (revision 190176)
>> +++ tree-ssa-loop-manip.c       (working copy)
>> @@ -162,10 +162,8 @@ add_exit_phis_var (tree var, bitmap live
>>    basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (var));
>>    bitmap_iterator bi;
>>
>> -  if (is_gimple_reg (var))
>> -    bitmap_clear_bit (livein, def_bb->index);
>> -  else
>> -    bitmap_set_bit (livein, def_bb->index);
>> +  gcc_checking_assert (is_gimple_reg (var));
>> +  bitmap_clear_bit (livein, def_bb->index);
>>
>> !is_gimple_reg is true for virtual operand PHIs ... and at least
>> find_uses_to_rename_stmt loops over all uses (so, I suppose given the
>> comments elsewhere about tracking vops can you fix that to
>> s/SSA_OP_ALL_USES/SSA_OP_USE/?
>
> I'll give that a try.
>
>
>>  void
>> -compute_global_livein (bitmap livein ATTRIBUTE_UNUSED, bitmap
>> def_blocks ATTRIBUTE_UNUSED)
>> +compute_global_livein (bitmap livein, bitmap def_blocks)
>>  {
>> -  basic_block bb, *worklist, *tos;
>>    unsigned i;
>>    bitmap_iterator bi;
>> +  VEC (basic_block, heap) *worklist;
>>
>> -  tos = worklist
>> -    = (basic_block *) xmalloc (sizeof (basic_block) * (last_basic_block + 
>> 1));
>> +  /* Normally the work list size is bounded by the number of basic
>> +     blocks in the largest loop.  We don't know this number, but we
>> +     can be fairly sure that it will be relatively small.  */
>> +  worklist = VEC_alloc (basic_block, heap, MAX (8, n_basic_blocks / 100));
>>
>> I suppose if we really want to optimize this we can make worklist static
>> so we at most grow once.  (in case you want to optimize allocation overhead,
>> not memory used).  Other structures in the ssa rewriter have this kind of
>> lifetime, too (the per-SSA name info for example).
>
> I don't want this to be static. This shouldn't be a persistent piece of data.
> But I actually did try a static work list (the full list, not the
> VEC), and it made no difference at all on the timings.


Ah, so the patch only reduces allocation but not compile-time itself.

> All time is spent in the loop in compute_global_livein itself. There
> are >400,000 basic blocks but the maximum loop depth is only 3. I've
> tried out livein as a sparseset last night (allocated and filled in
> add_exit_phis_var and re-used for every call to add_exit_phis_edge),
> that reduces the time spent in compute_global_livein by a factor 2.5
> for this test case (i.e. not the order-of-magnitude change I'm looking
> for).

Another optimization would be to do

@@ -440,13 +442,13 @@ compute_global_livein (bitmap livein ATT
              && ! bitmap_bit_p (def_blocks, pred_index)
              && bitmap_set_bit (livein, pred_index))
            {
-             *tos++ = pred;
+             VEC_safe_push (basic_block, heap, worklist, pred);

thus combine the bitmap_set_bit and bitmap_bit_p tests on livein.

> Why can't the normal SSA updater be used to rewrite into loop-closed SSA form?

It does not have a mode to trigger PHI inserts for loop-closed SSA
form.  I suppose
most of the time we should try harder and not call rewrite_into_loop_closed_ssa
so many times (eventually for the easy cases where we _do_ have an old->new
name mapping or symbols to rename the SSA updater should deal with this).

Richard.

> Ciao!
> Steven

Re: [patch] Use a smaller, dynamic worklist in compute_global_livein

Reply via email to