Re: [patch] Use a smaller, dynamic worklist in compute_global_livein

Steven Bosscher Tue, 07 Aug 2012 02:04:06 -0700

On Tue, Aug 7, 2012 at 10:31 AM, Richard Guenther
<richard.guent...@gmail.com> wrote:
> On Tue, Aug 7, 2012 at 8:24 AM, Steven Bosscher <stevenb....@gmail.com> wrote:
>> Hello,
>>
>> In the test case for PR54146, compute_global_livein allocates/frees a
>> worklist for >400,000 basic blocks on each invocation. And it's called
>> a lot, for rewrite_into_loop_closed_ssa. But the maximum number of
>> basic blocks ever on the work list was only ~6500. So the work list
>> can be much smaller most of the time.
>>
>> Bootstrapped&tested on x86_64-unknown-linux-gnu. OK for trunk?
>
> Index: tree-ssa-loop-manip.c
> ===================================================================
> --- tree-ssa-loop-manip.c       (revision 190176)
> +++ tree-ssa-loop-manip.c       (working copy)
> @@ -162,10 +162,8 @@ add_exit_phis_var (tree var, bitmap live
>    basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (var));
>    bitmap_iterator bi;
>
> -  if (is_gimple_reg (var))
> -    bitmap_clear_bit (livein, def_bb->index);
> -  else
> -    bitmap_set_bit (livein, def_bb->index);
> +  gcc_checking_assert (is_gimple_reg (var));
> +  bitmap_clear_bit (livein, def_bb->index);
>
> !is_gimple_reg is true for virtual operand PHIs ... and at least
> find_uses_to_rename_stmt loops over all uses (so, I suppose given the
> comments elsewhere about tracking vops can you fix that to
> s/SSA_OP_ALL_USES/SSA_OP_USE/?


I'll give that a try.


>  void
> -compute_global_livein (bitmap livein ATTRIBUTE_UNUSED, bitmap
> def_blocks ATTRIBUTE_UNUSED)
> +compute_global_livein (bitmap livein, bitmap def_blocks)
>  {
> -  basic_block bb, *worklist, *tos;
>    unsigned i;
>    bitmap_iterator bi;
> +  VEC (basic_block, heap) *worklist;
>
> -  tos = worklist
> -    = (basic_block *) xmalloc (sizeof (basic_block) * (last_basic_block + 
> 1));
> +  /* Normally the work list size is bounded by the number of basic
> +     blocks in the largest loop.  We don't know this number, but we
> +     can be fairly sure that it will be relatively small.  */
> +  worklist = VEC_alloc (basic_block, heap, MAX (8, n_basic_blocks / 100));
>
> I suppose if we really want to optimize this we can make worklist static
> so we at most grow once.  (in case you want to optimize allocation overhead,
> not memory used).  Other structures in the ssa rewriter have this kind of
> lifetime, too (the per-SSA name info for example).

I don't want this to be static. This shouldn't be a persistent piece of data.
But I actually did try a static work list (the full list, not the
VEC), and it made no difference at all on the timings.

All time is spent in the loop in compute_global_livein itself. There
are >400,000 basic blocks but the maximum loop depth is only 3. I've
tried out livein as a sparseset last night (allocated and filled in
add_exit_phis_var and re-used for every call to add_exit_phis_edge),
that reduces the time spent in compute_global_livein by a factor 2.5
for this test case (i.e. not the order-of-magnitude change I'm looking
for).

Why can't the normal SSA updater be used to rewrite into loop-closed SSA form?

Ciao!
Steven

Re: [patch] Use a smaller, dynamic worklist in compute_global_livein

Reply via email to