On Tue, Aug 7, 2012 at 10:31 AM, Richard Guenther <richard.guent...@gmail.com> wrote: > On Tue, Aug 7, 2012 at 8:24 AM, Steven Bosscher <stevenb....@gmail.com> wrote: >> Hello, >> >> In the test case for PR54146, compute_global_livein allocates/frees a >> worklist for >400,000 basic blocks on each invocation. And it's called >> a lot, for rewrite_into_loop_closed_ssa. But the maximum number of >> basic blocks ever on the work list was only ~6500. So the work list >> can be much smaller most of the time. >> >> Bootstrapped&tested on x86_64-unknown-linux-gnu. OK for trunk? > > Index: tree-ssa-loop-manip.c > =================================================================== > --- tree-ssa-loop-manip.c (revision 190176) > +++ tree-ssa-loop-manip.c (working copy) > @@ -162,10 +162,8 @@ add_exit_phis_var (tree var, bitmap live > basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (var)); > bitmap_iterator bi; > > - if (is_gimple_reg (var)) > - bitmap_clear_bit (livein, def_bb->index); > - else > - bitmap_set_bit (livein, def_bb->index); > + gcc_checking_assert (is_gimple_reg (var)); > + bitmap_clear_bit (livein, def_bb->index); > > !is_gimple_reg is true for virtual operand PHIs ... and at least > find_uses_to_rename_stmt loops over all uses (so, I suppose given the > comments elsewhere about tracking vops can you fix that to > s/SSA_OP_ALL_USES/SSA_OP_USE/?
I'll give that a try. > void > -compute_global_livein (bitmap livein ATTRIBUTE_UNUSED, bitmap > def_blocks ATTRIBUTE_UNUSED) > +compute_global_livein (bitmap livein, bitmap def_blocks) > { > - basic_block bb, *worklist, *tos; > unsigned i; > bitmap_iterator bi; > + VEC (basic_block, heap) *worklist; > > - tos = worklist > - = (basic_block *) xmalloc (sizeof (basic_block) * (last_basic_block + > 1)); > + /* Normally the work list size is bounded by the number of basic > + blocks in the largest loop. We don't know this number, but we > + can be fairly sure that it will be relatively small. */ > + worklist = VEC_alloc (basic_block, heap, MAX (8, n_basic_blocks / 100)); > > I suppose if we really want to optimize this we can make worklist static > so we at most grow once. (in case you want to optimize allocation overhead, > not memory used). Other structures in the ssa rewriter have this kind of > lifetime, too (the per-SSA name info for example). I don't want this to be static. This shouldn't be a persistent piece of data. But I actually did try a static work list (the full list, not the VEC), and it made no difference at all on the timings. All time is spent in the loop in compute_global_livein itself. There are >400,000 basic blocks but the maximum loop depth is only 3. I've tried out livein as a sparseset last night (allocated and filled in add_exit_phis_var and re-used for every call to add_exit_phis_edge), that reduces the time spent in compute_global_livein by a factor 2.5 for this test case (i.e. not the order-of-magnitude change I'm looking for). Why can't the normal SSA updater be used to rewrite into loop-closed SSA form? Ciao! Steven