On Tue, Aug 7, 2012 at 11:03 AM, Steven Bosscher <stevenb....@gmail.com> wrote: > On Tue, Aug 7, 2012 at 10:31 AM, Richard Guenther > <richard.guent...@gmail.com> wrote: >> On Tue, Aug 7, 2012 at 8:24 AM, Steven Bosscher <stevenb....@gmail.com> >> wrote: >>> Hello, >>> >>> In the test case for PR54146, compute_global_livein allocates/frees a >>> worklist for >400,000 basic blocks on each invocation. And it's called >>> a lot, for rewrite_into_loop_closed_ssa. But the maximum number of >>> basic blocks ever on the work list was only ~6500. So the work list >>> can be much smaller most of the time. >>> >>> Bootstrapped&tested on x86_64-unknown-linux-gnu. OK for trunk? >> >> Index: tree-ssa-loop-manip.c >> =================================================================== >> --- tree-ssa-loop-manip.c (revision 190176) >> +++ tree-ssa-loop-manip.c (working copy) >> @@ -162,10 +162,8 @@ add_exit_phis_var (tree var, bitmap live >> basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (var)); >> bitmap_iterator bi; >> >> - if (is_gimple_reg (var)) >> - bitmap_clear_bit (livein, def_bb->index); >> - else >> - bitmap_set_bit (livein, def_bb->index); >> + gcc_checking_assert (is_gimple_reg (var)); >> + bitmap_clear_bit (livein, def_bb->index); >> >> !is_gimple_reg is true for virtual operand PHIs ... and at least >> find_uses_to_rename_stmt loops over all uses (so, I suppose given the >> comments elsewhere about tracking vops can you fix that to >> s/SSA_OP_ALL_USES/SSA_OP_USE/? > > I'll give that a try. > > >> void >> -compute_global_livein (bitmap livein ATTRIBUTE_UNUSED, bitmap >> def_blocks ATTRIBUTE_UNUSED) >> +compute_global_livein (bitmap livein, bitmap def_blocks) >> { >> - basic_block bb, *worklist, *tos; >> unsigned i; >> bitmap_iterator bi; >> + VEC (basic_block, heap) *worklist; >> >> - tos = worklist >> - = (basic_block *) xmalloc (sizeof (basic_block) * (last_basic_block + >> 1)); >> + /* Normally the work list size is bounded by the number of basic >> + blocks in the largest loop. We don't know this number, but we >> + can be fairly sure that it will be relatively small. */ >> + worklist = VEC_alloc (basic_block, heap, MAX (8, n_basic_blocks / 100)); >> >> I suppose if we really want to optimize this we can make worklist static >> so we at most grow once. (in case you want to optimize allocation overhead, >> not memory used). Other structures in the ssa rewriter have this kind of >> lifetime, too (the per-SSA name info for example). > > I don't want this to be static. This shouldn't be a persistent piece of data. > But I actually did try a static work list (the full list, not the > VEC), and it made no difference at all on the timings.
Ah, so the patch only reduces allocation but not compile-time itself. > All time is spent in the loop in compute_global_livein itself. There > are >400,000 basic blocks but the maximum loop depth is only 3. I've > tried out livein as a sparseset last night (allocated and filled in > add_exit_phis_var and re-used for every call to add_exit_phis_edge), > that reduces the time spent in compute_global_livein by a factor 2.5 > for this test case (i.e. not the order-of-magnitude change I'm looking > for). Another optimization would be to do @@ -440,13 +442,13 @@ compute_global_livein (bitmap livein ATT && ! bitmap_bit_p (def_blocks, pred_index) && bitmap_set_bit (livein, pred_index)) { - *tos++ = pred; + VEC_safe_push (basic_block, heap, worklist, pred); thus combine the bitmap_set_bit and bitmap_bit_p tests on livein. > Why can't the normal SSA updater be used to rewrite into loop-closed SSA form? It does not have a mode to trigger PHI inserts for loop-closed SSA form. I suppose most of the time we should try harder and not call rewrite_into_loop_closed_ssa so many times (eventually for the easy cases where we _do_ have an old->new name mapping or symbols to rename the SSA updater should deal with this). Richard. > Ciao! > Steven