http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38518
Richard Biener <rguenth at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |stevenb.gcc at gmail dot com --- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> --- The issue here is the DF machinery, or rather the fact that RTL invariant motion calls df_analyze () for each loop, thus void move_loop_invariants (void) { struct loop *loop; if (flag_ira_loop_pressure) { df_analyze (); regstat_init_n_sets_and_refs (); ira_set_pseudo_classes (true, dump_file); calculate_loop_reg_pressure (); regstat_free_n_sets_and_refs (); } df_set_flags (DF_EQ_NOTES + DF_DEFER_INSN_RESCAN); /* Process the loops, innermost first. */ FOR_EACH_LOOP (loop, LI_FROM_INNERMOST) { curr_loop = loop; /* move_single_loop_invariants for very large loops is time consuming and might need a lot of memory. */ if (loop->num_nodes <= (unsigned) LOOP_INVARIANT_MAX_BBS_IN_LOOP) move_single_loop_invariants (loop); } here move_single_loop_invariants -> find_invariants -> find_defs which does static void find_defs (struct loop *loop, basic_block *body) { unsigned i; bitmap blocks = BITMAP_ALLOC (NULL); for (i = 0; i < loop->num_nodes; i++) bitmap_set_bit (blocks, body[i]->index); if (dump_file) { fprintf (dump_file, "*****starting processing of loop %d ******\n", loop->num); } df_remove_problem (df_chain); df_process_deferred_rescans (); df_chain_add_problem (DF_UD_CHAIN); df_set_blocks (blocks); df_set_flags (DF_RD_PRUNE_DEAD_DEFS); df_analyze (); ... which is excessive. It looks like the above DF stuff does not affects the whole function but only the BBs in the loop. Still the fact that we iterate from inner to outer loops still makes the DF analysis time quadratic in the loop depth. The testcase has only loops of depth 1 (but 2164 ones), so it seems that the DF restriction to a set of BBs does not work as expected? >From the profile I see that the most time spent in df_analyze is inverted_post_order_compute and post_order_compute (obviosly not region aware!) and then bitmap_set_bit and df_prune_to_subcfg. I wonder if it isn't possible to either update the DF during the transform, deal with partly out-of-date DF info (like process loop siblings without re-calculating DF, and re-compute whole FN DF after such iteration).