On Fri, Nov 3, 2023 at 11:20 AM Ajit Agarwal <aagar...@linux.ibm.com> wrote: > > Hello Richard: > > On 03/11/23 12:51 pm, Richard Biener wrote: > > On Thu, Nov 2, 2023 at 9:50 PM Ajit Agarwal <aagar...@linux.ibm.com> wrote: > >> > >> Hello All: > >> [...] > >> > >> High register pressure region is the region where there are live-in of > >> early blocks that has been modified by the early block. If there are > >> modification of the variables in best block that are live-in in early > >> block that are live-out of best block. > > > > ?! Parse error. > > > > I didnt understand what you meant here. Please suggest.
I can't even guess what that paragraph means. It fails at a parsing level already, I can't even start to reason about what the sentences mean. > >> Bootstrapped and regtested on powerpc64-linux-gnu. > > > > What's the effect on code generation? > > > > Note that live is a quadratic problem while sinking was not. You > > are effectively making the pass unfit for -O1. > > > > You are computing "liveness" on GIMPLE where within EBBs there > > isn't really any particular order of stmts, so it's kind of a garbage > > heuristic. Likewise you are not computing the effect that moving > > a stmt has on liveness as far as I can see but you are just identifying > > some odd metrics (I don't really understand them) to rank blocks, > > not even taking the register file size into account. > > > if the live out of best_bb <= live out of early_bb, that shows > that there are modification in best_bb. Hm? Do you maybe want to say that if live_out (bb) < live_in (bb) then some variables die during the execution of bb? Otherwise, if live_out (early) > live_out (best) then somewhere on the path from early to best some variables die. > Then it's > safer to move statements in best_bb as there are lesser interfering > live variables in best_bb. consider a stmt a = b + c; where b and c die at the definition of a. Then moving the stmt down from early_bb means you increase live_out (early_bb) by one. So why's that "safer" then? Of course live_out (best_bb) also increases by two then. > if there are lesser live out in best_bb, there is lesser chance > of interfering live ranges and hence moving statements in best_bb > will not increase register pressure. > > If the liveout of best_bb is greater than live-out of early_bb, > moving statements in best_bb will increase chances of more interfering > live ranges and hence increase in register pressure. > > This is how the heuristics is defined. I don't think they will work. Do you have any numbers? > > > > > You are replacing the hot/cold heuristic. > > > > > IMHO the sinking pass is the totally wrong place to do anything > > about register pressure. You are trying to solve a scheduling > > problem by just looking at a single stmt. > > > > bb->count from profile.cc are prone to errors as you have > mentioned in previous mails. Main bottlenecks with code > motion is increase in register pressure as that counts to > spills in later phases of the compiler backend. > > Calculation of best_bb based of immediate dominator should > consider register pressure instead of hold cold regions as that > would effect code generation. > > If there is increase in register pressure with code motion and if > we are moving into colder regions, that wont improve code generations. > > Hold/cold should be criteria but not the improved criteria with > code motion. > > We should consider register pressure with code motion than hot/cold > regions. > > Thanks & Regards > Ajit > > > Richard. > > > >> Thanks & Regards > > > >> Ajit > >> > >> tree-optimization: Add register pressure heuristics > >> > >> Currently code sinking heuristics are based on profile data like > >> basic block count and sink frequency threshold. We have removed > >> such heuristics to add register pressure heuristics based on > >> live-in and live-out of early blocks and immediate dominator of > >> use blocks. > >> > >> High register pressure region is the region where there are live-in of > >> early blocks that has been modified by the early block. If there are > >> modification of the variables in best block that are live-in in early > >> block that are live-out of best block. > >> > >> 2023-11-03 Ajit Kumar Agarwal <aagar...@linux.ibm.com> > >> > >> gcc/ChangeLog: > >> > >> * tree-ssa-sink.cc (statement_sink_location): Add tree_live_info_p > >> as paramters. > >> (sink_code_in_bb): Ditto. > >> (select_best_block): Add register pressure heuristics to select > >> the best blocks in the immediate dominator for same loop nest > >> depth. > >> (execute): Add live range analysis. > >> (additional_var_map): New function. > >> * tree-ssa-live.cc (set_var_live_on_entry): Add virtual operand > >> tests on ssa_names. > >> (verify_live_on_entry): Ditto. > >> > >> gcc/testsuite/ChangeLog: > >> > >> * gcc.dg/tree-ssa/ssa-sink-21.c: New test. > >> * gcc.dg/tree-ssa/ssa-sink-22.c: New test. > >> --- > >> gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 ++++ > >> gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 +++++ > >> gcc/tree-ssa-live.cc | 11 ++- > >> gcc/tree-ssa-sink.cc | 93 ++++++++++++++------- > >> 4 files changed, 104 insertions(+), 34 deletions(-) > >> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c > >> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c > >> > >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c > >> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c > >> new file mode 100644 > >> index 00000000000..d3b79ca5803 > >> --- /dev/null > >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c > >> @@ -0,0 +1,15 @@ > >> +/* { dg-do compile } */ > >> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */ > >> +void bar(); > >> +int j; > >> +void foo(int a, int b, int c, int d, int e, int f) > >> +{ > >> + int l; > >> + l = a + b + c + d +e + f; > >> + if (a != 5) > >> + { > >> + bar(); > >> + j = l; > >> + } > >> +} > >> +/* { dg-final { scan-tree-dump > >> {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */ > >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c > >> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c > >> new file mode 100644 > >> index 00000000000..84e7938c54f > >> --- /dev/null > >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c > >> @@ -0,0 +1,19 @@ > >> +/* { dg-do compile } */ > >> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */ > >> +void bar(); > >> +int j, x; > >> +void foo(int a, int b, int c, int d, int e, int f) > >> +{ > >> + int l; > >> + l = a + b + c + d +e + f; > >> + if (a != 5) > >> + { > >> + bar(); > >> + if (b != 3) > >> + x = 3; > >> + else > >> + x = 5; > >> + j = l; > >> + } > >> +} > >> +/* { dg-final { scan-tree-dump > >> {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */ > >> diff --git a/gcc/tree-ssa-live.cc b/gcc/tree-ssa-live.cc > >> index f06daf23035..998fe588278 100644 > >> --- a/gcc/tree-ssa-live.cc > >> +++ b/gcc/tree-ssa-live.cc > >> @@ -1141,7 +1141,8 @@ set_var_live_on_entry (tree ssa_name, > >> tree_live_info_p live) > >> def_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun); > >> > >> /* An undefined local variable does not need to be very alive. */ > >> - if (ssa_undefined_value_p (ssa_name, false)) > >> + if (virtual_operand_p (ssa_name) > >> + || ssa_undefined_value_p (ssa_name, false)) > >> return; > >> > >> /* Visit each use of SSA_NAME and if it isn't in the same block as the > >> def, > >> @@ -1540,7 +1541,6 @@ debug (tree_live_info_d *ptr) > >> > >> > >> /* Verify that the info in LIVE matches the current cfg. */ > >> - > >> static void > >> verify_live_on_entry (tree_live_info_p live) > >> { > >> @@ -1569,11 +1569,13 @@ verify_live_on_entry (tree_live_info_p live) > >> tree d = NULL_TREE; > >> bitmap loe; > >> var = partition_to_var (map, i); > >> + if (var == NULL_TREE) > >> + continue; > >> stmt = SSA_NAME_DEF_STMT (var); > >> tmp = gimple_bb (stmt); > >> + > >> if (SSA_NAME_VAR (var)) > >> d = ssa_default_def (cfun, SSA_NAME_VAR (var)); > >> - > >> loe = live_on_entry (live, e->dest); > >> if (loe && bitmap_bit_p (loe, i)) > >> { > >> @@ -1614,7 +1616,8 @@ verify_live_on_entry (tree_live_info_p live) > >> { > >> /* An undefined local variable does not need to be very > >> alive. */ > >> - if (ssa_undefined_value_p (var, false)) > >> + if (virtual_operand_p (var) > >> + || ssa_undefined_value_p (var, false)) > >> continue; > >> > >> /* The only way this var shouldn't be marked live on entry > >> is > >> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc > >> index a360c5cdd6e..d0c9ef1ab86 100644 > >> --- a/gcc/tree-ssa-sink.cc > >> +++ b/gcc/tree-ssa-sink.cc > >> @@ -176,6 +176,9 @@ nearest_common_dominator_of_uses (def_operand_p def_p, > >> bool *debug_stmts) > >> tree, return the best basic block between them (inclusive) to place > >> statements. > >> > >> + The best basic block should be an immediate dominator of > >> + best basic block if we've moved to same loop nest. > >> + > >> We want the most control dependent block in the shallowest loop nest. > >> > >> If the resulting block is in a shallower loop nest, then use it. Else > >> @@ -191,24 +194,23 @@ nearest_common_dominator_of_uses (def_operand_p > >> def_p, bool *debug_stmts) > >> static basic_block > >> select_best_block (basic_block early_bb, > >> basic_block late_bb, > >> - gimple *stmt) > >> + gimple *stmt, > >> + tree_live_info_p &live) > >> { > >> basic_block best_bb = late_bb; > >> basic_block temp_bb = late_bb; > >> - int threshold; > >> > >> while (temp_bb != early_bb) > >> { > >> /* If we've moved into a lower loop nest, then that becomes > >> our best block. */ > >> - if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb)) > >> + if (bb_loop_depth (temp_bb) <= bb_loop_depth (best_bb)) > >> best_bb = temp_bb; > >> > >> /* Walk up the dominator tree, hopefully we'll find a shallower > >> loop nest. */ > >> temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb); > >> } > >> - > >> /* Placing a statement before a setjmp-like function would be invalid > >> (it cannot be reevaluated when execution follows an abnormal edge). > >> If we selected a block with abnormal predecessors, just punt. */ > >> @@ -233,24 +235,36 @@ select_best_block (basic_block early_bb, > >> && !dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, > >> best_bb)) > >> return early_bb; > >> > >> - /* Get the sinking threshold. If the statement to be moved has memory > >> - operands, then increase the threshold by 7% as those are even more > >> - profitable to avoid, clamping at 100%. */ > >> - threshold = param_sink_frequency_threshold; > >> - if (gimple_vuse (stmt) || gimple_vdef (stmt)) > >> - { > >> - threshold += 7; > >> - if (threshold > 100) > >> - threshold = 100; > >> - } > >> + int best_bb_liveout_cnt > >> + = bitmap_count_bits (&live->liveout[best_bb->index]); > >> + int early_bb_liveout_cnt > >> + = bitmap_count_bits (&live->liveout[early_bb->index]); > >> + int early_livein_cnt > >> + = bitmap_count_bits (&live->livein[early_bb->index]); > >> + > >> + /* High register pressure region is the region where there are live-in > >> of > >> + early blocks that has been modified by the early block. If there are > >> + modification of the variables in best block that are live-in in early > >> + block that are live-out of best block. */ > >> + bool live_in_rgn = (early_livein_cnt != 0 > >> + && early_bb_liveout_cnt <= early_livein_cnt); > >> + > >> + bool high_reg_pressure_rgn > >> + = ((best_bb_liveout_cnt != 0 || live_in_rgn) > >> + && best_bb_liveout_cnt <= early_bb_liveout_cnt); > >> > >> /* If BEST_BB is at the same nesting level, then require it to have > >> - significantly lower execution frequency to avoid gratuitous > >> movement. */ > >> + significantly high register pressure region to avoid gratuitous > >> + movement. */ > >> if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb) > >> - /* If result of comparsion is unknown, prefer EARLY_BB. > >> - Thus use !(...>=..) rather than (...<...) */ > >> - && !(best_bb->count * 100 >= early_bb->count * threshold)) > >> - return best_bb; > >> + && high_reg_pressure_rgn) > >> + { > >> + /* Avoid sinking to immediate dominator if the statement to be moved > >> + has memory operand and same loop nest. */ > >> + if (best_bb != late_bb && gimple_vuse (stmt)) > >> + return late_bb; > >> + return best_bb; > >> + } > >> > >> /* No better block found, so return EARLY_BB, which happens to be the > >> statement's original block. */ > >> @@ -265,7 +279,8 @@ select_best_block (basic_block early_bb, > >> static bool > >> statement_sink_location (gimple *stmt, basic_block frombb, > >> gimple_stmt_iterator *togsi, bool *zero_uses_p, > >> - virtual_operand_live &vop_live) > >> + virtual_operand_live &vop_live, > >> + tree_live_info_p &live) > >> { > >> gimple *use; > >> use_operand_p one_use = NULL_USE_OPERAND_P; > >> @@ -413,7 +428,7 @@ statement_sink_location (gimple *stmt, basic_block > >> frombb, > >> if (!dominated_by_p (CDI_DOMINATORS, commondom, frombb)) > >> return false; > >> > >> - commondom = select_best_block (frombb, commondom, stmt); > >> + commondom = select_best_block (frombb, commondom, stmt, live); > >> > >> if (commondom == frombb) > >> return false; > >> @@ -430,19 +445,17 @@ statement_sink_location (gimple *stmt, basic_block > >> frombb, > >> continue; > >> break; > >> } > >> + > >> use = USE_STMT (one_use); > >> > >> if (gimple_code (use) != GIMPLE_PHI) > >> { > >> - sinkbb = select_best_block (frombb, gimple_bb (use), stmt); > >> + sinkbb = select_best_block (frombb, gimple_bb (use), stmt, live); > >> > >> if (sinkbb == frombb) > >> return false; > >> > >> - if (sinkbb == gimple_bb (use)) > >> - *togsi = gsi_for_stmt (use); > >> - else > >> - *togsi = gsi_after_labels (sinkbb); > >> + *togsi = gsi_after_labels (sinkbb); > >> > >> return true; > >> } > >> @@ -454,7 +467,7 @@ statement_sink_location (gimple *stmt, basic_block > >> frombb, > >> if (!sinkbb) > >> return false; > >> > >> - sinkbb = select_best_block (frombb, sinkbb, stmt); > >> + sinkbb = select_best_block (frombb, sinkbb, stmt, live); > >> if (!sinkbb || sinkbb == frombb) > >> return false; > >> > >> @@ -643,7 +656,8 @@ sink_common_stores_to_bb (basic_block bb) > >> /* Perform code sinking on BB */ > >> > >> static unsigned > >> -sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live) > >> +sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live, > >> + tree_live_info_p &live) > >> { > >> gimple_stmt_iterator gsi; > >> edge_iterator ei; > >> @@ -670,7 +684,8 @@ sink_code_in_bb (basic_block bb, virtual_operand_live > >> &vop_live) > >> gimple_stmt_iterator togsi; > >> bool zero_uses_p; > >> > >> - if (!statement_sink_location (stmt, bb, &togsi, &zero_uses_p, > >> vop_live)) > >> + if (!statement_sink_location (stmt, bb, &togsi, &zero_uses_p, > >> + vop_live, live)) > >> { > >> gimple_stmt_iterator saved = gsi; > >> if (!gsi_end_p (gsi)) > >> @@ -814,6 +829,15 @@ private: > >> bool unsplit_edges; > >> }; // class pass_sink_code > >> > >> +void additional_var_map (var_map &map) > >> +{ > >> + map->bmp_bbs = BITMAP_ALLOC (NULL); > >> + map->outofssa_p = false; > >> + basic_block bb; > >> + FOR_EACH_BB_FN (bb, cfun) > >> + bitmap_set_bit (map->bmp_bbs, bb->index); > >> +} > >> + > >> unsigned int > >> pass_sink_code::execute (function *fun) > >> { > >> @@ -827,12 +851,21 @@ pass_sink_code::execute (function *fun) > >> calculate_dominance_info (CDI_DOMINATORS); > >> > >> virtual_operand_live vop_live; > >> + var_map map = init_var_map (num_ssa_names); > >> + additional_var_map (map); > >> + tree_live_info_p live = calculate_live_ranges (map, true); > >> > >> int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun)); > >> int n = inverted_rev_post_order_compute (fun, rpo); > >> + > >> for (int i = 0; i < n; ++i) > >> - todo |= sink_code_in_bb (BASIC_BLOCK_FOR_FN (fun, rpo[i]), vop_live); > >> + todo |= sink_code_in_bb (BASIC_BLOCK_FOR_FN (fun, rpo[i]), vop_live, > >> live); > >> free (rpo); > >> + if (live) > >> + delete_tree_live_info (live); > >> + > >> + if (map) > >> + delete_var_map (map); > >> > >> statistics_counter_event (fun, "Sunk statements", sink_stats.sunk); > >> statistics_counter_event (fun, "Commoned stores", sink_stats.commoned); > >> -- > >> 2.39.3 > >> > >>