The following fixes PR77283, path splitting being overly aggressive and causing loop unrolling not to happen (because how it distorts the CFG).
It is a aim at creating a cost model (there's none apart from not duplicating too much stmts) by means of the observation that we'd have to have PHI nodes in the joiner to have any possibility of CSE opportunities being exposed by duplicating it or threading opportunities being exposed across the new latch. That includes virtual PHIs for CSE (so any load/store) but not for the threading (but IMHO threading should figure all this out on its own without the requirement for somebody else duplicating the joiner). Bootstrapped and tested on x86_64-unknown-linux-gnu, the s390x libquantum regression is reportedly fixed by this. I had to adjust gcc.dg/tree-ssa/split-path-7.c to not expect any path splitting because I (and of course the cost model) can't see any reason to do it. Ok for trunk? Thanks, Richard. 2017-01-12 Richard Biener <rguent...@suse.de> PR tree-optimization/77283 * gimple-ssa-split-paths.c: Include gimple-ssa.h, tree-phinodes.h and ssa-iterators.h. (is_feasible_trace): Implement a cost model based on joiner PHI node uses. * gcc.dg/tree-ssa/split-path-7.c: Adjust. * gcc.dg/tree-ssa/split-path-8.c: New testcase. * gcc.dg/tree-ssa/split-path-9.c: Likewise. Index: gcc/gimple-ssa-split-paths.c =================================================================== --- gcc/gimple-ssa-split-paths.c (revision 244350) +++ gcc/gimple-ssa-split-paths.c (working copy) @@ -32,6 +32,9 @@ along with GCC; see the file COPYING3. #include "tracer.h" #include "predict.h" #include "params.h" +#include "gimple-ssa.h" +#include "tree-phinodes.h" +#include "ssa-iterators.h" /* Given LATCH, the latch block in a loop, see if the shape of the path reaching LATCH is suitable for being split by duplication. @@ -200,6 +203,58 @@ is_feasible_trace (basic_block bb) } } + /* If the joiner has no PHIs with useful uses there is zero chance + of CSE/DCE/jump-threading possibilities exposed by duplicating it. */ + bool found_useful_phi = false; + for (gphi_iterator si = gsi_start_phis (bb); ! gsi_end_p (si); + gsi_next (&si)) + { + gphi *phi = si.phi (); + use_operand_p use_p; + imm_use_iterator iter; + FOR_EACH_IMM_USE_FAST (use_p, iter, gimple_phi_result (phi)) + { + gimple *stmt = USE_STMT (use_p); + if (is_gimple_debug (stmt)) + continue; + /* If there's a use in the joiner this might be a CSE/DCE + opportunity. */ + if (gimple_bb (stmt) == bb) + { + found_useful_phi = true; + break; + } + /* If the use is on a loop header PHI and on one path the + value is unchanged this might expose a jump threading + opportunity. */ + if (gimple_code (stmt) == GIMPLE_PHI + && gimple_bb (stmt) == bb->loop_father->header + /* But for memory the PHI alone isn't good enough. */ + && ! virtual_operand_p (gimple_phi_result (stmt))) + { + for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i) + if (gimple_phi_arg_def (phi, i) == gimple_phi_result (stmt)) + { + found_useful_phi = true; + break; + } + if (found_useful_phi) + break; + } + } + if (found_useful_phi) + break; + } + if (! found_useful_phi) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, + "Block %d is a join that does not expose CSE/DCE/jump-thread " + "opportunities when duplicated.\n", + bb->index); + return false; + } + /* We may want something here which looks at dataflow and tries to guess if duplication of BB is likely to result in simplification of instructions in BB in either the original or the duplicate. */ Index: gcc/testsuite/gcc.dg/tree-ssa/split-path-7.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/split-path-7.c (revision 244350) +++ gcc/testsuite/gcc.dg/tree-ssa/split-path-7.c (working copy) @@ -91,4 +91,4 @@ linit () } } } -/* { dg-final { scan-tree-dump-times "Duplicating join block" 2 "split-paths" } } */ +/* { dg-final { scan-tree-dump-times "Duplicating join block" 0 "split-paths" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/split-path-8.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/split-path-8.c (nonexistent) +++ gcc/testsuite/gcc.dg/tree-ssa/split-path-8.c (working copy) @@ -0,0 +1,14 @@ +/* PR77283 */ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-split-paths-details" } */ + +void +foo (double *x, double *a, double *b, long n, double limit) +{ + long i; + for (i=0; i < n; i++) + if (a[i] < limit) + x[i] = b[i]; +} + +/* { dg-final { scan-tree-dump-times "Duplicating join block" 0 "split-paths" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/split-path-9.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/split-path-9.c (nonexistent) +++ gcc/testsuite/gcc.dg/tree-ssa/split-path-9.c (working copy) @@ -0,0 +1,17 @@ +/* PR77366 */ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-split-paths-details" } */ + +void +foo(unsigned int size, unsigned int *state) +{ + unsigned int i; + + for(i = 0; i < size; i++) + { + if(state[i] & 1) + state[i] ^= 1; + } +} + +/* { dg-final { scan-tree-dump-times "Duplicating join block" 0 "split-paths" } } */