PR63677 shows that late complete unrolling can expose quite some memory CSE opportunities (and followup simplifications) which we fail to exploit because currently DOMs ability to do memory CSE is quite limited (it requires the redundant load to optimize to directly follow the load or store). This is actually a regression from before alias-improvements, thus to the time where we had multiple virtual operand symbols which made us handle some cases better than now with a single memory operand symbol.
The following patch makes DOM use the alias walker to fix that. The easiest way to do that is to no longer hash virtual operands and use the expression hash lookup result as "candidate" if the expression involves memory. We can then verify if the candidate can be used by using the alias walker to walk to the candidate. If there is an intermediate possibly aliasing stmt we simply replace the hashtable entry and set up enough info to be able to rewind to the previous state during unwinding. This means the DOM implementation of memory CSE is a tad bit more efficient than that of SCCVN. But it is also very simplistic as it requires 1:1 requivalence for the memory access structure (as compared to SCCVN which can look through various kinds of aggregate copies or initializations as well as handling accessing the same memory location with different memory access structures). This is also the reason that the patch doesn't fix PR63864 which requires us to look through aggregate copies. To fix PR63677 I also had to teach DOM the trick of propagating &a + CST as &MEM[&a, CST] which is used by all other propagators. Given how easy this was to restore pre-alias-improvements state I wonder why I didn't think of this before... Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Ok for trunk? Thanks, Richard. 2014-11-19 Richard Biener <rguent...@suse.de> PR tree-optimization/63677 * tree-ssa-dom.c: Include gimplify.h for unshare_expr. (avail_exprs_stack): Make a vector of pairs. (struct hash_expr_elt): Replace stmt member with vop member. (expr_elt_hasher::equal): Simplify. (initialize_hash_element): Adjust. (initialize_hash_element_from_expr): Likewise. (dom_opt_dom_walker::thread_across_edge): Likewise. (record_cond): Likewise. (dom_opt_dom_walker::before_dom_children): Likewise. (print_expr_hash_elt): Likewise. (remove_local_expressions_from_table): Restore previous state if requested. (record_equivalences_from_stmt): Record &x + CST as constant &MEM[&x, CST] for further propagation. (vuse_eq): New function. (lookup_avail_expr): For loads use the alias oracle to see whether a candidate from the expr hash is usable. (avail_expr_hash): Do not hash VUSEs. * gcc.dg/tree-ssa/ssa-dom-cse-2.c: New testcase. * gcc.dg/tree-ssa/ssa-dom-cse-3.c: Likewise. Index: gcc/tree-ssa-dom.c =================================================================== *** gcc/tree-ssa-dom.c.orig 2014-11-19 13:28:46.123015743 +0100 --- gcc/tree-ssa-dom.c 2014-11-19 13:58:33.122937543 +0100 *************** along with GCC; see the file COPYING3. *** 66,71 **** --- 66,72 ---- #include "tree-ssa-threadedge.h" #include "tree-ssa-dom.h" #include "inchash.h" + #include "gimplify.h" /* This file implements optimizations on the dominator tree. */ *************** struct edge_info *** 139,145 **** marker. */ typedef struct expr_hash_elt * expr_hash_elt_t; ! static vec<expr_hash_elt_t> avail_exprs_stack; /* Structure for entries in the expression hash table. */ --- 140,146 ---- marker. */ typedef struct expr_hash_elt * expr_hash_elt_t; ! static vec<std::pair<expr_hash_elt_t, expr_hash_elt_t> > avail_exprs_stack; /* Structure for entries in the expression hash table. */ *************** struct expr_hash_elt *** 151,158 **** /* The expression (rhs) we want to record. */ struct hashable_expr expr; ! /* The stmt pointer if this element corresponds to a statement. */ ! gimple stmt; /* The hash value for RHS. */ hashval_t hash; --- 152,160 ---- /* The expression (rhs) we want to record. */ struct hashable_expr expr; ! /* The virtual operand associated with the nearest dominating stmt ! loading from or storing to expr. */ ! tree vop; /* The hash value for RHS. */ hashval_t hash; *************** expr_elt_hasher::hash (const value_type *** 187,196 **** inline bool expr_elt_hasher::equal (const value_type &p1, const compare_type &p2) { - gimple stmt1 = p1->stmt; const struct hashable_expr *expr1 = &p1->expr; const struct expr_hash_elt *stamp1 = p1->stamp; - gimple stmt2 = p2->stmt; const struct hashable_expr *expr2 = &p2->expr; const struct expr_hash_elt *stamp2 = p2->stamp; --- 189,196 ---- *************** expr_elt_hasher::equal (const value_type *** 198,222 **** if (stamp1 == stamp2) return true; ! /* FIXME tuples: ! We add stmts to a hash table and them modify them. To detect the case ! that we modify a stmt and then search for it, we assume that the hash ! is always modified by that change. ! We have to fully check why this doesn't happen on trunk or rewrite ! this in a more reliable (and easier to understand) way. */ ! if (((const struct expr_hash_elt *)p1)->hash ! != ((const struct expr_hash_elt *)p2)->hash) return false; /* In case of a collision, both RHS have to be identical and have the same VUSE operands. */ if (hashable_expr_equal_p (expr1, expr2) && types_compatible_p (expr1->type, expr2->type)) ! { ! /* Note that STMT1 and/or STMT2 may be NULL. */ ! return ((stmt1 ? gimple_vuse (stmt1) : NULL_TREE) ! == (stmt2 ? gimple_vuse (stmt2) : NULL_TREE)); ! } return false; } --- 198,211 ---- if (stamp1 == stamp2) return true; ! if (p1->hash != p2->hash) return false; /* In case of a collision, both RHS have to be identical and have the same VUSE operands. */ if (hashable_expr_equal_p (expr1, expr2) && types_compatible_p (expr1->type, expr2->type)) ! return true; return false; } *************** initialize_hash_element (gimple stmt, tr *** 387,393 **** gcc_unreachable (); element->lhs = lhs; ! element->stmt = stmt; element->hash = avail_expr_hash (element); element->stamp = element; } --- 376,382 ---- gcc_unreachable (); element->lhs = lhs; ! element->vop = gimple_vuse (stmt); element->hash = avail_expr_hash (element); element->stamp = element; } *************** initialize_hash_element_from_expr (struc *** 429,435 **** { element->expr = *expr; element->lhs = lhs; ! element->stmt = NULL; element->hash = avail_expr_hash (element); element->stamp = element; } --- 418,424 ---- { element->expr = *expr; element->lhs = lhs; ! element->vop = NULL_TREE; element->hash = avail_expr_hash (element); element->stamp = element; } *************** add_hashable_expr (const struct hashable *** 681,690 **** static void print_expr_hash_elt (FILE * stream, const struct expr_hash_elt *element) { ! if (element->stmt) ! fprintf (stream, "STMT "); ! else ! fprintf (stream, "COND "); if (element->lhs) { --- 670,676 ---- static void print_expr_hash_elt (FILE * stream, const struct expr_hash_elt *element) { ! fprintf (stream, "STMT "); if (element->lhs) { *************** print_expr_hash_elt (FILE * stream, cons *** 758,770 **** } break; } - fprintf (stream, "\n"); ! if (element->stmt) { ! fprintf (stream, " "); ! print_gimple_stmt (stream, element->stmt, 0, 0); } } /* Delete variable sized pieces of the expr_hash_elt ELEMENT. */ --- 744,757 ---- } break; } ! if (element->vop) { ! fprintf (stream, " with "); ! print_generic_expr (stream, element->vop, 0); } + + fprintf (stream, "\n"); } /* Delete variable sized pieces of the expr_hash_elt ELEMENT. */ *************** remove_local_expressions_from_table (voi *** 1067,1076 **** /* Remove all the expressions made available in this block. */ while (avail_exprs_stack.length () > 0) { ! expr_hash_elt_t victim = avail_exprs_stack.pop (); expr_hash_elt **slot; ! if (victim == NULL) break; /* This must precede the actual removal from the hash table, --- 1054,1064 ---- /* Remove all the expressions made available in this block. */ while (avail_exprs_stack.length () > 0) { ! std::pair<expr_hash_elt_t, expr_hash_elt_t> victim ! = avail_exprs_stack.pop (); expr_hash_elt **slot; ! if (victim.first == NULL) break; /* This must precede the actual removal from the hash table, *************** remove_local_expressions_from_table (voi *** 1079,1090 **** if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, "<<<< "); ! print_expr_hash_elt (dump_file, victim); } ! slot = avail_exprs->find_slot (victim, NO_INSERT); ! gcc_assert (slot && *slot == victim); ! avail_exprs->clear_slot (slot); } } --- 1067,1084 ---- if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, "<<<< "); ! print_expr_hash_elt (dump_file, victim.first); } ! slot = avail_exprs->find_slot (victim.first, NO_INSERT); ! gcc_assert (slot && *slot == victim.first); ! if (victim.second != NULL) ! { ! free_expr_hash_elt (*slot); ! *slot = victim.second; ! } ! else ! avail_exprs->clear_slot (slot); } } *************** dom_opt_dom_walker::thread_across_edge ( *** 1171,1177 **** /* Push a marker on both stacks so we can unwind the tables back to their current state. */ ! avail_exprs_stack.safe_push (NULL); const_and_copies_stack.safe_push (NULL_TREE); /* Traversing E may result in equivalences we can utilize. */ --- 1165,1172 ---- /* Push a marker on both stacks so we can unwind the tables back to their current state. */ ! avail_exprs_stack.safe_push ! (std::pair<expr_hash_elt_t, expr_hash_elt_t> (NULL, NULL)); const_and_copies_stack.safe_push (NULL_TREE); /* Traversing E may result in equivalences we can utilize. */ *************** record_cond (cond_equivalence *p) *** 1412,1418 **** print_expr_hash_elt (dump_file, element); } ! avail_exprs_stack.safe_push (element); } else free_expr_hash_elt (element); --- 1407,1414 ---- print_expr_hash_elt (dump_file, element); } ! avail_exprs_stack.safe_push ! (std::pair<expr_hash_elt_t, expr_hash_elt_t> (element, NULL)); } else free_expr_hash_elt (element); *************** dom_opt_dom_walker::before_dom_children *** 1971,1977 **** /* Push a marker on the stacks of local information so that we know how far to unwind when we finalize this block. */ ! avail_exprs_stack.safe_push (NULL); const_and_copies_stack.safe_push (NULL_TREE); record_equivalences_from_incoming_edge (bb); --- 1967,1974 ---- /* Push a marker on the stacks of local information so that we know how far to unwind when we finalize this block. */ ! avail_exprs_stack.safe_push ! (std::pair<expr_hash_elt_t, expr_hash_elt_t> (NULL, NULL)); const_and_copies_stack.safe_push (NULL_TREE); record_equivalences_from_incoming_edge (bb); *************** dom_opt_dom_walker::before_dom_children *** 1982,1988 **** /* Create equivalences from redundant PHIs. PHIs are only truly redundant when they exist in the same block, so push another marker and unwind right afterwards. */ ! avail_exprs_stack.safe_push (NULL); for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi)) eliminate_redundant_computations (&gsi); remove_local_expressions_from_table (); --- 1979,1986 ---- /* Create equivalences from redundant PHIs. PHIs are only truly redundant when they exist in the same block, so push another marker and unwind right afterwards. */ ! avail_exprs_stack.safe_push ! (std::pair<expr_hash_elt_t, expr_hash_elt_t> (NULL, NULL)); for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi)) eliminate_redundant_computations (&gsi); remove_local_expressions_from_table (); *************** record_equivalences_from_stmt (gimple st *** 2192,2197 **** --- 2190,2221 ---- } } + /* Make sure we can propagate &x + CST. */ + if (lhs_code == SSA_NAME + && gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR + && TREE_CODE (gimple_assign_rhs1 (stmt)) == ADDR_EXPR + && TREE_CODE (gimple_assign_rhs2 (stmt)) == INTEGER_CST) + { + tree op0 = gimple_assign_rhs1 (stmt); + tree op1 = gimple_assign_rhs2 (stmt); + tree new_rhs + = build_fold_addr_expr (fold_build2 (MEM_REF, + TREE_TYPE (TREE_TYPE (op0)), + unshare_expr (op0), + fold_convert (ptr_type_node, + op1))); + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "==== ASGN "); + print_generic_expr (dump_file, lhs, 0); + fprintf (dump_file, " = "); + print_generic_expr (dump_file, new_rhs, 0); + fprintf (dump_file, "\n"); + } + + set_ssa_name_value (lhs, new_rhs); + } + /* A memory store, even an aliased store, creates a useful equivalence. By exchanging the LHS and RHS, creating suitable vops and recording the result in the available expression table, *************** optimize_stmt (basic_block bb, gimple_st *** 2511,2516 **** --- 2535,2560 ---- } } + /* Helper for walk_non_aliased_vuses. Determine if we arrived at + the desired memory state. */ + + static void * + vuse_eq (ao_ref *, tree vuse1, unsigned int cnt, void *data) + { + tree vuse2 = (tree) data; + if (vuse1 == vuse2) + return data; + + /* This bounds the stmt walks we perform on reference lookups + to O(1) instead of O(N) where N is the number of dominating + stores leading to a candidate. We re-use the SCCVN param + for this as it is basically the same complexity. */ + if (cnt > (unsigned) PARAM_VALUE (PARAM_SCCVN_MAX_ALIAS_QUERIES_PER_ACCESS)) + return (void *)-1; + + return NULL; + } + /* Search for an existing instance of STMT in the AVAIL_EXPRS table. If found, return its LHS. Otherwise insert STMT in the table and return NULL_TREE. *************** lookup_avail_expr (gimple stmt, bool ins *** 2569,2583 **** print_expr_hash_elt (dump_file, element2); } ! avail_exprs_stack.safe_push (element2); return NULL_TREE; } ! else ! free_expr_hash_elt_contents (&element); /* Extract the LHS of the assignment so that it can be used as the current definition of another variable. */ ! lhs = ((struct expr_hash_elt *)*slot)->lhs; /* See if the LHS appears in the CONST_AND_COPIES table. If it does, then use the value from the const_and_copies table. */ --- 2613,2664 ---- print_expr_hash_elt (dump_file, element2); } ! avail_exprs_stack.safe_push ! (std::pair<expr_hash_elt_t, expr_hash_elt_t> (element2, NULL)); return NULL_TREE; } ! ! /* If we found a redundant memory operation do an alias walk to ! check if we can re-use it. */ ! if (gimple_vuse (stmt) != (*slot)->vop) ! { ! tree vuse1 = (*slot)->vop; ! tree vuse2 = gimple_vuse (stmt); ! /* If we have a load of a register and a candidate in the ! hash with vuse1 then try to reach its stmt by walking ! up the virtual use-def chain using walk_non_aliased_vuses. ! But don't do this when removing expressions from the hash. */ ! ao_ref ref; ! if (!(vuse1 && vuse2 ! && gimple_assign_single_p (stmt) ! && TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME ! && (ao_ref_init (&ref, gimple_assign_rhs1 (stmt)), true) ! && walk_non_aliased_vuses (&ref, vuse2, ! vuse_eq, NULL, vuse1) != NULL)) ! { ! struct expr_hash_elt *element2 = XNEW (struct expr_hash_elt); ! *element2 = element; ! element2->stamp = element2; ! ! /* Insert the expr into the hash by replacing the current ! entry and recording the value to restore in the ! aval_exprs_stack. */ ! avail_exprs_stack.safe_push (std::make_pair (element2, *slot)); ! *slot = element2; ! if (dump_file && (dump_flags & TDF_DETAILS)) ! { ! fprintf (dump_file, "2>>> "); ! print_expr_hash_elt (dump_file, *slot); ! } ! return NULL_TREE; ! } ! } ! ! free_expr_hash_elt_contents (&element); /* Extract the LHS of the assignment so that it can be used as the current definition of another variable. */ ! lhs = (*slot)->lhs; /* See if the LHS appears in the CONST_AND_COPIES table. If it does, then use the value from the const_and_copies table. */ *************** lookup_avail_expr (gimple stmt, bool ins *** 2605,2630 **** static hashval_t avail_expr_hash (const void *p) { - gimple stmt = ((const struct expr_hash_elt *)p)->stmt; const struct hashable_expr *expr = &((const struct expr_hash_elt *)p)->expr; - tree vuse; inchash::hash hstate; inchash::add_hashable_expr (expr, hstate); - /* If the hash table entry is not associated with a statement, then we - can just hash the expression and not worry about virtual operands - and such. */ - if (!stmt) - return hstate.end (); - - /* Add the SSA version numbers of the vuse operand. This is important - because compound variables like arrays are not renamed in the - operands. Rather, the rename is done on the virtual variable - representing all the elements of the array. */ - if ((vuse = gimple_vuse (stmt))) - inchash::add_expr (vuse, hstate); - return hstate.end (); } --- 2686,2696 ---- Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c =================================================================== *** /dev/null 1970-01-01 00:00:00.000000000 +0000 --- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c 2014-11-19 13:54:38.073947829 +0100 *************** *** 0 **** --- 1,21 ---- + /* { dg-do compile } */ + /* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized" } */ + + int + foo () + { + const int a[8] = { 0, 1, 2, 3, 4, 5, 6, 7 }; + int i, sum; + + sum = 0; + for (i = 0; i < sizeof (a) / sizeof (*a); i++) + sum += a[i]; + + return sum; + } + + /* After late unrolling the above loop completely DOM should be + able to optimize this to return 28. */ + + /* { dg-final { scan-tree-dump "return 28;" "optimized" } } */ + /* { dg-final { cleanup-tree-dump "optimized" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-3.c =================================================================== *** /dev/null 1970-01-01 00:00:00.000000000 +0000 --- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-3.c 2014-11-19 13:54:38.096947828 +0100 *************** *** 0 **** --- 1,31 ---- + /* { dg-do run } */ + /* { dg-options "-O -fno-tree-fre -fdump-tree-dom1" } */ + + extern void abort (void); + + int a; + int __attribute__((noinline)) + foo (int b) + { + a = 0; + if (b) + { + a = 1; + return a; + } + /* DOM should be able to CSE both loads here, forwarding 0 and 1 + to the PHI feeding the return. */ + return a; + } + + int + main() + { + if (foo (0) != 0 + || foo (1) != 1) + abort (); + return 0; + } + + /* { dg-final { scan-tree-dump "= PHI <\[01\]\\\(.\\\), \[01\]\\\(.\\\)>" "dom1" } } */ + /* { dg-final { cleanup-tree-dump "dom1" } } */