This patch converts the fold-mem-offsets pass from DF to RTL-SSA.
Along with this conversion, the way the pass collects information was
completely reworked: instead of visiting each instruction multiple times,
this is now done only once.
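For illustration, this is the kind of transformation the pass performs
(a hand-written RTL sketch, not taken from an actual dump; register names
are chosen arbitrarily).  Before folding:

  (set (reg:DI s0) (plus:DI (reg:DI sp) (const_int 8)))            ;; fold insn
  (set (reg:DI a0) (mem:DI (plus:DI (reg:DI s0) (const_int 32))))  ;; root

After folding, the constant of the fold insn is set to zero and the memory
offset of the root is updated accordingly:

  (set (reg:DI s0) (plus:DI (reg:DI sp) (const_int 0)))
  (set (reg:DI a0) (mem:DI (plus:DI (reg:DI s0) (const_int 40))))

The remaining no-op addition is expected to be eliminated by later passes.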
Most significant changes are:

* The pass operates mainly on insn_info objects from RTL-SSA.
* Single iteration over all nondebug INSNs for identification of
  fold-mem-roots.  Then a walk of each fold-mem-root's DEF-chain collects
  the foldable constants.
* The class fold_mem_info holds vectors for the DEF-chain of the
  to-be-folded INSNs (fold_agnostic_insns, which don't need to be adjusted,
  and fold_insns, which need their constant to be set to zero).
* Introduction of a single-USE mode, which only collects DEFs that have a
  single USE and are therefore safe to transform (the fold-mem-root will be
  the final USE).  This mode is fast and always runs (unless disabled via
  -fno-fold-mem-offsets).
* Introduction of a multi-USE mode, which allows DEFs to have multiple USEs,
  but all USEs must be part of some fold-mem-root's DEF-chain.  The analysis
  of all USEs is expensive, so this mode is disabled for highly connected
  CFGs.  Note that multi-USE mode misses some opportunities that single-USE
  mode finds (e.g. multi-USE mode fails for fold-mem-offsets-3.c).

The following testing was done:
* Bootstrapped and regtested on aarch64-linux and x86-64-linux.
* Regtested on riscv64-linux.
* SPEC CPU 2017 tested on aarch64-linux and riscv64-linux.

The number of applied optimizations of different versions/modes of
fold-mem-offsets in SPEC CPU 2017 on RISC-V rv64gc_zba_zbb_zbs is as
follows:

                   Upstream  single-USE  multi-USE  single- then multi-USE
  500.perlbench_r       169          70        186                     192
  502.gcc_r             624         263        744                     761
  520.omnetpp_r        1301        1100       1187                    1673
  523.xalancbmk_r        23          10         22                      22
  525.x264_r            705          95        985                     995
  531.deepsjeng_r        36          19         21                      21
  541.leela_r            19         252         87                     252
  548.exchange2          90          13         63                      63
  557.xz_r               11          11         23                      23
  SUM                  2151        1833       3318                    4002

A compile time analysis with `/bin/time -v ./install/usr/local/bin/gcc -O2 all.i`
(all.i from PR117922) shows:
* -fno-fold-mem-offsets: 289 s (user time) / 15572436 kBytes (max resident set size)
* -ffold-mem-offsets:    339 s (user time) / 23606516 kBytes (max resident set size)
Adding -fexpensive-optimizations to enable multi-USE mode has no measurable
impact on duration or memory footprint.

SPEC CPU 2017 showed no significant performance impact on aarch64-linux.

gcc/ChangeLog:

	PR rtl-optimization/117922
	* fold-mem-offsets.cc (INCLUDE_ALGORITHM): Added definition.
	(INCLUDE_FUNCTIONAL): Likewise.
	(INCLUDE_ARRAY): Likewise.
	(class pass_fold_mem_offsets): Moved to bottom of file.
	(get_fold_mem_offset_root): Converted to RTL-SSA.
	(get_single_def_in_bb): Converted to RTL-SSA.
	(get_uses): Removed.
	(has_foldable_uses_p): Converted to RTL-SSA.
	(fold_offsets): Converted to RTL-SSA.
	(fold_offsets_1): Converted to RTL-SSA.
	(get_fold_mem_root): Removed.
	(do_check_validity): New.
	(do_analysis): Removed.
	(insn_uses_not_in_bitmap): New.
	(do_fold_info_calculation): Removed.
	(drop_unsafe_candidates): New.
	(do_commit_offset): Converted to RTL-SSA.
	(compute_validity_closure): Removed.
	(do_commit_insn): Changed to change INSN in place.
	(fold_mem_offsets_single_use): New.
	(fold_mem_offsets_multi_use): New.
	(pass_fold_mem_offsets::execute): Moved to bottom of file.
	(fold_mem_offsets): New.
Signed-off-by: Christoph Müllner <christoph.muell...@vrull.eu>
---
 gcc/fold-mem-offsets.cc | 1138 ++++++++++++++++++++-------------------
 1 file changed, 596 insertions(+), 542 deletions(-)

diff --git a/gcc/fold-mem-offsets.cc b/gcc/fold-mem-offsets.cc
index c1c94472a071..0e777c32ee31 100644
--- a/gcc/fold-mem-offsets.cc
+++ b/gcc/fold-mem-offsets.cc
@@ -17,24 +17,34 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */

+#define INCLUDE_ALGORITHM
+#define INCLUDE_FUNCTIONAL
+#define INCLUDE_ARRAY
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
-#include "tm.h"
+#include "backend.h"
 #include "rtl.h"
+#include "rtlanal.h"
+#include "df.h"
+#include "rtl-ssa.h"
+
+#include "predict.h"
+#include "cfgrtl.h"
+#include "cfgcleanup.h"
+#include "cgraph.h"
+#include "tree-pass.h"
+#include "target.h"
+
+#include "tm.h"
 #include "tree.h"
 #include "expr.h"
 #include "backend.h"
 #include "regs.h"
-#include "target.h"
 #include "memmodel.h"
 #include "emit-rtl.h"
 #include "insn-config.h"
 #include "recog.h"
-#include "predict.h"
-#include "df.h"
-#include "tree-pass.h"
-#include "cfgrtl.h"
 #include "diagnostic-core.h"

 /* This pass tries to optimize memory offset calculations by moving constants
@@ -69,214 +79,248 @@ along with GCC; see the file COPYING3.  If not see
    allocated on the stack can result in unwanted add instructions that
    cannot be eliminated easily.

-   This pass works on a basic block level and consists of 4 phases:
-
-   - Phase 1 (Analysis): Find "foldable" instructions.
-     Foldable instructions are those that we know how to propagate
-     a constant addition through (add, shift, move, ...) and only have other
-     foldable instructions for uses.  In that phase a DFS traversal on the
-     definition tree is performed and foldable instructions are marked on
-     a bitmap.  The add immediate instructions that are reachable in this
-     DFS are candidates for folding since all the intermediate calculations
-     affected by them are also foldable.
-
-   - Phase 2 (Validity): Traverse and calculate the offsets that would result
-     from folding the add immediate instructions.  Check whether the
-     calculated offsets result in a valid instruction for the target.
-
-   - Phase 3 (Commit offsets): Traverse again.  It is now known which folds
-     are valid so at this point change the offsets in the memory instructions.
-
-   - Phase 4 (Commit instruction deletions): Scan all instructions and delete
-     or simplify (reduce to move) all add immediate instructions that were
-     folded.
+   The pass differentiates between the following instructions:
+
+   - fold-mem-offset root insn: loads/stores where constants will be folded into
+     the address offset.  E.g.:
+     (set (mem:DI (plus:DI (reg:DI sp) (const_int 40))) (reg:DI ra))
+   - fold-agnostic insns: instructions that may have an impact on the offset
+     calculation, but that don't require any fixup when folding.  E.g.:
+     (set (reg:DI a0) (ashift:DI (reg:DI s1) (const_int 1)))
+   - fold insns: instructions that provide constants, which will be forwarded
+     into the loads/stores as offset.  When folding, the constants will be
+     set to zero.  E.g.:
+     (set (reg:DI s0) (plus:DI (reg:DI sp) (const_int 8)))
+
+   The pass utilizes the RTL SSA framework to get the data dependencies
+   and operates in the following phases:
+
+   - Phase 1: Iterate over all instructions to identify fold-mem-offset roots.
+   - Phase 2: Walk back along the def-chain of fold-agnostic or fold insns.
+     When successful, a new offset of the fold-mem-offset is calculated
+     and a vec of fold insns that need adjustments is created.
+   - Phase 3: Drop all fold-mem-offset roots that won't accept the updated
+     offset.
+   - Phase 4: Ensure that the defs of all fold insns are used only by
+     fold-mem-offsets insns (only needed if DEFs with multiple USEs
+     are enabled via -fexpensive-optimizations).
+   - Phase 5: Update all fold-mem-offset roots and adjust the fold insns.
+
+   When we walk the DEF-chain we have two choices of operations:
+
+   - single-USE mode: Only DEFs that have exactly one USE (in the instruction
+     that we come from) are allowed.  This greatly simplifies the problem, but
+     also misses some cases.
+   - multi-USE mode: DEFs are allowed to have multiple USEs.  E.g. a single
+     ADDI may define a value that is used by two LOADs.  In this case, we need
+     to ensure that all USE-chains remain correct after we apply the folding.
+     This is done by allowing only USEs that are part of any other
+     fold-mem-offset chain in phase 4 above.

    This pass should run before hard register propagation because it creates
    register moves that we expect to be eliminated.  */

-namespace {
-
-const pass_data pass_data_fold_mem =
-{
-  RTL_PASS, /* type */
-  "fold_mem_offsets", /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
-  TV_FOLD_MEM_OFFSETS, /* tv_id */
-  0, /* properties_required */
-  0, /* properties_provided */
-  0, /* properties_destroyed */
-  0, /* todo_flags_start */
-  TODO_df_finish, /* todo_flags_finish */
-};
-
-class pass_fold_mem_offsets : public rtl_opt_pass
-{
-public:
-  pass_fold_mem_offsets (gcc::context *ctxt)
-    : rtl_opt_pass (pass_data_fold_mem, ctxt)
-  {}
-
-  /* opt_pass methods: */
-  virtual bool gate (function *)
-  {
-    return flag_fold_mem_offsets && optimize >= 2;
-  }
-
-  virtual unsigned int execute (function *);
-}; // class pass_fold_mem_offsets
+using namespace rtl_ssa;

 /* Class that holds in FOLD_INSNS the instructions that if folded the offset
    of a memory instruction would increase by ADDED_OFFSET.  */
 class fold_mem_info
 {
 public:
-  auto_bitmap fold_insns;
+  /* fold-mem-offset root details.  */
+  insn_info *insn;
+  rtx mem;
+  rtx reg;
+  HOST_WIDE_INT offset;
+
+  /* Offset that can be folded into the fold-mem-offset root.  */
   HOST_WIDE_INT added_offset;
+
+  /* DEF-chain INSNs.  */
+  auto_vec<insn_info *> fold_agnostic_insns;
+  auto_vec<insn_info *> fold_insns;
 };

-typedef hash_map<rtx_insn *, fold_mem_info *> fold_info_map;
+/* Test if INSN is a memory load / store that can have an offset folded to it.
+   Return true iff INSN is such an instruction and return through MEM,
+   REG and OFFSET the RTX that has a MEM code, the register that is
+   used as a base address and the offset accordingly.  */
+bool
+get_fold_mem_offset_root (insn_info *insn, rtx *mem, rtx *reg,
+			  HOST_WIDE_INT *offset)
+{
+  rtx set = single_set (insn->rtl ());
+  if (set != NULL_RTX)
+    {
+      rtx src = SET_SRC (set);
+      rtx dest = SET_DEST (set);

-/* Tracks which instructions can be reached through instructions that can
-   propagate offsets for folding.  */
-static bitmap_head can_fold_insns;
+      /* Don't fold when we have unspec / volatile.  */
+      if (GET_CODE (src) == UNSPEC
+	  || GET_CODE (src) == UNSPEC_VOLATILE
+	  || GET_CODE (dest) == UNSPEC
+	  || GET_CODE (dest) == UNSPEC_VOLATILE)
+	return false;

-/* Marks instructions that are currently eligible for folding.  */
-static bitmap_head candidate_fold_insns;
+      if (MEM_P (src))
+	*mem = src;
+      else if (MEM_P (dest))
+	*mem = dest;
+      else if ((GET_CODE (src) == SIGN_EXTEND
+		|| GET_CODE (src) == ZERO_EXTEND)
+	       && MEM_P (XEXP (src, 0)))
+	*mem = XEXP (src, 0);
+      else
+	return false;
+    }
+  else
+    return false;

-/* Tracks instructions that cannot be folded because it turned out that
-   folding will result in creating an invalid memory instruction.
-   An instruction can be in both CANDIDATE_FOLD_INSNS and CANNOT_FOLD_INSNS
-   at the same time, in which case it is not legal to fold.  */
-static bitmap_head cannot_fold_insns;
+  rtx mem_addr = XEXP (*mem, 0);
+  if (REG_P (mem_addr))
+    {
+      *reg = mem_addr;
+      *offset = 0;
+    }
+  else if (GET_CODE (mem_addr) == PLUS
+	   && REG_P (XEXP (mem_addr, 0))
+	   && CONST_INT_P (XEXP (mem_addr, 1)))
+    {
+      *reg = XEXP (mem_addr, 0);
+      *offset = INTVAL (XEXP (mem_addr, 1));
+    }
+  else
+    return false;

-/* The number of instructions that were simplified or eliminated.  */
-static int stats_fold_count;
+  return true;
+}

-/* Get the single reaching definition of an instruction inside a BB.
-   The definition is desired for REG used in INSN.
-   Return the definition insn or NULL if there's no definition with
-   the desired criteria.  */
-static rtx_insn *
-get_single_def_in_bb (rtx_insn *insn, rtx reg)
+/* Get the single reaching definition of REG used in INSN within a BB.
+   Return the definition or NULL if there's no definition with the desired
+   criteria.  If SINGLE_USE is set to true the DEF must have exactly one
+   USE resulting in a 1:1 DEF-USE relationship.  If set to false, then a
+   1:n DEF-USE relationship is accepted and the caller must take care to
+   ensure all USEs are safe for folding.  */
+static set_info *
+get_single_def_in_bb (insn_info *insn, rtx reg, bool single_use)
 {
-  df_ref use;
-  struct df_link *ref_chain, *ref_link;
-
-  FOR_EACH_INSN_USE (use, insn)
+  /* Get the use_info of the base register.  */
+  for (use_info *use : insn->uses ())
     {
-      if (GET_CODE (DF_REF_REG (use)) == SUBREG)
+      /* Other USEs can be ignored and multiple equal USEs are fine.  */
+      if (use->regno () != REGNO (reg))
+	continue;
+
+      /* Don't handle subregs for now.  */
+      if (use->includes_subregs ())
	return NULL;
-      if (REGNO (DF_REF_REG (use)) == REGNO (reg))
-	break;
-    }
-  if (!use)
-    return NULL;

+      /* Get the DEF of the register.  */
+      set_info *def = use->def ();
+      if (!def)
+	return NULL;
-  ref_chain = DF_REF_CHAIN (use);
+      /* Limit the number of USEs of DEF to 1.  */
+      auto iter = def->nondebug_insn_uses ();
+      if (single_use && ++iter.begin () != iter.end ())
+	return NULL;
-  if (!ref_chain)
-    return NULL;
+      /* Don't handle multiregs for now.  */
+      if (def->includes_multiregs ())
+	return NULL;
-  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
-    {
-      /* Problem getting some definition for this instruction.  */
-      if (ref_link->ref == NULL)
+      /* Only consider uses whose definition comes from a real instruction
+	 and has no notes attached.  */
+      insn_info *def_insn = def->insn ();
+      if (def_insn->is_artificial () || def_insn->first_note ())
	return NULL;
-      if (DF_REF_INSN_INFO (ref_link->ref) == NULL)
+
+      /* No parallel expressions or clobbers.  */
+      if (def_insn->num_defs () != 1)
	return NULL;
-      if (global_regs[REGNO (reg)]
-	  && !set_of (reg, DF_REF_INSN (ref_link->ref)))
+
+      rtx_insn *def_rtl = def_insn->rtl ();
+      if (!NONJUMP_INSN_P (def_rtl) || RTX_FRAME_RELATED_P (def_rtl))
	return NULL;
-    }
-  if (ref_chain->next)
-    return NULL;

+      /* Check if the DEF is a SET of the expected form.  */
+      rtx def_set = simple_regno_set (PATTERN (def_rtl), def->regno ());
+      if (!def_set)
+	return NULL;
-  rtx_insn *def = DF_REF_INSN (ref_chain->ref);
+      /* Ensure DEF and USE are in the same BB.  */
+      if (def->bb () != insn->bb ())
+	return NULL;
-  if (BLOCK_FOR_INSN (def) != BLOCK_FOR_INSN (insn))
-    return NULL;
+      /* Ensure the definition comes before the insn.  */
+      if (*use->insn () < *def_insn)
+	return NULL;
-  if (DF_INSN_LUID (def) > DF_INSN_LUID (insn))
-    return NULL;
+      return def;
+    }
-  return def;
+  return NULL;
 }

-/* Get all uses of REG which is set in INSN.  Return the use list or NULL if a
-   use is missing / irregular.  If SUCCESS is not NULL then set it to false if
-   there are missing / irregular uses and true otherwise.  */
-static df_link *
-get_uses (rtx_insn *insn, rtx reg, bool *success)
+/* Test if all USEs of DEF (which defines REG) meet certain criteria to be
+   foldable.  Returns true iff all USEs are fine or false otherwise.  */
+static bool
+has_foldable_uses_p (set_info *def, rtx reg)
 {
-  df_ref def;
-
-  if (success)
-    *success = false;
-
-  FOR_EACH_INSN_DEF (def, insn)
-    if (REGNO (DF_REF_REG (def)) == REGNO (reg))
-      break;
-
-  if (!def)
-    return NULL;
+  /* We only fold through instructions that are transitively used as
+     memory addresses and do not have other uses.  Use the same logic
+     from offset calculation to visit instructions that can propagate
+     offsets and keep track of them in CAN_FOLD_INSNS.  */
+  for (use_info *use : def->nondebug_insn_uses ())
+    {
+      insn_info *use_insn = use->insn ();
+      if (use_insn->is_artificial ())
+	return false;

-  df_link *ref_chain = DF_REF_CHAIN (def);
-  int insn_luid = DF_INSN_LUID (insn);
-  basic_block insn_bb = BLOCK_FOR_INSN (insn);
+      /* Punt if the use is anything more complicated than a set
+	 (clobber, use, etc).  */
+      rtx_insn *use_rtl = use_insn->rtl ();
+      if (!NONJUMP_INSN_P (use_rtl) || GET_CODE (PATTERN (use_rtl)) != SET)
+	return false;

-  for (df_link *ref_link = ref_chain; ref_link; ref_link = ref_link->next)
-    {
-      /* Problem getting a use for this instruction.  */
-      if (ref_link->ref == NULL)
-	return NULL;
-      if (DF_REF_CLASS (ref_link->ref) != DF_REF_REGULAR)
-	return NULL;
+      /* Special case: A foldable memory store is not foldable if it
+	 mentions DEST outside of the address calculation.  */
+      rtx use_set = PATTERN (use_rtl);
+      if (use_set && MEM_P (SET_DEST (use_set))
+	  && reg_mentioned_p (reg, SET_SRC (use_set)))
+	return false;

-      rtx_insn *use = DF_REF_INSN (ref_link->ref);
-      if (DEBUG_INSN_P (use))
-	continue;
+      if (use->bb () != def->bb ())
+	return false;

-      /* We do not handle REG_EQUIV/REG_EQ notes for now.  */
-      if (DF_REF_FLAGS (ref_link->ref) & DF_REF_IN_NOTE)
-	return NULL;
-      if (BLOCK_FOR_INSN (use) != insn_bb)
-	return NULL;
       /* Punt if use appears before def in the basic block.  See PR111601.  */
-      if (DF_INSN_LUID (use) < insn_luid)
-	return NULL;
+      if (*use_insn < *def->insn ())
+	return false;
     }

-  if (success)
-    *success = true;
-
-  return ref_chain;
+  return true;
 }

 static HOST_WIDE_INT
-fold_offsets (rtx_insn *insn, rtx reg, bool analyze, bitmap foldable_insns);
-
-/* Helper function for fold_offsets.
+fold_offsets (insn_info *insn, rtx reg, fold_mem_info *info, bool single_use);
-   If DO_RECURSION is false and ANALYZE is true this function returns true iff
-   it understands the structure of INSN and knows how to propagate constants
-   through it.  In this case OFFSET_OUT and FOLDABLE_INSNS are unused.
+/* Helper function for fold_offsets () that analyses the given INSN.
-   If DO_RECURSION is true then it also calls fold_offsets for each recognized
-   part of INSN with the appropriate arguments.
+   For INSN with known pattern, we calculate the value of the propagated
+   constant and store that in OFFSET_OUT.  Foldable INSNs are added to
+   INFO->fold_insns and fold-agnostic INSNs are added to
+   INFO->fold_agnostic_insns.  It is possible that some INSNs are added to
+   both lists.  In this case the INSN is a fold INSN.
-   If DO_RECURSION is true and ANALYZE is false then offset that would result
-   from folding is computed and is returned through the pointer OFFSET_OUT.
-   The instructions that can be folded are recorded in FOLDABLE_INSNS.  */
+
+   Returns true iff the analysis was successful and false otherwise.  */
 static bool
-fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
-		HOST_WIDE_INT *offset_out, bitmap foldable_insns)
+fold_offsets_1 (insn_info *insn, HOST_WIDE_INT *offset_out, fold_mem_info *info,
+		bool single_use)
 {
-  /* Doesn't make sense if both DO_RECURSION and ANALYZE are false.  */
-  gcc_checking_assert (do_recursion || analyze);
-  gcc_checking_assert (GET_CODE (PATTERN (insn)) == SET);
+  bool fold_agnostic = true;
+  rtx_insn *insn_rtl = insn->rtl ();
+  gcc_checking_assert (GET_CODE (PATTERN (insn_rtl)) == SET);

-  rtx src = SET_SRC (PATTERN (insn));
+  rtx src = SET_SRC (PATTERN (insn_rtl));
   HOST_WIDE_INT offset = 0;

   switch (GET_CODE (src))
@@ -288,35 +332,31 @@ fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
       rtx arg2 = XEXP (src, 1);

       if (REG_P (arg1))
-	{
-	  if (do_recursion)
-	    offset += fold_offsets (insn, arg1, analyze, foldable_insns);
-	}
+	offset += fold_offsets (insn, arg1, info, single_use);
       else if (GET_CODE (arg1) == ASHIFT
	       && REG_P (XEXP (arg1, 0))
	       && CONST_INT_P (XEXP (arg1, 1)))
	{
	  /* Handle R1 = (R2 << C) + ...  */
-	  if (do_recursion)
-	    {
-	      HOST_WIDE_INT scale
-		= (HOST_WIDE_INT_1U << INTVAL (XEXP (arg1, 1)));
-	      offset += scale * fold_offsets (insn, XEXP (arg1, 0), analyze,
-					      foldable_insns);
-	    }
+	  rtx reg = XEXP (arg1, 0);
+	  rtx shamt = XEXP (arg1, 1);
+	  HOST_WIDE_INT scale = HOST_WIDE_INT_1U << INTVAL (shamt);
+	  offset += scale * fold_offsets (insn, reg, info, single_use);
	}
       else if (GET_CODE (arg1) == PLUS
	       && REG_P (XEXP (arg1, 0))
	       && REG_P (XEXP (arg1, 1)))
	{
	  /* Handle R1 = (R2 + R3) + ...  */
-	  if (do_recursion)
+	  rtx reg1 = XEXP (arg1, 0);
+	  rtx reg2 = XEXP (arg1, 1);
+	  if (REGNO (reg1) != REGNO (reg2))
	    {
-	      offset += fold_offsets (insn, XEXP (arg1, 0), analyze,
-				      foldable_insns);
-	      offset += fold_offsets (insn, XEXP (arg1, 1), analyze,
-				      foldable_insns);
+	      offset += fold_offsets (insn, reg1, info, single_use);
+	      offset += fold_offsets (insn, reg2, info, single_use);
	    }
+	  else
+	    offset += 2 * fold_offsets (insn, reg1, info, single_use);
	}
       else if (GET_CODE (arg1) == PLUS
	       && GET_CODE (XEXP (arg1, 0)) == ASHIFT
@@ -325,32 +365,31 @@ fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
	       && REG_P (XEXP (XEXP (arg1, 0), 0))
	       && CONST_INT_P (XEXP (XEXP (arg1, 0), 1))
	       && REG_P (XEXP (arg1, 1)))
	{
	  /* Handle R1 = ((R2 << C) + R3) + ...  */
-	  if (do_recursion)
+	  rtx reg1 = XEXP (XEXP (arg1, 0), 0);
+	  rtx shamt = XEXP (XEXP (arg1, 0), 1);
+	  rtx reg2 = XEXP (arg1, 1);
+	  HOST_WIDE_INT scale = HOST_WIDE_INT_1U << INTVAL (shamt);
+	  if (REGNO (reg1) != REGNO (reg2))
	    {
-	      HOST_WIDE_INT scale
-		= (HOST_WIDE_INT_1U << INTVAL (XEXP (XEXP (arg1, 0), 1)));
-	      offset += scale * fold_offsets (insn, XEXP (XEXP (arg1, 0), 0),
-					      analyze, foldable_insns);
-	      offset += fold_offsets (insn, XEXP (arg1, 1), analyze,
-				      foldable_insns);
+	      offset += scale * fold_offsets (insn, reg1, info, single_use);
+	      offset += fold_offsets (insn, reg2, info, single_use);
	    }
+	  else
+	    offset += (scale + 1) * fold_offsets (insn, reg1, info,
+						  single_use);
	}
       else
	return false;

       if (REG_P (arg2))
-	{
-	  if (do_recursion)
-	    offset += fold_offsets (insn, arg2, analyze, foldable_insns);
-	}
+	offset += fold_offsets (insn, arg2, info, single_use);
       else if (CONST_INT_P (arg2))
	{
	  if (REG_P (arg1))
	    {
	      offset += INTVAL (arg2);
	      /* This is a R1 = R2 + C instruction, candidate for folding.  */
-	      if (!analyze)
-		bitmap_set_bit (foldable_insns, INSN_UID (insn));
+	      fold_agnostic = false;
	    }
	}
       else
@@ -366,26 +405,19 @@ fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
       rtx arg2 = XEXP (src, 1);

       if (REG_P (arg1))
-	{
-	  if (do_recursion)
-	    offset += fold_offsets (insn, arg1, analyze, foldable_insns);
-	}
+	offset += fold_offsets (insn, arg1, info, single_use);
       else
	return false;

       if (REG_P (arg2))
-	{
-	  if (do_recursion)
-	    offset -= fold_offsets (insn, arg2, analyze, foldable_insns);
-	}
+	offset -= fold_offsets (insn, arg2, info, single_use);
       else if (CONST_INT_P (arg2))
	{
	  if (REG_P (arg1))
	    {
	      offset -= INTVAL (arg2);
	      /* This is a R1 = R2 - C instruction, candidate for folding.  */
-	      if (!analyze)
-		bitmap_set_bit (foldable_insns, INSN_UID (insn));
+	      fold_agnostic = false;
	    }
	}
       else
@@ -399,10 +431,7 @@ fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
       /* Propagate through negation.  */
       rtx arg1 = XEXP (src, 0);
       if (REG_P (arg1))
-	{
-	  if (do_recursion)
-	    offset = -fold_offsets (insn, arg1, analyze, foldable_insns);
-	}
+	offset = -fold_offsets (insn, arg1, info, single_use);
       else
	return false;

@@ -417,12 +446,8 @@ fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,

       if (REG_P (arg1) && CONST_INT_P (arg2))
	{
-	  if (do_recursion)
-	    {
-	      HOST_WIDE_INT scale = INTVAL (arg2);
-	      offset = scale * fold_offsets (insn, arg1, analyze,
-					     foldable_insns);
-	    }
+	  HOST_WIDE_INT scale = INTVAL (arg2);
+	  offset = scale * fold_offsets (insn, arg1, info, single_use);
	}
       else
	return false;
@@ -438,12 +463,8 @@ fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,

       if (REG_P (arg1) && CONST_INT_P (arg2))
	{
-	  if (do_recursion)
-	    {
-	      HOST_WIDE_INT scale = (HOST_WIDE_INT_1U << INTVAL (arg2));
-	      offset = scale * fold_offsets (insn, arg1, analyze,
-					     foldable_insns);
-	    }
+	  HOST_WIDE_INT scale = (HOST_WIDE_INT_1U << INTVAL (arg2));
+	  offset = scale * fold_offsets (insn, arg1, info, single_use);
	}
       else
	return false;
@@ -454,8 +475,7 @@ fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
     case REG:
     {
       /* Propagate through register move.  */
-      if (do_recursion)
-	offset = fold_offsets (insn, src, analyze, foldable_insns);
+      offset = fold_offsets (insn, src, info, single_use);

       /* Pattern recognized for folding.  */
       break;
     }
@@ -464,8 +484,7 @@ fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
     case CONST_INT:
     {
       offset = INTVAL (src);
       /* R1 = C is candidate for folding.  */
-      if (!analyze)
-	bitmap_set_bit (foldable_insns, INSN_UID (insn));
+      fold_agnostic = false;

       /* Pattern recognized for folding.  */
       break;
     }
@@ -475,373 +494,395 @@ fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
     default:
       return false;
     }

-  if (do_recursion && !analyze)
+  if (offset_out)
     *offset_out = offset;

+  if (fold_agnostic)
+    {
+      /* For single-USE mode we don't need to collect fold-agnostic INSNs.  */
+      if (!single_use)
+	info->fold_agnostic_insns.safe_push (insn);
+    }
+  else
+    info->fold_insns.safe_push (insn);
+
   return true;
 }

-/* Function that computes the offset that would have to be added to all uses
-   of REG if the instructions marked in FOLDABLE_INSNS were to be eliminated.
+/* Function that calculates the offset for INSN that would have to be added to
+   all its USEs of REG.  Foldable INSNs are added to INFO->fold_insns and
+   fold-agnostic INSNs are added to INFO->fold_agnostic_insns.
+   It is possible that some INSNs are added to both lists.  In this case the
+   INSN is a fold INSN.

-   If ANALYZE is true then mark in CAN_FOLD_INSNS which instructions
-   transitively only affect other instructions found in CAN_FOLD_INSNS.
-   If ANALYZE is false then compute the offset required for folding.  */
+   Returns the offset on success or 0 if the calculation fails.  */
 static HOST_WIDE_INT
-fold_offsets (rtx_insn *insn, rtx reg, bool analyze, bitmap foldable_insns)
+fold_offsets (insn_info *insn, rtx reg, fold_mem_info *info,
+	      bool single_use = true)
 {
-  rtx_insn *def = get_single_def_in_bb (insn, reg);
-
-  if (!def || RTX_FRAME_RELATED_P (def) || GET_CODE (PATTERN (def)) != SET)
+  /* We can only affect the values of GPR registers.  */
+  unsigned int regno = REGNO (reg);
+  if (fixed_regs[regno]
+      || !TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], regno))
     return 0;

-  rtx dest = SET_DEST (PATTERN (def));
-
-  if (!REG_P (dest))
+  /* Get the DEF for REG in INSN.  */
+  set_info *def = get_single_def_in_bb (insn, reg, single_use);
+  if (!def)
     return 0;

-  /* We can only affect the values of GPR registers.  */
-  unsigned int dest_regno = REGNO (dest);
-  if (fixed_regs[dest_regno]
-      || !TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], dest_regno))
-    return 0;
+  insn_info *def_insn = def->insn ();
+  rtx_insn *def_rtl = def_insn->rtl ();

-  if (analyze)
+  if (dump_file && (dump_flags & TDF_DETAILS))
     {
-      /* Check if we know how to handle DEF.  */
-      if (!fold_offsets_1 (def, true, false, NULL, NULL))
-	return 0;
-
-      /* We only fold through instructions that are transitively used as
-	 memory addresses and do not have other uses.  Use the same logic
-	 from offset calculation to visit instructions that can propagate
-	 offsets and keep track of them in CAN_FOLD_INSNS.  */
-      bool success;
-      struct df_link *uses = get_uses (def, dest, &success), *ref_link;
+      fprintf (dump_file, "For INSN: ");
+      print_rtl_single (dump_file, insn->rtl ());
+      fprintf (dump_file, "...found DEF: ");
+      print_rtl_single (dump_file, def_rtl);
+    }

-      if (!success)
-	return 0;
+  gcc_assert (REGNO (reg) == REGNO (SET_DEST (PATTERN (def_rtl))));

-      for (ref_link = uses; ref_link; ref_link = ref_link->next)
+  /* Check if all USEs of DEF are safe.  */
+  if (!has_foldable_uses_p (def, reg))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
	{
-	  rtx_insn *use = DF_REF_INSN (ref_link->ref);
-
-	  if (DEBUG_INSN_P (use))
-	    continue;
-
-	  /* Punt if the use is anything more complicated than a set
-	     (clobber, use, etc).  */
-	  if (!NONJUMP_INSN_P (use) || GET_CODE (PATTERN (use)) != SET)
-	    return 0;
-
-	  /* This use affects instructions outside of CAN_FOLD_INSNS.  */
-	  if (!bitmap_bit_p (&can_fold_insns, INSN_UID (use)))
-	    return 0;
-
-	  rtx use_set = PATTERN (use);
-
-	  /* Special case: A foldable memory store is not foldable if it
-	     mentions DEST outside of the address calculation.  */
-	  if (use_set && MEM_P (SET_DEST (use_set))
-	      && reg_mentioned_p (dest, SET_SRC (use_set)))
-	    return 0;
+	  fprintf (dump_file, "has_foldable_uses_p failed for: ");
+	  print_rtl_single (dump_file, def_rtl);
	}
+      return 0;
+    }

-      bitmap_set_bit (&can_fold_insns, INSN_UID (def));
-
+  /* Check if we know how to handle DEF.  */
+  HOST_WIDE_INT offset;
+  if (!fold_offsets_1 (def_insn, &offset, info, single_use))
+    {
       if (dump_file && (dump_flags & TDF_DETAILS))
	{
-	  fprintf (dump_file, "Instruction marked for propagation: ");
-	  print_rtl_single (dump_file, def);
+	  fprintf (dump_file, "fold_offsets_1 failed for: ");
+	  print_rtl_single (dump_file, def_rtl);
	}
+      return 0;
     }
-  else
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
     {
-      /* We cannot propagate through this instruction.  */
-      if (!bitmap_bit_p (&can_fold_insns, INSN_UID (def)))
-	return 0;
+      fprintf (dump_file, "Instruction marked for propagation: ");
+      print_rtl_single (dump_file, def_rtl);
     }

-  HOST_WIDE_INT offset = 0;
-  bool recognized = fold_offsets_1 (def, analyze, true, &offset,
-				    foldable_insns);
-
-  if (!recognized)
-    return 0;
-
   return offset;
 }

-/* Test if INSN is a memory load / store that can have an offset folded to it.
-   Return true iff INSN is such an instruction and return through MEM_OUT,
-   REG_OUT and OFFSET_OUT the RTX that has a MEM code, the register that is
-   used as a base address and the offset accordingly.
-   All of the out pointers may be NULL in which case they will be ignored.  */
-bool
-get_fold_mem_root (rtx_insn *insn, rtx *mem_out, rtx *reg_out,
-		   HOST_WIDE_INT *offset_out)
+/* Test if we can fold the new offset into the given fold-mem-offset root insn.
+   Returns true iff folding would succeed or false otherwise.  */
+static bool
+do_check_validity (fold_mem_info *info)
 {
-  rtx set = single_set (insn);
-  rtx mem = NULL_RTX;
-
-  if (set != NULL_RTX)
-    {
-      rtx src = SET_SRC (set);
-      rtx dest = SET_DEST (set);
-
-      /* Don't fold when we have unspec / volatile.  */
-      if (GET_CODE (src) == UNSPEC
-	  || GET_CODE (src) == UNSPEC_VOLATILE
-	  || GET_CODE (dest) == UNSPEC
-	  || GET_CODE (dest) == UNSPEC_VOLATILE)
-	return false;
-
-      if (MEM_P (src))
-	mem = src;
-      else if (MEM_P (dest))
-	mem = dest;
-      else if ((GET_CODE (src) == SIGN_EXTEND
-		|| GET_CODE (src) == ZERO_EXTEND)
-	       && MEM_P (XEXP (src, 0)))
-	mem = XEXP (src, 0);
-    }
-
-  if (mem == NULL_RTX)
-    return false;
+  rtx_insn *insn_rtl = info->insn->rtl ();
+  rtx mem = info->mem;
+  rtx reg = info->reg;
+  HOST_WIDE_INT new_offset = info->offset + info->added_offset;

+  /* Test if it is valid to change MEM's address offset to NEW_OFFSET.  */
+  int icode = INSN_CODE (insn_rtl);
+  INSN_CODE (insn_rtl) = -1;
   rtx mem_addr = XEXP (mem, 0);
-  rtx reg;
-  HOST_WIDE_INT offset;
-
-  if (REG_P (mem_addr))
-    {
-      reg = mem_addr;
-      offset = 0;
-    }
-  else if (GET_CODE (mem_addr) == PLUS
-	   && REG_P (XEXP (mem_addr, 0))
-	   && CONST_INT_P (XEXP (mem_addr, 1)))
-    {
-      reg = XEXP (mem_addr, 0);
-      offset = INTVAL (XEXP (mem_addr, 1));
-    }
+  machine_mode mode = GET_MODE (mem_addr);
+  if (new_offset != 0)
+    XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, mode));
   else
-    return false;
+    XEXP (mem, 0) = reg;

-  if (mem_out)
-    *mem_out = mem;
-  if (reg_out)
-    *reg_out = reg;
-  if (offset_out)
-    *offset_out = offset;
+  bool illegal = insn_invalid_p (insn_rtl, false)
+		 || !memory_address_addr_space_p (mode, XEXP (mem, 0),
						  MEM_ADDR_SPACE (mem));

-  return true;
+  /* Restore the instruction.  */
+  XEXP (mem, 0) = mem_addr;
+  INSN_CODE (insn_rtl) = icode;
+
+  return !illegal;
 }

-/* If INSN is a root memory instruction then do a DFS traversal on its
-   definitions and find folding candidates.  */
-static void
-do_analysis (rtx_insn *insn)
+/* Check if any of the provided INSNs in INSN_LIST is not marked in the
+   given bitmap.  Return true if at least one INSN is not in the bitmap and
+   false otherwise.  */
+static bool
+insn_uses_not_in_bitmap (vec<insn_info *> *insn_list, bitmap bm)
 {
-  rtx reg;
-  if (!get_fold_mem_root (insn, NULL, &reg, NULL))
-    return;
-
-  if (dump_file && (dump_flags & TDF_DETAILS))
+  for (insn_info *insn : *insn_list)
     {
-      fprintf (dump_file, "Starting analysis from root: ");
-      print_rtl_single (dump_file, insn);
+      gcc_assert (insn->num_defs () == 1);
+      set_info *def = dyn_cast<set_info *> (insn->defs ()[0]);
+      for (use_info *use : def->nondebug_insn_uses ())
+	{
+	  if (!bitmap_bit_p (bm, use->insn ()->uid ()))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		{
+		  fprintf (dump_file, "Cannot ensure correct transformation as "
			   "INSN %u has a USE INSN %u that was not analysed.\n",
			   insn->uid (), use->insn ()->uid ());
+		}
+
+	      return true;
+	    }
+	}
     }

-  /* Analyse folding opportunities for this memory instruction.  */
-  bitmap_set_bit (&can_fold_insns, INSN_UID (insn));
-  fold_offsets (insn, reg, true, NULL);
+  return false;
 }

+/* Check if all USEs of all instructions have been analysed.
+   If a fold_mem_info is found that has an unknown USE, then
+   drop it from the list.  When this function returns, all
+   fold_mem_infos in the worklist reference instructions that
+   have been analysed before and can therefore be committed.  */
 static void
-do_fold_info_calculation (rtx_insn *insn, fold_info_map *fold_info)
+drop_unsafe_candidates (vec<fold_mem_info *> *worklist)
 {
-  rtx mem, reg;
-  HOST_WIDE_INT cur_offset;
-  if (!get_fold_mem_root (insn, &mem, &reg, &cur_offset))
-    return;
+  /* First mark all analysed INSNs in a bitmap.  */
+  auto_bitmap insn_closure;
+  for (fold_mem_info *info : *worklist)
+    {
+      bitmap_set_bit (insn_closure, info->insn->uid ());
+      for (insn_info *insn : info->fold_agnostic_insns)
+	bitmap_set_bit (insn_closure, insn->uid ());
+      for (insn_info *insn : info->fold_insns)
+	bitmap_set_bit (insn_closure, insn->uid ());
+    }

-  fold_mem_info *info = new fold_mem_info;
-  info->added_offset = fold_offsets (insn, reg, false, info->fold_insns);
-  fold_info->put (insn, info);
+  /* Now check if all uses of fold_insns are marked.  */
+  unsigned i;
+  fold_mem_info *info;
+  FOR_EACH_VEC_ELT (*worklist, i, info)
+    {
+      if (insn_uses_not_in_bitmap (&info->fold_agnostic_insns, insn_closure)
+	  || insn_uses_not_in_bitmap (&info->fold_insns, insn_closure))
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    {
+	      fprintf (dump_file, "Dropping fold-mem-offset root INSN %u.\n",
		       info->insn->uid ());
+	    }

+	  /* Drop INFO from worklist and start over.  */
+	  info->fold_agnostic_insns.truncate (0);
+	  info->fold_insns.truncate (0);
+	  worklist->unordered_remove (i);
+	  delete info;
+	  drop_unsafe_candidates (worklist);
+	  return;
+	}
+    }
 }

-/* If INSN is a root memory instruction then compute a potentially new offset
-   for it and test if the resulting instruction is valid.  */
+/* Fold the new offset into the fold-mem-info root.  */
 static void
-do_check_validity (rtx_insn *insn, fold_mem_info *info)
+do_commit_offset (fold_mem_info *info)
 {
-  rtx mem, reg;
-  HOST_WIDE_INT cur_offset;
-  if (!get_fold_mem_root (insn, &mem, &reg, &cur_offset))
-    return;
+  insn_info *insn = info->insn;
+  rtx_insn *insn_rtl = insn->rtl ();
+  rtx mem = info->mem;
+  rtx reg = info->reg;
+  HOST_WIDE_INT new_offset = info->offset + info->added_offset;

-  HOST_WIDE_INT new_offset = cur_offset + info->added_offset;
+  if (info->added_offset == 0)
+    return;

-  /* Test if it is valid to change MEM's address offset to NEW_OFFSET.  */
-  int icode = INSN_CODE (insn);
-  INSN_CODE (insn) = -1;
-  rtx mem_addr = XEXP (mem, 0);
-  machine_mode mode = GET_MODE (mem_addr);
+  machine_mode mode = GET_MODE (XEXP (mem, 0));
   if (new_offset != 0)
     XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, mode));
   else
     XEXP (mem, 0) = reg;
+  INSN_CODE (insn_rtl) = recog (PATTERN (insn_rtl), insn_rtl, 0);
+  df_insn_rescan (insn_rtl);

-  bool illegal = insn_invalid_p (insn, false)
-		 || !memory_address_addr_space_p (mode, XEXP (mem, 0),
						  MEM_ADDR_SPACE (mem));
-
-  /* Restore the instruction.  */
-  XEXP (mem, 0) = mem_addr;
-  INSN_CODE (insn) = icode;
-
-  if (illegal)
-    bitmap_ior_into (&cannot_fold_insns, info->fold_insns);
-  else
-    bitmap_ior_into (&candidate_fold_insns, info->fold_insns);
+  if (dump_file)
+    fprintf (dump_file, "INSN %u: Memory offset changed from "
+	     HOST_WIDE_INT_PRINT_DEC " to " HOST_WIDE_INT_PRINT_DEC ".\n",
+	     insn->uid (), info->offset, new_offset);
 }

-static bool
-compute_validity_closure (fold_info_map *fold_info)
+/* Set the constant of the fold INSN to zero.  */
+static void
+do_commit_insn (insn_info *insn)
 {
-  /* Let's say we have an arbitrary chain of foldable instructions xN = xN + C
-     and memory operations rN that use xN as shown below.  If folding x1 in r1
-     turns out to be invalid for whatever reason then it's also invalid to fold
-     any of the other xN into any rN.  That means that we need the transitive
-     closure of validity to determine whether we can fold a xN instruction.
-
-     +--------------+ +-------------------+ +-------------------+
-     | r1 = mem[x1] | | r2 = mem[x1 + x2] | | r3 = mem[x2 + x3] | ...
-     +--------------+ +-------------------+ +-------------------+
-            ^              ^        ^              ^        ^
-            |             /         |             /         |  ...
-            |            /          |            /          |
-     +-------------+    /    +-------------+    /    +-------------+
-     | x1 = x1 + 1 |-----+   | x2 = x2 + 1 |-----+   | x3 = x3 + 1 |--- ...
-     +-------------+         +-------------+         +-------------+
-            ^                       ^                       ^
-            |                       |                       |
-           ...                     ...                     ...
-  */
-
-  /* In general three iterations should be enough for most cases, but allow up
-     to five when -fexpensive-optimizations is used.  */
-  int max_iters = 3 + 2 * flag_expensive_optimizations;
-  for (int pass = 0; pass < max_iters; pass++)
+  rtx_insn *insn_rtl = insn->rtl ();
+
+  /* If we deleted this INSN before, then there is nothing left to do here.  */
+  if (insn_rtl->deleted ())
+    return;
+
+  rtx set = single_set (insn_rtl);
+  rtx src = SET_SRC (set);
+
+  /* Emit a move and let subsequent passes eliminate it if possible.  */
+  if (GET_CODE (src) == CONST_INT)
     {
-      bool made_changes = false;
-      for (fold_info_map::iterator iter = fold_info->begin ();
-	   iter != fold_info->end (); ++iter)
+      /* Only change if necessary.  */
+      if (INTVAL (src))
	{
-	  fold_mem_info *info = (*iter).second;
-	  if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
-	    made_changes |= bitmap_ior_into (&cannot_fold_insns,
-					     info->fold_insns);
+	  /* INSN is R1 = C.  Set C to 0 because it was folded.  */
+	  SET_SRC (set) = CONST0_RTX (GET_MODE (SET_SRC (set)));
+	  df_insn_rescan (insn_rtl);
+	}
+    }
+  else
+    {
+      /* Only change if necessary.  */
+      if (INTVAL (XEXP (src, 1)))
+	{
+	  /* INSN is R1 = R2 + C.  Set C to 0 because it was folded.  */
+	  XEXP (src, 1) = CONST0_RTX (GET_MODE (XEXP (src, 1)));
+	  df_insn_rescan (insn_rtl);
+
+	  /* Mark self-assignments for deletion.  */
+	  rtx dest = SET_DEST (set);
+	  if (REGNO (dest) == REGNO (XEXP (src, 0)))
+	    delete_insn (insn_rtl);
	}
-
-      if (!made_changes)
-	return true;
     }
-  return false;
+  if (dump_file)
+    fprintf (dump_file, "INSN %u: Constant set to zero.\n", insn->uid ());
 }

-/* If INSN is a root memory instruction that was affected by any folding
-   then update its offset as necessary.  */
-static void
-do_commit_offset (rtx_insn *insn, fold_mem_info *info)
+/* Fold memory offsets by analysing the DEF-USE chain where the DEFs
+   will only have a single USE.  */
+static unsigned int
+fold_mem_offsets_single_use ()
 {
-  rtx mem, reg;
-  HOST_WIDE_INT cur_offset;
-  if (!get_fold_mem_root (insn, &mem, &reg, &cur_offset))
-    return;
+  unsigned int stats_fold_count = 0;

-  HOST_WIDE_INT new_offset = cur_offset + info->added_offset;
+  /* Iterate over all nondebug INSNs, get our candidates and fold them.  */
+  for (auto insn : iterate_safely (crtl->ssa->nondebug_insns ()))
+    {
+      if (!insn->is_real () || !insn->can_be_optimized ())
+	continue;

-  if (new_offset == cur_offset)
-    return;
+      rtx mem, reg;
+      HOST_WIDE_INT offset;
+      if (!get_fold_mem_offset_root (insn, &mem, &reg, &offset))
+	continue;

-  gcc_assert (!bitmap_empty_p (info->fold_insns));
+      fold_mem_info info;
+      info.insn = insn;
+      info.mem = mem;
+      info.reg = reg;
+      info.offset = offset;

-  if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
-    return;
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Starting analysis from root: ");
+	  print_rtl_single (dump_file, info.insn->rtl ());
+	}

-  if (dump_file)
-    {
-      fprintf (dump_file, "Memory offset changed from "
-	       HOST_WIDE_INT_PRINT_DEC " to " HOST_WIDE_INT_PRINT_DEC
-	       " for instruction:\n", cur_offset, new_offset);
-      print_rtl_single (dump_file, insn);
+      /* Walk DEF-chain and collect info.fold_insns and
	 the resulting offset.  */
+      info.added_offset = fold_offsets (info.insn, info.reg, &info);
+      if (info.added_offset == 0 || !do_check_validity (&info))
+	continue;
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
		 "Found root offset delta: " HOST_WIDE_INT_PRINT_DEC "\n",
		 info.added_offset);
+
+      do_commit_offset (&info);
+
+      while (!info.fold_insns.is_empty ())
+	{
+	  insn_info *insn = info.fold_insns.pop ();
+	  do_commit_insn (insn);
+	  stats_fold_count++;
+	}
     }

-  machine_mode mode = GET_MODE (XEXP (mem, 0));
-  if (new_offset != 0)
-    XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, mode));
-  else
-    XEXP (mem, 0) = reg;
-  INSN_CODE (insn) = recog (PATTERN (insn), insn, 0);
-  df_insn_rescan (insn);
+  return stats_fold_count;
 }

-/* If INSN is a move / add instruction that was folded then replace its
-   constant part with zero.  */
-static void
-do_commit_insn (rtx_insn *insn)
+/* Fold memory offsets by analysing the DEF-USE chain where the DEFs
+   can have multiple USEs.  */
+static unsigned int
+fold_mem_offsets_multi_use ()
 {
-  if (bitmap_bit_p (&candidate_fold_insns, INSN_UID (insn))
-      && !bitmap_bit_p (&cannot_fold_insns, INSN_UID (insn)))
+  unsigned int stats_fold_count = 0;
+
+  /* Iterate over all nondebug INSNs and get our candidates.  */
+  auto_vec<fold_mem_info *> worklist;
+  for (auto insn : iterate_safely (crtl->ssa->nondebug_insns ()))
     {
-      if (dump_file)
-	{
-	  fprintf (dump_file, "Instruction folded:");
-	  print_rtl_single (dump_file, insn);
-	}
+      if (!insn->is_real () || !insn->can_be_optimized ())
+	continue;

-      stats_fold_count++;
+      rtx mem, reg;
+      HOST_WIDE_INT offset;
+      if (!get_fold_mem_offset_root (insn, &mem, &reg, &offset))
+	continue;

-      rtx set = single_set (insn);
-      rtx dest = SET_DEST (set);
-      rtx src = SET_SRC (set);
+      fold_mem_info *info = new fold_mem_info;
+      info->insn = insn;
+      info->mem = mem;
+      info->reg = reg;
+      info->offset = offset;

-      /* Emit a move and let subsequent passes eliminate it if possible.  */
-      if (GET_CODE (src) == CONST_INT)
+      if (dump_file && (dump_flags & TDF_DETAILS))
	{
-	  /* INSN is R1 = C.
-	     Replace it with R1 = 0 because C was folded.  */
-	  rtx mov_rtx
-	    = gen_move_insn (dest, gen_int_mode (0, GET_MODE (dest)));
-	  df_insn_rescan (emit_insn_after (mov_rtx, insn));
+	  fprintf (dump_file, "Starting analysis from root: ");
+	  print_rtl_single (dump_file, info->insn->rtl ());
	}
-      else
+
+      /* Walk DEF-chain and collect info->fold_insns, info->fold_agnostic_insns
	 and the resulting offset.  */
+      info->added_offset = fold_offsets (info->insn, info->reg, info, false);
+      if (info->added_offset == 0 || !do_check_validity (info))
	{
-	  /* INSN is R1 = R2 + C.
-	     Replace it with R1 = R2 because C was folded.  */
-	  rtx arg1 = XEXP (src, 0);
+	  delete info;
+	  continue;
+	}

-	  /* If the DEST == ARG1 then the move is a no-op.  */
-	  if (REGNO (dest) != REGNO (arg1))
-	    {
-	      gcc_checking_assert (GET_MODE (dest) == GET_MODE (arg1));
-	      rtx mov_rtx = gen_move_insn (dest, arg1);
-	      df_insn_rescan (emit_insn_after (mov_rtx, insn));
-	    }
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
		 "Found root offset delta: " HOST_WIDE_INT_PRINT_DEC "\n",
		 info->added_offset);
+
+      /* Append candidate.  */
+      worklist.safe_push (info);
+    }
+
+  /* Now drop all fold_mem_infos, which contain INSNs that have unknown
+     USEs and are therefore not safe to change.  */
+  drop_unsafe_candidates (&worklist);
+
+  while (!worklist.is_empty ())
+    {
+      fold_mem_info *info = worklist.pop ();
+      do_commit_offset (info);
+
+      while (!info->fold_insns.is_empty ())
+	{
+	  insn_info *insn = info->fold_insns.pop ();
+	  do_commit_insn (insn);
+	  stats_fold_count++;
	}
-      /* Delete the original move / add instruction.  */
-      delete_insn (insn);
+      info->fold_agnostic_insns.truncate (0);
+      delete info;
     }
+
+  return stats_fold_count;
 }

-unsigned int
-pass_fold_mem_offsets::execute (function *fn)
+/* Main function of fold-mem-offsets pass.  */
+static unsigned int
+fold_mem_offsets (function *fn)
 {
+  bool multi_use_mode = true;
+
   /* Computing UD/DU chains for flow graphs which have a high connectivity
      will take a long time and is unlikely to be particularly useful.

@@ -856,69 +897,82 @@ pass_fold_mem_offsets::execute (function *fn)
		 "fold-mem-offsets: %d basic blocks and %d edges/basic block",
		 n_basic_blocks_for_fn (cfun),
		 n_edges_for_fn (cfun) / n_basic_blocks_for_fn (cfun));
-      return 0;
+      multi_use_mode = false;
     }

-  df_set_flags (DF_EQ_NOTES + DF_RD_PRUNE_DEAD_DEFS + DF_DEFER_INSN_RESCAN);
-  df_chain_add_problem (DF_UD_CHAIN + DF_DU_CHAIN);
+  /* There is a conflict between this pass and RISCV's shorten-memrefs
+     pass.  For now disable folding if optimizing for size because
+     otherwise this cancels the effects of shorten-memrefs.  */
+  cgraph_node *n = cgraph_node::get (fn->decl);
+  if (n && n->optimize_for_size_p ())
+    return 0;
+
+  /* Initialise RTL SSA.  */
+  calculate_dominance_info (CDI_DOMINATORS);
   df_analyze ();
+  crtl->ssa = new rtl_ssa::function_info (cfun);

-  bitmap_initialize (&can_fold_insns, NULL);
-  bitmap_initialize (&candidate_fold_insns, NULL);
-  bitmap_initialize (&cannot_fold_insns, NULL);
+  /* The number of instructions that were simplified or eliminated.  */
+  int stats_fold_count = 0;

-  stats_fold_count = 0;
+  /* Fold mem offsets with DEFs that have a single USE.  */
+  stats_fold_count += fold_mem_offsets_single_use ();

-  basic_block bb;
-  rtx_insn *insn;
-  FOR_ALL_BB_FN (bb, fn)
+  /* Fold mem offsets with DEFs that have multiple USEs.  */
+  if (multi_use_mode || flag_expensive_optimizations)
     {
-      /* There is a conflict between this pass and RISCV's shorten-memrefs
	 pass.  For now disable folding if optimizing for size because
	 otherwise this cancels the effects of shorten-memrefs.  */
-      if (optimize_bb_for_size_p (bb))
-	continue;
-
-      fold_info_map fold_info;
+      if (dump_file)
+	fprintf (dump_file, "Starting multi-use analysis\n");
+      stats_fold_count += fold_mem_offsets_multi_use ();
+    }

-      bitmap_clear (&can_fold_insns);
-      bitmap_clear (&candidate_fold_insns);
-      bitmap_clear (&cannot_fold_insns);
+  statistics_counter_event (cfun, "Number of folded instructions",
			    stats_fold_count);

-      FOR_BB_INSNS (bb, insn)
-	do_analysis (insn);
+  free_dominance_info (CDI_DOMINATORS);
+  cleanup_cfg (0);

-      FOR_BB_INSNS (bb, insn)
-	do_fold_info_calculation (insn, &fold_info);
+  delete crtl->ssa;
+  crtl->ssa = nullptr;

-      FOR_BB_INSNS (bb, insn)
-	if (fold_mem_info **info = fold_info.get (insn))
-	  do_check_validity (insn, *info);
+  delete_trivially_dead_insns (get_insns (), max_reg_num ());

-      if (compute_validity_closure (&fold_info))
-	{
-	  FOR_BB_INSNS (bb, insn)
-	    if (fold_mem_info **info = fold_info.get (insn))
-	      do_commit_offset (insn, *info);
+  return 0;
+}

-	  FOR_BB_INSNS (bb, insn)
-	    do_commit_insn (insn);
-	}
+namespace {

-      for (fold_info_map::iterator iter = fold_info.begin ();
-	   iter != fold_info.end (); ++iter)
-	delete (*iter).second;
-    }
+const pass_data pass_data_fold_mem =
+{
+  RTL_PASS, /* type */
+  "fold_mem_offsets", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_FOLD_MEM_OFFSETS, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_df_finish, /* todo_flags_finish */
+};

-  statistics_counter_event (cfun, "Number of folded instructions",
			    stats_fold_count);
+class pass_fold_mem_offsets : public rtl_opt_pass
+{
+public:
+  pass_fold_mem_offsets (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_fold_mem, ctxt)
+  {}

-  bitmap_release (&can_fold_insns);
-  bitmap_release (&candidate_fold_insns);
-  bitmap_release (&cannot_fold_insns);
+  /* opt_pass methods: */
+  bool gate (function *) final override
+  {
+    return flag_fold_mem_offsets && optimize >= 2;
+  }

-  return 0;
-}
+  unsigned int execute (function *fn) final override
+  {
+    return fold_mem_offsets (fn);
+  }
+}; // class pass_fold_mem_offsets

 } // anon namespace
--
2.49.0