This patch converts the fold-mem-offsets pass from DF to RTL-SSA.
Along with this conversion, the way the pass collects information
was completely reworked.  Instead of visiting each instruction
multiple times, this is now done only once.

The most significant changes are:
* The pass operates mainly on insn_info objects from RTL-SSA.
* A single iteration over all nondebug INSNs identifies the
  fold-mem-roots.  Then the fold-mem-roots' DEF-chains are walked
  to collect foldable constants.
* The class fold_mem_info holds vectors for the DEF-chain of
  the to-be-folded INSNs (fold_agnostic_insns, which don't need
  to be adjusted, and fold_insns, which need their constant to
  be set to zero).
* Introduction of a single-USE mode, which only collects DEFs
  that have a single USE and are therefore safe to transform
  (the fold-mem-root will be the final USE).  This mode is fast
  and will always run (unless disabled via -fno-fold-mem-offsets).
* Introduction of a multi-USE mode, which allows DEFs to have
  multiple USEs, but all USEs must be part of some fold-mem-root's
  DEF-chain.  The analysis of all USEs is expensive, so this mode
  is disabled for highly connected CFGs.  Note that multi-USE mode
  will miss some opportunities that the single-USE mode finds
  (e.g. multi-USE mode fails for fold-mem-offsets-3.c); see the
  RTL sketch below.
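
A sketch of the multi-USE case in RTL (register numbers and modes
are invented for illustration): one fold insn defines a register
that feeds two fold-mem-offset roots, so single-USE mode must give
up, while multi-USE mode can fold the constant into both roots:

  (set (reg:DI 10) (plus:DI (reg:DI 2) (const_int 8)))  ;; fold insn
  (set (reg:DI 11) (mem:DI (reg:DI 10)))                ;; root 1
  (set (reg:DI 12) (mem:DI (plus:DI (reg:DI 10) (const_int 16)))) ;; root 2

After folding, the offsets of both roots absorb the constant and the
fold insn's constant is set to zero:

  (set (reg:DI 10) (plus:DI (reg:DI 2) (const_int 0)))  ;; folded
  (set (reg:DI 11) (mem:DI (plus:DI (reg:DI 10) (const_int 8))))
  (set (reg:DI 12) (mem:DI (plus:DI (reg:DI 10) (const_int 24))))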

The following testing was done:
* Bootstrapped and regtested on aarch64-linux and x86-64-linux.
* Regtested on riscv64-linux.
* SPEC CPU 2017 tested on aarch64-linux and riscv64-linux.

The number of applied optimizations of the different versions/modes
of fold-mem-offsets in SPEC CPU 2017 on RISC-V rv64gc_zba_zbb_zbs
is as follows:

Upstream:
  500.perlbench_r: 169
  502.gcc_r: 624
  520.omnetpp_r: 1301
  523.xalancbmk_r: 23
  525.x264_r: 705
  531.deepsjeng_r: 36
  541.leela_r: 19
  548.exchange2: 90
  557.xz_r: 11
  SUM: 2151

New single-USE:
  500.perlbench_r: 70
  502.gcc_r: 263
  520.omnetpp_r: 1100
  523.xalancbmk_r: 10
  525.x264_r: 95
  531.deepsjeng_r: 19
  541.leela_r: 252
  548.exchange2: 13
  557.xz_r: 11
  SUM: 1833

New multi-USE:
  500.perlbench_r: 186
  502.gcc_r: 744
  520.omnetpp_r: 1187
  523.xalancbmk_r: 22
  525.x264_r: 985
  531.deepsjeng_r: 21
  541.leela_r: 87
  548.exchange2: 63
  557.xz_r: 23
  SUM: 3318

New single-USE then multi-USE:
  500.perlbench_r: 192
  502.gcc_r: 761
  520.omnetpp_r: 1673
  523.xalancbmk_r: 22
  525.x264_r: 995
  531.deepsjeng_r: 21
  541.leela_r: 252
  548.exchange2: 63
  557.xz_r: 23
  SUM: 4002
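
(For reference, one way to count the folds applied by the new pass,
assuming the usual -fdump-rtl-<pass> dump naming; test.c is a
placeholder for the input file:

  gcc -O2 -S -ffold-mem-offsets -fdump-rtl-fold_mem_offsets test.c
  grep -c "Constant set to zero" test.c.*r.fold_mem_offsets

The grep pattern is the dump message emitted when a fold insn's
constant is zeroed, so this counts the new pass's folds only; the
upstream numbers were obtained from the old pass's dump output.)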

A compile time analysis with `/bin/time -v ./install/usr/local/bin/gcc -O2 all.i`
(all.i from PR117922) shows:
* -fno-fold-mem-offsets:  289 s (user time) / 15572436 kBytes (max resident set size)
* -ffold-mem-offsets:     339 s (user time) / 23606516 kBytes (max resident set size)
Adding -fexpensive-optimizations to enable multi-USE mode has no
measurable impact on compile time or memory footprint.

SPEC CPU 2017 showed no significant performance impact on aarch64-linux.

gcc/ChangeLog:

        PR rtl-optimization/117922
        * fold-mem-offsets.cc (INCLUDE_ALGORITHM): Define.
        (INCLUDE_FUNCTIONAL): Likewise.
        (INCLUDE_ARRAY): Likewise.
        (class pass_fold_mem_offsets): Move to bottom of file.
        (get_fold_mem_offset_root): Convert to RTL-SSA.
        (get_single_def_in_bb): Convert to RTL-SSA.
        (get_uses): Remove.
        (has_foldable_uses_p): Convert to RTL-SSA.
        (fold_offsets): Convert to RTL-SSA.
        (fold_offsets_1): Convert to RTL-SSA.
        (get_fold_mem_root): Remove.
        (do_check_validity): New.
        (do_analysis): Remove.
        (insn_uses_not_in_bitmap): New.
        (do_fold_info_calculation): Remove.
        (drop_unsafe_candidates): New.
        (do_commit_offset): Convert to RTL-SSA.
        (compute_validity_closure): Remove.
        (do_commit_insn): Change INSN in place.
        (fold_mem_offsets_single_use): New.
        (fold_mem_offsets_multi_use): New.
        (pass_fold_mem_offsets::execute): Move to bottom of file.
        (fold_mem_offsets): New.

Signed-off-by: Christoph Müllner <christoph.muell...@vrull.eu>
---
 gcc/fold-mem-offsets.cc | 1138 ++++++++++++++++++++-------------------
 1 file changed, 596 insertions(+), 542 deletions(-)

diff --git a/gcc/fold-mem-offsets.cc b/gcc/fold-mem-offsets.cc
index c1c94472a071..0e777c32ee31 100644
--- a/gcc/fold-mem-offsets.cc
+++ b/gcc/fold-mem-offsets.cc
@@ -17,24 +17,34 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
+#define INCLUDE_ALGORITHM
+#define INCLUDE_FUNCTIONAL
+#define INCLUDE_ARRAY
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
-#include "tm.h"
+#include "backend.h"
 #include "rtl.h"
+#include "rtlanal.h"
+#include "df.h"
+#include "rtl-ssa.h"
+
+#include "predict.h"
+#include "cfgrtl.h"
+#include "cfgcleanup.h"
+#include "cgraph.h"
+#include "tree-pass.h"
+#include "target.h"
+
+#include "tm.h"
 #include "tree.h"
 #include "expr.h"
 #include "backend.h"
 #include "regs.h"
-#include "target.h"
 #include "memmodel.h"
 #include "emit-rtl.h"
 #include "insn-config.h"
 #include "recog.h"
-#include "predict.h"
-#include "df.h"
-#include "tree-pass.h"
-#include "cfgrtl.h"
 #include "diagnostic-core.h"
 
 /* This pass tries to optimize memory offset calculations by moving constants
@@ -69,214 +79,248 @@ along with GCC; see the file COPYING3.  If not see
       allocated on the stack can result in unwanted add instructions that
       cannot be eliminated easily.
 
-   This pass works on a basic block level and consists of 4 phases:
-
-    - Phase 1 (Analysis): Find "foldable" instructions.
-      Foldable instructions are those that we know how to propagate
-      a constant addition through (add, shift, move, ...) and only have other
-      foldable instructions for uses.  In that phase a DFS traversal on the
-      definition tree is performed and foldable instructions are marked on
-      a bitmap.  The add immediate instructions that are reachable in this
-      DFS are candidates for folding since all the intermediate calculations
-      affected by them are also foldable.
-
-    - Phase 2 (Validity): Traverse and calculate the offsets that would result
-      from folding the add immediate instructions.  Check whether the
-      calculated offsets result in a valid instruction for the target.
-
-    - Phase 3 (Commit offsets): Traverse again.  It is now known which folds
-      are valid so at this point change the offsets in the memory instructions.
-
-    - Phase 4 (Commit instruction deletions): Scan all instructions and delete
-      or simplify (reduce to move) all add immediate instructions that were
-      folded.
+   The pass differentiates between the following instructions:
+
+   - fold-mem-offset root insn: loads/stores where constants will be folded into
+     the address offset.  E.g.:
+       (set (mem:DI (plus:DI (reg:DI sp) (const_int 40))) (reg:DI ra))
+   - fold-agnostic insns: instructions that may have an impact on the offset
+     calculation, but that don't require any fixup when folding.  E.g.:
+       (set (reg:DI a0) (ashift:DI (reg:DI s1) (const_int 1)))
+   - fold insns: instructions that provide constants, which will be forwarded
+     into the loads/stores as offset.  When folding, the constants will be
+     set to zero.  E.g.:
+       (set (reg:DI s0) (plus:DI (reg:DI sp) (const_int 8)))
+
+   The pass utilizes the RTL SSA framework to get the data dependencies
+   and operates in the following phases:
+
+   - Phase 1: Iterate over all instructions to identify fold-mem-offset roots.
+   - Phase 2: Walk back along the def-chain of fold-agnostic or fold insns.
+             When successful, a new offset for the fold-mem-offset root is calculated
+             and a vec of fold insns that need adjustments is created.
+   - Phase 3: Drop all fold-mem-offset roots that won't accept the updated
+             offset.
+   - Phase 4: Ensure that the defs of all fold insns are used only by
+             fold-mem-offset insns (only needed if DEFs with multiple USEs
+             are enabled via -fexpensive-optimizations).
+   - Phase 5: Update all fold-mem-offset roots and adjust the fold insns.
+
+   When we walk the DEF-chain we have two choices of operations:
+
+   - single-USE mode: Only DEFs that have exactly one USE (in the instruction
+     that we come from) are allowed.  This greatly simplifies the problem, but
+     also misses some cases.
+   - multi-USE mode: DEFs are allowed to have multiple USEs.  E.g. a single
+     ADDI may define a value that is used by two LOADs.  In this case, we need
+     to ensure that all USE-chains remain correct after we apply the folding.
+     This is done by allowing only USEs that are part of some other
+     fold-mem-offset chain, which is checked in phase 4 above.
 
    This pass should run before hard register propagation because it creates
    register moves that we expect to be eliminated.  */
 
-namespace {
-
-const pass_data pass_data_fold_mem =
-{
-  RTL_PASS, /* type */
-  "fold_mem_offsets", /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
-  TV_FOLD_MEM_OFFSETS, /* tv_id */
-  0, /* properties_required */
-  0, /* properties_provided */
-  0, /* properties_destroyed */
-  0, /* todo_flags_start */
-  TODO_df_finish, /* todo_flags_finish */
-};
-
-class pass_fold_mem_offsets : public rtl_opt_pass
-{
-public:
-  pass_fold_mem_offsets (gcc::context *ctxt)
-    : rtl_opt_pass (pass_data_fold_mem, ctxt)
-  {}
-
-  /* opt_pass methods: */
-  virtual bool gate (function *)
-    {
-      return flag_fold_mem_offsets && optimize >= 2;
-    }
-
-  virtual unsigned int execute (function *);
-}; // class pass_fold_mem_offsets
+using namespace rtl_ssa;
 
 /* Class that holds in FOLD_INSNS the instructions that if folded the offset
    of a memory instruction would increase by ADDED_OFFSET.  */
 class fold_mem_info {
 public:
-  auto_bitmap fold_insns;
+  /* fold-mem-offset root details */
+  insn_info *insn;
+  rtx mem;
+  rtx reg;
+  HOST_WIDE_INT offset;
+
+  /* Offset that can be folded into the fold-mem-offset root.  */
   HOST_WIDE_INT added_offset;
+
+  /* DEF-chain INSNs.  */
+  auto_vec<insn_info *> fold_agnostic_insns;
+  auto_vec<insn_info *> fold_insns;
 };
 
-typedef hash_map<rtx_insn *, fold_mem_info *> fold_info_map;
+/* Test if INSN is a memory load / store that can have an offset folded to it.
+   Return true iff INSN is such an instruction and return through MEM,
+   REG and OFFSET the RTX that has a MEM code, the register that is
+   used as a base address and the offset accordingly.  */
+bool
+get_fold_mem_offset_root (insn_info *insn, rtx *mem, rtx *reg,
+                         HOST_WIDE_INT *offset)
+{
+  rtx set = single_set (insn->rtl ());
+  if (set != NULL_RTX)
+    {
+      rtx src = SET_SRC (set);
+      rtx dest = SET_DEST (set);
 
-/* Tracks which instructions can be reached through instructions that can
-   propagate offsets for folding.  */
-static bitmap_head can_fold_insns;
+      /* Don't fold when we have unspec / volatile.  */
+      if (GET_CODE (src) == UNSPEC
+         || GET_CODE (src) == UNSPEC_VOLATILE
+         || GET_CODE (dest) == UNSPEC
+         || GET_CODE (dest) == UNSPEC_VOLATILE)
+       return false;
 
-/* Marks instructions that are currently eligible for folding.  */
-static bitmap_head candidate_fold_insns;
+      if (MEM_P (src))
+       *mem = src;
+      else if (MEM_P (dest))
+       *mem = dest;
+      else if ((GET_CODE (src) == SIGN_EXTEND
+               || GET_CODE (src) == ZERO_EXTEND)
+              && MEM_P (XEXP (src, 0)))
+       *mem = XEXP (src, 0);
+      else
+       return false;
+    }
+  else
+    return false;
 
-/* Tracks instructions that cannot be folded because it turned out that
-   folding will result in creating an invalid memory instruction.
-   An instruction can be in both CANDIDATE_FOLD_INSNS and CANNOT_FOLD_INSNS
-   at the same time, in which case it is not legal to fold.  */
-static bitmap_head cannot_fold_insns;
+  rtx mem_addr = XEXP (*mem, 0);
+  if (REG_P (mem_addr))
+    {
+      *reg = mem_addr;
+      *offset = 0;
+    }
+  else if (GET_CODE (mem_addr) == PLUS
+          && REG_P (XEXP (mem_addr, 0))
+          && CONST_INT_P (XEXP (mem_addr, 1)))
+    {
+      *reg = XEXP (mem_addr, 0);
+      *offset = INTVAL (XEXP (mem_addr, 1));
+    }
+  else
+    return false;
 
-/* The number of instructions that were simplified or eliminated.  */
-static int stats_fold_count;
+  return true;
+}
 
-/* Get the single reaching definition of an instruction inside a BB.
-   The definition is desired for REG used in INSN.
-   Return the definition insn or NULL if there's no definition with
-   the desired criteria.  */
-static rtx_insn *
-get_single_def_in_bb (rtx_insn *insn, rtx reg)
+/* Get the single reaching definition of REG used in INSN within a BB.
+   Return the definition or NULL if there's no definition with the desired
+   criteria.  If SINGLE_USE is set to true the DEF must have exactly one
+   USE resulting in a 1:1 DEF-USE relationship.  If set to false, then a
+   1:n DEF-USE relationship is accepted and the caller must take care to
+   ensure all USEs are safe for folding.  */
+static set_info *
+get_single_def_in_bb (insn_info *insn, rtx reg, bool single_use)
 {
-  df_ref use;
-  struct df_link *ref_chain, *ref_link;
-
-  FOR_EACH_INSN_USE (use, insn)
+  /* Get the use_info of the base register.  */
+  for (use_info *use : insn->uses ())
     {
-      if (GET_CODE (DF_REF_REG (use)) == SUBREG)
+      /* Other USEs can be ignored and multiple equal USEs are fine.  */
+      if (use->regno () != REGNO (reg))
+       continue;
+
+      /* Don't handle subregs for now.  */
+      if (use->includes_subregs ())
        return NULL;
-      if (REGNO (DF_REF_REG (use)) == REGNO (reg))
-       break;
-    }
 
-  if (!use)
-    return NULL;
+      /* Get the DEF of the register.  */
+      set_info *def = use->def ();
+      if (!def)
+       return NULL;
 
-  ref_chain = DF_REF_CHAIN (use);
+      /* Limit the number of USEs of DEF to 1.  */
+      auto iter = def->nondebug_insn_uses ();
+      if (single_use && ++iter.begin () != iter.end ())
+       return NULL;
 
-  if (!ref_chain)
-    return NULL;
+      /* Don't handle multiregs for now.  */
+      if (def->includes_multiregs ())
+       return NULL;
 
-  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
-    {
-      /* Problem getting some definition for this instruction.  */
-      if (ref_link->ref == NULL)
+      /* Only consider uses whose definition comes from a real instruction
+        and has no notes attached.  */
+      insn_info *def_insn = def->insn ();
+      if (def_insn->is_artificial () || def_insn->first_note ())
        return NULL;
-      if (DF_REF_INSN_INFO (ref_link->ref) == NULL)
+
+      /* No parallel expressions or clobbers.  */
+      if (def_insn->num_defs () != 1)
        return NULL;
-      if (global_regs[REGNO (reg)]
-         && !set_of (reg, DF_REF_INSN (ref_link->ref)))
+
+      rtx_insn *def_rtl = def_insn->rtl ();
+      if (!NONJUMP_INSN_P (def_rtl) || RTX_FRAME_RELATED_P (def_rtl))
        return NULL;
-    }
 
-  if (ref_chain->next)
-    return NULL;
+      /* Check if the DEF is a SET of the expected form.  */
+      rtx def_set = simple_regno_set (PATTERN (def_rtl), def->regno ());
+      if (!def_set)
+       return NULL;
 
-  rtx_insn *def = DF_REF_INSN (ref_chain->ref);
+      /* Ensure DEF and USE are in the same BB.  */
+      if (def->bb () != insn->bb ())
+       return NULL;
 
-  if (BLOCK_FOR_INSN (def) != BLOCK_FOR_INSN (insn))
-    return NULL;
+      /* Ensure the definition comes before the insn.  */
+      if (*use->insn () < *def_insn)
+       return NULL;
 
-  if (DF_INSN_LUID (def) > DF_INSN_LUID (insn))
-    return NULL;
+      return def;
+    }
 
-  return def;
+  return NULL;
 }
 
-/* Get all uses of REG which is set in INSN.  Return the use list or NULL if a
-   use is missing / irregular.  If SUCCESS is not NULL then set it to false if
-   there are missing / irregular uses and true otherwise.  */
-static df_link *
-get_uses (rtx_insn *insn, rtx reg, bool *success)
+/* Test if all USEs of DEF (which defines REG) meet certain criteria to be
+   foldable.  Returns true iff all USEs are fine or false otherwise.  */
+static bool
+has_foldable_uses_p (set_info *def, rtx reg)
 {
-  df_ref def;
-
-  if (success)
-    *success = false;
-
-  FOR_EACH_INSN_DEF (def, insn)
-    if (REGNO (DF_REF_REG (def)) == REGNO (reg))
-      break;
-
-  if (!def)
-    return NULL;
+  /* We only fold through instructions that are transitively used as
+     memory addresses and do not have other uses.  Use the same logic
+     from offset calculation to visit instructions that can propagate
+     offsets and keep track of them in CAN_FOLD_INSNS.  */
+  for (use_info *use : def->nondebug_insn_uses ())
+    {
+      insn_info *use_insn = use->insn ();
+      if (use_insn->is_artificial ())
+       return false;
 
-  df_link *ref_chain = DF_REF_CHAIN (def);
-  int insn_luid = DF_INSN_LUID (insn);
-  basic_block insn_bb = BLOCK_FOR_INSN (insn);
+      /* Punt if the use is anything more complicated than a set
+        (clobber, use, etc).  */
+      rtx_insn *use_rtl = use_insn->rtl ();
+      if (!NONJUMP_INSN_P (use_rtl) || GET_CODE (PATTERN (use_rtl)) != SET)
+       return false;
 
-  for (df_link *ref_link = ref_chain; ref_link; ref_link = ref_link->next)
-    {
-      /* Problem getting a use for this instruction.  */
-      if (ref_link->ref == NULL)
-       return NULL;
-      if (DF_REF_CLASS (ref_link->ref) != DF_REF_REGULAR)
-       return NULL;
+      /* Special case: A foldable memory store is not foldable if it
+        mentions DEST outside of the address calculation.  */
+      rtx use_set = PATTERN (use_rtl);
+      if (use_set && MEM_P (SET_DEST (use_set))
+         && reg_mentioned_p (reg, SET_SRC (use_set)))
+       return false;
 
-      rtx_insn *use = DF_REF_INSN (ref_link->ref);
-      if (DEBUG_INSN_P (use))
-       continue;
+      if (use->bb () != def->bb ())
+       return false;
 
-      /* We do not handle REG_EQUIV/REG_EQ notes for now.  */
-      if (DF_REF_FLAGS (ref_link->ref) & DF_REF_IN_NOTE)
-       return NULL;
-      if (BLOCK_FOR_INSN (use) != insn_bb)
-       return NULL;
       /* Punt if use appears before def in the basic block.  See PR111601.  */
-      if (DF_INSN_LUID (use) < insn_luid)
-       return NULL;
+      if (*use_insn < *def->insn ())
+       return false;
     }
 
-  if (success)
-    *success = true;
-
-  return ref_chain;
+  return true;
 }
 
 static HOST_WIDE_INT
-fold_offsets (rtx_insn *insn, rtx reg, bool analyze, bitmap foldable_insns);
-
-/*  Helper function for fold_offsets.
+fold_offsets (insn_info *insn, rtx reg, fold_mem_info *info, bool single_use);
 
-    If DO_RECURSION is false and ANALYZE is true this function returns true iff
-    it understands the structure of INSN and knows how to propagate constants
-    through it.  In this case OFFSET_OUT and FOLDABLE_INSNS are unused.
+/* Helper function for fold_offsets () that analyses the given INSN.
 
-    If DO_RECURSION is true then it also calls fold_offsets for each recognized
-    part of INSN with the appropriate arguments.
+   For INSN with known pattern, we calculate the value of the propagated
+   constant and store that in OFFSET_OUT.  Foldable INSNs are added to
+   INFO->fold_insns and fold-agnostic INSNs are added to
+   INFO->fold_agnostic_insns.  It is possible that some INSNs are added to
+   both lists.  In this case the INSN is a fold INSN.
 
-    If DO_RECURSION is true and ANALYZE is false then offset that would result
-    from folding is computed and is returned through the pointer OFFSET_OUT.
-    The instructions that can be folded are recorded in FOLDABLE_INSNS.  */
+   Returns true iff the analysis was successful and false otherwise.  */
 static bool
-fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
-               HOST_WIDE_INT *offset_out, bitmap foldable_insns)
+fold_offsets_1 (insn_info *insn, HOST_WIDE_INT *offset_out, fold_mem_info *info,
+               bool single_use)
 {
-  /* Doesn't make sense if both DO_RECURSION and ANALYZE are false.  */
-  gcc_checking_assert (do_recursion || analyze);
-  gcc_checking_assert (GET_CODE (PATTERN (insn)) == SET);
+  bool fold_agnostic = true;
+  rtx_insn *insn_rtl = insn->rtl ();
+  gcc_checking_assert (GET_CODE (PATTERN (insn_rtl)) == SET);
 
-  rtx src = SET_SRC (PATTERN (insn));
+  rtx src = SET_SRC (PATTERN (insn_rtl));
   HOST_WIDE_INT offset = 0;
 
   switch (GET_CODE (src))
@@ -288,35 +332,31 @@ fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
        rtx arg2 = XEXP (src, 1);
 
        if (REG_P (arg1))
-         {
-           if (do_recursion)
-             offset += fold_offsets (insn, arg1, analyze, foldable_insns);
-         }
+         offset += fold_offsets (insn, arg1, info, single_use);
        else if (GET_CODE (arg1) == ASHIFT
                 && REG_P (XEXP (arg1, 0))
                 && CONST_INT_P (XEXP (arg1, 1)))
          {
            /* Handle R1 = (R2 << C) + ...  */
-           if (do_recursion)
-             {
-               HOST_WIDE_INT scale
-                 = (HOST_WIDE_INT_1U << INTVAL (XEXP (arg1, 1)));
-               offset += scale * fold_offsets (insn, XEXP (arg1, 0), analyze,
-                                               foldable_insns);
-             }
+           rtx reg = XEXP (arg1, 0);
+           rtx shamt = XEXP (arg1, 1);
+           HOST_WIDE_INT scale = HOST_WIDE_INT_1U << INTVAL (shamt);
+           offset += scale * fold_offsets (insn, reg, info, single_use);
          }
        else if (GET_CODE (arg1) == PLUS
                 && REG_P (XEXP (arg1, 0))
                 && REG_P (XEXP (arg1, 1)))
          {
            /* Handle R1 = (R2 + R3) + ...  */
-           if (do_recursion)
+           rtx reg1 = XEXP (arg1, 0);
+           rtx reg2 = XEXP (arg1, 1);
+           if (REGNO (reg1) != REGNO (reg2))
              {
-               offset += fold_offsets (insn, XEXP (arg1, 0), analyze,
-                                       foldable_insns);
-               offset += fold_offsets (insn, XEXP (arg1, 1), analyze,
-                                       foldable_insns);
+               offset += fold_offsets (insn, reg1, info, single_use);
+               offset += fold_offsets (insn, reg2, info, single_use);
              }
+           else
+             offset += 2 * fold_offsets (insn, reg1, info, single_use);
          }
        else if (GET_CODE (arg1) == PLUS
                 && GET_CODE (XEXP (arg1, 0)) == ASHIFT
@@ -325,32 +365,31 @@ fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
                 && REG_P (XEXP (arg1, 1)))
          {
            /* Handle R1 = ((R2 << C) + R3) + ...  */
-           if (do_recursion)
+           rtx reg1 = XEXP (XEXP (arg1, 0), 0);
+           rtx shamt = XEXP (XEXP (arg1, 0), 1);
+           rtx reg2 = XEXP (arg1, 1);
+           HOST_WIDE_INT scale = HOST_WIDE_INT_1U << INTVAL (shamt);
+           if (REGNO (reg1) != REGNO (reg2))
              {
-               HOST_WIDE_INT scale
-                 = (HOST_WIDE_INT_1U << INTVAL (XEXP (XEXP (arg1, 0), 1)));
-               offset += scale * fold_offsets (insn, XEXP (XEXP (arg1, 0), 0),
-                                               analyze, foldable_insns);
-               offset += fold_offsets (insn, XEXP (arg1, 1), analyze,
-                                       foldable_insns);
+               offset += scale * fold_offsets (insn, reg1, info, single_use);
+               offset += fold_offsets (insn, reg2, info, single_use);
              }
+           else
+             offset += (scale + 1) * fold_offsets (insn, reg1, info,
+                                                   single_use);
          }
        else
          return false;
 
        if (REG_P (arg2))
-         {
-           if (do_recursion)
-             offset += fold_offsets (insn, arg2, analyze, foldable_insns);
-         }
+         offset += fold_offsets (insn, arg2, info, single_use);
        else if (CONST_INT_P (arg2))
          {
            if (REG_P (arg1))
              {
                offset += INTVAL (arg2);
                /* This is a R1 = R2 + C instruction, candidate for folding.  */
-               if (!analyze)
-                 bitmap_set_bit (foldable_insns, INSN_UID (insn));
+               fold_agnostic = false;
              }
          }
        else
@@ -366,26 +405,19 @@ fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
        rtx arg2 = XEXP (src, 1);
 
        if (REG_P (arg1))
-         {
-           if (do_recursion)
-             offset += fold_offsets (insn, arg1, analyze, foldable_insns);
-         }
+         offset += fold_offsets (insn, arg1, info, single_use);
        else
          return false;
 
        if (REG_P (arg2))
-         {
-           if (do_recursion)
-             offset -= fold_offsets (insn, arg2, analyze, foldable_insns);
-         }
+         offset -= fold_offsets (insn, arg2, info, single_use);
        else if (CONST_INT_P (arg2))
          {
            if (REG_P (arg1))
              {
                offset -= INTVAL (arg2);
                /* This is a R1 = R2 - C instruction, candidate for folding.  */
-               if (!analyze)
-                 bitmap_set_bit (foldable_insns, INSN_UID (insn));
+               fold_agnostic = false;
              }
          }
        else
@@ -399,10 +431,7 @@ fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
        /* Propagate through negation.  */
        rtx arg1 = XEXP (src, 0);
        if (REG_P (arg1))
-         {
-           if (do_recursion)
-             offset = -fold_offsets (insn, arg1, analyze, foldable_insns);
-         }
+         offset = -fold_offsets (insn, arg1, info, single_use);
        else
          return false;
 
@@ -417,12 +446,8 @@ fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
 
        if (REG_P (arg1) && CONST_INT_P (arg2))
          {
-           if (do_recursion)
-             {
-               HOST_WIDE_INT scale = INTVAL (arg2);
-               offset = scale * fold_offsets (insn, arg1, analyze,
-                                              foldable_insns);
-             }
+           HOST_WIDE_INT scale = INTVAL (arg2);
+           offset = scale * fold_offsets (insn, arg1, info, single_use);
          }
        else
          return false;
@@ -438,12 +463,8 @@ fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
 
        if (REG_P (arg1) && CONST_INT_P (arg2))
          {
-           if (do_recursion)
-             {
-               HOST_WIDE_INT scale = (HOST_WIDE_INT_1U << INTVAL (arg2));
-               offset = scale * fold_offsets (insn, arg1, analyze,
-                                              foldable_insns);
-             }
+           HOST_WIDE_INT scale = (HOST_WIDE_INT_1U << INTVAL (arg2));
+           offset = scale * fold_offsets (insn, arg1, info, single_use);
          }
        else
          return false;
@@ -454,8 +475,7 @@ fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
     case REG:
       {
        /* Propagate through register move.  */
-       if (do_recursion)
-         offset = fold_offsets (insn, src, analyze, foldable_insns);
+       offset = fold_offsets (insn, src, info, single_use);
 
        /* Pattern recognized for folding.  */
        break;
@@ -464,8 +484,7 @@ fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
       {
        offset = INTVAL (src);
        /* R1 = C is candidate for folding.  */
-       if (!analyze)
-         bitmap_set_bit (foldable_insns, INSN_UID (insn));
+       fold_agnostic = false;
 
        /* Pattern recognized for folding.  */
        break;
@@ -475,373 +494,395 @@ fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
       return false;
     }
 
-    if (do_recursion && !analyze)
+    if (offset_out)
       *offset_out = offset;
 
+    if (fold_agnostic)
+      {
+       /* For single-USE mode we don't need to collect fold-agnostic INSNs.  */
+       if (!single_use)
+         info->fold_agnostic_insns.safe_push (insn);
+      }
+    else
+      info->fold_insns.safe_push (insn);
+
     return true;
 }
 
-/* Function that computes the offset that would have to be added to all uses
-   of REG if the instructions marked in FOLDABLE_INSNS were to be eliminated.
+/* Function that calculates the offset for INSN that would have to be added to
+   all its USEs of REG.  Foldable INSNs are added to INFO->fold_insns and
+   fold-agnostic INSNs are added to INFO->fold_agnostic_insns.
+   It is possible that some INSNs are added to both lists.  In this case the
+   INSN is a fold INSN.
 
-   If ANALYZE is true then mark in CAN_FOLD_INSNS which instructions
-   transitively only affect other instructions found in CAN_FOLD_INSNS.
-   If ANALYZE is false then compute the offset required for folding.  */
+   Returns the offset on success or 0 if the calculation fails.  */
 static HOST_WIDE_INT
-fold_offsets (rtx_insn *insn, rtx reg, bool analyze, bitmap foldable_insns)
+fold_offsets (insn_info *insn, rtx reg, fold_mem_info *info,
+             bool single_use = true)
 {
-  rtx_insn *def = get_single_def_in_bb (insn, reg);
-
-  if (!def || RTX_FRAME_RELATED_P (def) || GET_CODE (PATTERN (def)) != SET)
+  /* We can only affect the values of GPR registers.  */
+  unsigned int regno = REGNO (reg);
+  if (fixed_regs[regno]
+      || !TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], regno))
     return 0;
 
-  rtx dest = SET_DEST (PATTERN (def));
-
-  if (!REG_P (dest))
+  /* Get the DEF for REG in INSN.  */
+  set_info *def = get_single_def_in_bb (insn, reg, single_use);
+  if (!def)
     return 0;
 
-  /* We can only affect the values of GPR registers.  */
-  unsigned int dest_regno = REGNO (dest);
-  if (fixed_regs[dest_regno]
-      || !TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], dest_regno))
-    return 0;
+  insn_info *def_insn = def->insn ();
+  rtx_insn *def_rtl = def_insn->rtl ();
 
-  if (analyze)
+  if (dump_file && (dump_flags & TDF_DETAILS))
     {
-      /* Check if we know how to handle DEF.  */
-      if (!fold_offsets_1 (def, true, false, NULL, NULL))
-       return 0;
-
-      /* We only fold through instructions that are transitively used as
-        memory addresses and do not have other uses.  Use the same logic
-        from offset calculation to visit instructions that can propagate
-        offsets and keep track of them in CAN_FOLD_INSNS.  */
-      bool success;
-      struct df_link *uses = get_uses (def, dest, &success), *ref_link;
+      fprintf (dump_file, "For INSN: ");
+      print_rtl_single (dump_file, insn->rtl ());
+      fprintf (dump_file, "...found DEF: ");
+      print_rtl_single (dump_file, def_rtl);
+    }
 
-      if (!success)
-       return 0;
+  gcc_assert (REGNO (reg) == REGNO (SET_DEST (PATTERN (def_rtl))));
 
-      for (ref_link = uses; ref_link; ref_link = ref_link->next)
+  /* Check if all USEs of DEF are safe.  */
+  if (!has_foldable_uses_p (def, reg))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
        {
-         rtx_insn *use = DF_REF_INSN (ref_link->ref);
-
-         if (DEBUG_INSN_P (use))
-           continue;
-
-         /* Punt if the use is anything more complicated than a set
-            (clobber, use, etc).  */
-         if (!NONJUMP_INSN_P (use) || GET_CODE (PATTERN (use)) != SET)
-           return 0;
-
-         /* This use affects instructions outside of CAN_FOLD_INSNS.  */
-         if (!bitmap_bit_p (&can_fold_insns, INSN_UID (use)))
-           return 0;
-
-         rtx use_set = PATTERN (use);
-
-         /* Special case: A foldable memory store is not foldable if it
-            mentions DEST outside of the address calculation.  */
-         if (use_set && MEM_P (SET_DEST (use_set))
-             && reg_mentioned_p (dest, SET_SRC (use_set)))
-           return 0;
+         fprintf (dump_file, "has_foldable_uses_p failed for: ");
+         print_rtl_single (dump_file, def_rtl);
        }
+      return 0;
+    }
 
-      bitmap_set_bit (&can_fold_insns, INSN_UID (def));
-
+  /* Check if we know how to handle DEF.  */
+  HOST_WIDE_INT offset;
+  if (!fold_offsets_1 (def_insn, &offset, info, single_use))
+    {
       if (dump_file && (dump_flags & TDF_DETAILS))
        {
-         fprintf (dump_file, "Instruction marked for propagation: ");
-         print_rtl_single (dump_file, def);
+         fprintf (dump_file, "fold_offsets_1 failed for: ");
+         print_rtl_single (dump_file, def_rtl);
        }
+      return 0;
     }
-  else
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
     {
-      /* We cannot propagate through this instruction.  */
-      if (!bitmap_bit_p (&can_fold_insns, INSN_UID (def)))
-       return 0;
+      fprintf (dump_file, "Instruction marked for propagation: ");
+      print_rtl_single (dump_file, def_rtl);
     }
 
-  HOST_WIDE_INT offset = 0;
-  bool recognized = fold_offsets_1 (def, analyze, true, &offset,
-                                   foldable_insns);
-
-  if (!recognized)
-    return 0;
-
   return offset;
 }
 
-/* Test if INSN is a memory load / store that can have an offset folded to it.
-   Return true iff INSN is such an instruction and return through MEM_OUT,
-   REG_OUT and OFFSET_OUT the RTX that has a MEM code, the register that is
-   used as a base address and the offset accordingly.
-   All of the out pointers may be NULL in which case they will be ignored.  */
-bool
-get_fold_mem_root (rtx_insn *insn, rtx *mem_out, rtx *reg_out,
-                  HOST_WIDE_INT *offset_out)
+/* Test if we can fold the new offset into the given fold-mem-offset root insn.
+   Returns true iff folding would succeed or false otherwise.  */
+static bool
+do_check_validity (fold_mem_info *info)
 {
-  rtx set = single_set (insn);
-  rtx mem = NULL_RTX;
-
-  if (set != NULL_RTX)
-    {
-      rtx src = SET_SRC (set);
-      rtx dest = SET_DEST (set);
-
-      /* Don't fold when we have unspec / volatile.  */
-      if (GET_CODE (src) == UNSPEC
-         || GET_CODE (src) == UNSPEC_VOLATILE
-         || GET_CODE (dest) == UNSPEC
-         || GET_CODE (dest) == UNSPEC_VOLATILE)
-       return false;
-
-      if (MEM_P (src))
-       mem = src;
-      else if (MEM_P (dest))
-       mem = dest;
-      else if ((GET_CODE (src) == SIGN_EXTEND
-               || GET_CODE (src) == ZERO_EXTEND)
-              && MEM_P (XEXP (src, 0)))
-       mem = XEXP (src, 0);
-    }
-
-  if (mem == NULL_RTX)
-    return false;
+  rtx_insn *insn_rtl = info->insn->rtl ();
+  rtx mem = info->mem;
+  rtx reg = info->reg;
+  HOST_WIDE_INT new_offset = info->offset + info->added_offset;
 
+  /* Test if it is valid to change MEM's address offset to NEW_OFFSET.  */
+  int icode = INSN_CODE (insn_rtl);
+  INSN_CODE (insn_rtl) = -1;
   rtx mem_addr = XEXP (mem, 0);
-  rtx reg;
-  HOST_WIDE_INT offset;
-
-  if (REG_P (mem_addr))
-    {
-      reg = mem_addr;
-      offset = 0;
-    }
-  else if (GET_CODE (mem_addr) == PLUS
-          && REG_P (XEXP (mem_addr, 0))
-          && CONST_INT_P (XEXP (mem_addr, 1)))
-    {
-      reg = XEXP (mem_addr, 0);
-      offset = INTVAL (XEXP (mem_addr, 1));
-    }
+  machine_mode mode = GET_MODE (mem_addr);
+  if (new_offset != 0)
+    XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, mode));
   else
-    return false;
+    XEXP (mem, 0) = reg;
 
-  if (mem_out)
-    *mem_out = mem;
-  if (reg_out)
-    *reg_out = reg;
-  if (offset_out)
-    *offset_out = offset;
+  bool illegal = insn_invalid_p (insn_rtl, false)
+                || !memory_address_addr_space_p (mode, XEXP (mem, 0),
+                                                 MEM_ADDR_SPACE (mem));
 
-  return true;
+  /* Restore the instruction.  */
+  XEXP (mem, 0) = mem_addr;
+  INSN_CODE (insn_rtl) = icode;
+
+  return !illegal;
 }
 
-/* If INSN is a root memory instruction then do a DFS traversal on its
-   definitions and find folding candidates.  */
-static void
-do_analysis (rtx_insn *insn)
+/* Check if any of the provided INSNs in INSN_LIST is not marked in the
+   given bitmap.  Return true if at least one INSN is not in the bitmap and
+   false otherwise.  */
+static bool
+insn_uses_not_in_bitmap (vec<insn_info *> *insn_list, bitmap bm)
 {
-  rtx reg;
-  if (!get_fold_mem_root (insn, NULL, &reg, NULL))
-    return;
-
-  if (dump_file && (dump_flags & TDF_DETAILS))
+  for (insn_info *insn : *insn_list)
     {
-      fprintf (dump_file, "Starting analysis from root: ");
-      print_rtl_single (dump_file, insn);
+      gcc_assert (insn->num_defs () == 1);
+      set_info *def = dyn_cast<set_info *>(insn->defs ()[0]);
+      for (use_info *use : def->nondebug_insn_uses ())
+       {
+         if (!bitmap_bit_p (bm, use->insn ()->uid ()))
+           {
+             if (dump_file && (dump_flags & TDF_DETAILS))
+             {
+               fprintf (dump_file, "Cannot ensure correct transformation as "
+                        "INSN %u has a USE INSN %u that was not analysed.\n",
+                        insn->uid (), use->insn ()->uid ());
+             }
+
+             return true;
+           }
+       }
     }
 
-  /* Analyse folding opportunities for this memory instruction.  */
-  bitmap_set_bit (&can_fold_insns, INSN_UID (insn));
-  fold_offsets (insn, reg, true, NULL);
+  return false;
 }
 
+/* Check if all USEs of all instructions have been analysed.
+   If a fold_mem_info is found that has an unknown USE, then
+   drop it from the list.  When this function returns, all
+   fold_mem_infos in the worklist reference instructions that
+   have been analysed before and can therefore be committed.  */
 static void
-do_fold_info_calculation (rtx_insn *insn, fold_info_map *fold_info)
+drop_unsafe_candidates (vec<fold_mem_info *> *worklist)
 {
-  rtx mem, reg;
-  HOST_WIDE_INT cur_offset;
-  if (!get_fold_mem_root (insn, &mem, &reg, &cur_offset))
-    return;
+  /* First mark all analysed INSNs in a bitmap.  */
+  auto_bitmap insn_closure;
+  for (fold_mem_info *info : worklist)
+    {
+      bitmap_set_bit (insn_closure, info->insn->uid ());
+      for (insn_info *insn : info->fold_agnostic_insns)
+       bitmap_set_bit (insn_closure, insn->uid ());
+      for (insn_info *insn : info->fold_insns)
+       bitmap_set_bit (insn_closure, insn->uid ());
+    }
 
-  fold_mem_info *info = new fold_mem_info;
-  info->added_offset = fold_offsets (insn, reg, false, info->fold_insns);
+  /* Now check if all uses of fold_insns are marked.  */
+  unsigned i;
+  fold_mem_info *info;
+  FOR_EACH_VEC_ELT (*worklist, i, info)
+    {
+      if (insn_uses_not_in_bitmap (&info->fold_agnostic_insns, insn_closure)
+         || insn_uses_not_in_bitmap (&info->fold_insns, insn_closure))
+       {
+         if (dump_file && (dump_flags & TDF_DETAILS))
+           {
+             fprintf (dump_file, "Dropping fold-mem-offset root INSN %u.\n",
+                      info->insn->uid ());
+           }
 
-  fold_info->put (insn, info);
+         /* Drop INFO from worklist and start over.  */
+         info->fold_agnostic_insns.truncate (0);
+         info->fold_insns.truncate (0);
+         worklist->unordered_remove (i);
+         delete info;
+         drop_unsafe_candidates (worklist);
+         return;
+       }
+    }
 }
 
-/* If INSN is a root memory instruction then compute a potentially new offset
-   for it and test if the resulting instruction is valid.  */
+/* Fold the new offset into the fold-mem-offset root.  */
 static void
-do_check_validity (rtx_insn *insn, fold_mem_info *info)
+do_commit_offset (fold_mem_info *info)
 {
-  rtx mem, reg;
-  HOST_WIDE_INT cur_offset;
-  if (!get_fold_mem_root (insn, &mem, &reg, &cur_offset))
-    return;
+  insn_info *insn = info->insn;
+  rtx_insn *insn_rtl = insn->rtl ();
+  rtx mem = info->mem;
+  rtx reg = info->reg;
+  HOST_WIDE_INT new_offset = info->offset + info->added_offset;
 
-  HOST_WIDE_INT new_offset = cur_offset + info->added_offset;
+  if (info->added_offset == 0)
+    return;
 
-  /* Test if it is valid to change MEM's address offset to NEW_OFFSET.  */
-  int icode = INSN_CODE (insn);
-  INSN_CODE (insn) = -1;
-  rtx mem_addr = XEXP (mem, 0);
-  machine_mode mode = GET_MODE (mem_addr);
+  machine_mode mode = GET_MODE (XEXP (mem, 0));
   if (new_offset != 0)
     XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, mode));
   else
     XEXP (mem, 0) = reg;
+  INSN_CODE (insn_rtl) = recog (PATTERN (insn_rtl), insn_rtl, 0);
+  df_insn_rescan (insn_rtl);
 
-  bool illegal = insn_invalid_p (insn, false)
-                || !memory_address_addr_space_p (mode, XEXP (mem, 0),
-                                                 MEM_ADDR_SPACE (mem));
-
-  /* Restore the instruction.  */
-  XEXP (mem, 0) = mem_addr;
-  INSN_CODE (insn) = icode;
-
-  if (illegal)
-    bitmap_ior_into (&cannot_fold_insns, info->fold_insns);
-  else
-    bitmap_ior_into (&candidate_fold_insns, info->fold_insns);
+  if (dump_file)
+    fprintf (dump_file, "INSN %u: Memory offset changed from "
+            HOST_WIDE_INT_PRINT_DEC " to " HOST_WIDE_INT_PRINT_DEC ".\n",
+            insn->uid (), info->offset, new_offset);
 }
 
-static bool
-compute_validity_closure (fold_info_map *fold_info)
+/* Set the constant of the fold INSN to zero.  */
+static void
+do_commit_insn (insn_info *insn)
 {
-  /* Let's say we have an arbitrary chain of foldable instructions xN = xN + C
-     and memory operations rN that use xN as shown below.  If folding x1 in r1
-     turns out to be invalid for whatever reason then it's also invalid to fold
-     any of the other xN into any rN.  That means that we need the transitive
-     closure of validity to determine whether we can fold a xN instruction.
-
-     +--------------+    +-------------------+    +-------------------+
-     | r1 = mem[x1] |    | r2 = mem[x1 + x2] |    | r3 = mem[x2 + x3] |   ...
-     +--------------+    +-------------------+    +-------------------+
-           ^                ^       ^                ^       ^
-           |               /        |               /        |           ...
-           |              /         |              /         |
-     +-------------+      /   +-------------+      /   +-------------+
-     | x1 = x1 + 1 |-----+    | x2 = x2 + 1 |-----+    | x3 = x3 + 1 |--- ...
-     +-------------+          +-------------+          +-------------+
-           ^                        ^                        ^
-           |                        |                        |
-          ...                      ...                      ...
-  */
-
-  /* In general three iterations should be enough for most cases, but allow up
-     to five when -fexpensive-optimizations is used.  */
-  int max_iters = 3 + 2 * flag_expensive_optimizations;
-  for (int pass = 0; pass < max_iters; pass++)
+  rtx_insn *insn_rtl = insn->rtl ();
+
+  /* If we deleted this INSN before, then there is nothing left to do here.  */
+  if (insn_rtl->deleted ())
+    return;
+
+  rtx set = single_set (insn_rtl);
+  rtx src = SET_SRC (set);
+
+  /* Emit a move and let subsequent passes eliminate it if possible.  */
+  if (GET_CODE (src) == CONST_INT)
     {
-      bool made_changes = false;
-      for (fold_info_map::iterator iter = fold_info->begin ();
-          iter != fold_info->end (); ++iter)
+      /* Only change if necessary.  */
+      if (INTVAL (src))
        {
-         fold_mem_info *info = (*iter).second;
-         if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
-           made_changes |= bitmap_ior_into (&cannot_fold_insns,
-                                            info->fold_insns);
+         /* INSN is R1 = C.  Set C to 0 because it was folded.  */
+         SET_SRC (set) = CONST0_RTX (GET_MODE (SET_SRC (set)));
+         df_insn_rescan (insn_rtl);
+       }
+    }
+  else
+    {
+      /* Only change if necessary.  */
+      if (INTVAL (XEXP (src, 1)))
+       {
+         /* INSN is R1 = R2 + C.  Set C to 0 because it was folded.  */
+         XEXP (src, 1) = CONST0_RTX (GET_MODE (XEXP (src, 1)));
+         df_insn_rescan (insn_rtl);
+
+         /* Mark self-assignments for deletion.  */
+         rtx dest = SET_DEST (set);
+         if (REGNO (dest) == REGNO (XEXP (src, 0)))
+           delete_insn (insn_rtl);
        }
-
-      if (!made_changes)
-       return true;
     }
 
-  return false;
+  if (dump_file)
+    fprintf (dump_file, "INSN %u: Constant set to zero.\n", insn->uid ());
 }
 
-/* If INSN is a root memory instruction that was affected by any folding
-   then update its offset as necessary.  */
-static void
-do_commit_offset (rtx_insn *insn, fold_mem_info *info)
+/* Fold memory offsets by analysing the DEF-USE chain where the DEFs
+   will only have a single USE.  */
+static unsigned int
+fold_mem_offsets_single_use ()
 {
-  rtx mem, reg;
-  HOST_WIDE_INT cur_offset;
-  if (!get_fold_mem_root (insn, &mem, &reg, &cur_offset))
-    return;
+  unsigned int stats_fold_count = 0;
 
-  HOST_WIDE_INT new_offset = cur_offset + info->added_offset;
+  /* Iterate over all nondebug INSNs, get our candidates and fold them.  */
+  for (auto insn : iterate_safely (crtl->ssa->nondebug_insns ()))
+    {
+      if (!insn->is_real () || !insn->can_be_optimized ())
+       continue;
 
-  if (new_offset == cur_offset)
-    return;
+      rtx mem, reg;
+      HOST_WIDE_INT offset;
+      if (!get_fold_mem_offset_root (insn, &mem, &reg, &offset))
+       continue;
 
-  gcc_assert (!bitmap_empty_p (info->fold_insns));
+      fold_mem_info info;
+      info.insn = insn;
+      info.mem = mem;
+      info.reg = reg;
+      info.offset = offset;
 
-  if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
-    return;
+      if (dump_file && (dump_flags & TDF_DETAILS))
+       {
+         fprintf (dump_file, "Starting analysis from root: ");
+         print_rtl_single (dump_file, info.insn->rtl ());
+       }
 
-  if (dump_file)
-    {
-      fprintf (dump_file, "Memory offset changed from "
-              HOST_WIDE_INT_PRINT_DEC " to " HOST_WIDE_INT_PRINT_DEC
-              " for instruction:\n", cur_offset, new_offset);
-      print_rtl_single (dump_file, insn);
+      /* Walk DEF-chain and collect info.fold_insns and
+        the resulting offset.  */
+      info.added_offset = fold_offsets (info.insn, info.reg, &info);
+      if (info.added_offset == 0 || !do_check_validity (&info))
+         continue;
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+       fprintf (dump_file,
+                "Found root offset delta: " HOST_WIDE_INT_PRINT_DEC "\n",
+                info.added_offset);
+
+      do_commit_offset (&info);
+
+      while (!info.fold_insns.is_empty ())
+       {
+         insn_info *insn = info.fold_insns.pop ();
+         do_commit_insn (insn);
+         stats_fold_count++;
+       }
     }
 
-  machine_mode mode = GET_MODE (XEXP (mem, 0));
-  if (new_offset != 0)
-    XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, mode));
-  else
-    XEXP (mem, 0) = reg;
-  INSN_CODE (insn) = recog (PATTERN (insn), insn, 0);
-  df_insn_rescan (insn);
+  return stats_fold_count;
 }
 
-/* If INSN is a move / add instruction that was folded then replace its
-   constant part with zero.  */
-static void
-do_commit_insn (rtx_insn *insn)
+/* Fold memory offsets by analysing the DEF-USE chain where the DEFs
+   can have multiple USEs.  */
+static unsigned int
+fold_mem_offsets_multi_use ()
 {
-  if (bitmap_bit_p (&candidate_fold_insns, INSN_UID (insn))
-      && !bitmap_bit_p (&cannot_fold_insns, INSN_UID (insn)))
+  unsigned int stats_fold_count = 0;
+
+  /* Iterate over all nondebug INSNs and get our candidates.  */
+  auto_vec<fold_mem_info *> worklist;
+  for (auto insn : iterate_safely (crtl->ssa->nondebug_insns ()))
     {
-      if (dump_file)
-       {
-         fprintf (dump_file, "Instruction folded:");
-         print_rtl_single (dump_file, insn);
-       }
+      if (!insn->is_real () || !insn->can_be_optimized ())
+       continue;
 
-      stats_fold_count++;
+      rtx mem, reg;
+      HOST_WIDE_INT offset;
+      if (!get_fold_mem_offset_root (insn, &mem, &reg, &offset))
+       continue;
 
-      rtx set = single_set (insn);
-      rtx dest = SET_DEST (set);
-      rtx src = SET_SRC (set);
+      fold_mem_info *info = new fold_mem_info;
+      info->insn = insn;
+      info->mem = mem;
+      info->reg = reg;
+      info->offset = offset;
 
-      /* Emit a move and let subsequent passes eliminate it if possible.  */
-      if (GET_CODE (src) == CONST_INT)
+      if (dump_file && (dump_flags & TDF_DETAILS))
        {
-         /* INSN is R1 = C.
-            Replace it with R1 = 0 because C was folded.  */
-         rtx mov_rtx
-           = gen_move_insn (dest, gen_int_mode (0, GET_MODE (dest)));
-         df_insn_rescan (emit_insn_after (mov_rtx, insn));
+         fprintf (dump_file, "Starting analysis from root: ");
+         print_rtl_single (dump_file, info->insn->rtl ());
        }
-      else
+
+      /* Walk DEF-chain and collect info->fold_insns, info->fold_agnostic_insns
+        and the resulting offset.  */
+      info->added_offset = fold_offsets (info->insn, info->reg, info, false);
+      if (info->added_offset == 0 || !do_check_validity (info))
        {
-         /* INSN is R1 = R2 + C.
-            Replace it with R1 = R2 because C was folded.  */
-         rtx arg1 = XEXP (src, 0);
+         delete info;
+         continue;
+       }
 
-         /* If the DEST == ARG1 then the move is a no-op.  */
-         if (REGNO (dest) != REGNO (arg1))
-           {
-             gcc_checking_assert (GET_MODE (dest) == GET_MODE (arg1));
-             rtx mov_rtx = gen_move_insn (dest, arg1);
-             df_insn_rescan (emit_insn_after (mov_rtx, insn));
-           }
+      if (dump_file && (dump_flags & TDF_DETAILS))
+       fprintf (dump_file,
+                "Found root offset delta: " HOST_WIDE_INT_PRINT_DEC "\n",
+                info->added_offset);
+
+      /* Append candidate.  */
+      worklist.safe_push (info);
+    }
+
+  /* Now drop all fold_mem_infos, which contain INSNs that have unknown
+     USEs and are therefore not safe to change.  */
+  drop_unsafe_candidates (&worklist);
+
+  while (!worklist.is_empty ())
+    {
+      fold_mem_info *info = worklist.pop ();
+      do_commit_offset (info);
+
+      while (!info->fold_insns.is_empty ())
+       {
+         insn_info *insn = info->fold_insns.pop ();
+         do_commit_insn (insn);
+         stats_fold_count++;
        }
 
-      /* Delete the original move / add instruction.  */
-      delete_insn (insn);
+      info->fold_agnostic_insns.truncate (0);
+      delete info;
     }
+
+  return stats_fold_count;
 }
 
-unsigned int
-pass_fold_mem_offsets::execute (function *fn)
+/* Main function of fold-mem-offsets pass.  */
+static unsigned int
+fold_mem_offsets (function *fn)
 {
+  bool multi_use_mode = true;
+
   /* Computing UD/DU chains for flow graphs which have a high connectivity
      will take a long time and is unlikely to be particularly useful.
 
@@ -856,69 +897,82 @@ pass_fold_mem_offsets::execute (function *fn)
               "fold-mem-offsets: %d basic blocks and %d edges/basic block",
               n_basic_blocks_for_fn (cfun),
               n_edges_for_fn (cfun) / n_basic_blocks_for_fn (cfun));
-      return 0;
+      multi_use_mode = false;
     }
 
-  df_set_flags (DF_EQ_NOTES + DF_RD_PRUNE_DEAD_DEFS + DF_DEFER_INSN_RESCAN);
-  df_chain_add_problem (DF_UD_CHAIN + DF_DU_CHAIN);
+  /* There is a conflict between this pass and RISCV's shorten-memrefs
+     pass.  For now disable folding if optimizing for size because
+     otherwise this cancels the effects of shorten-memrefs.  */
+  cgraph_node *n = cgraph_node::get (fn->decl);
+  if (n && n->optimize_for_size_p ())
+    return 0;
+
+  /* Initialise RTL SSA.  */
+  calculate_dominance_info (CDI_DOMINATORS);
   df_analyze ();
+  crtl->ssa = new rtl_ssa::function_info (cfun);
 
-  bitmap_initialize (&can_fold_insns, NULL);
-  bitmap_initialize (&candidate_fold_insns, NULL);
-  bitmap_initialize (&cannot_fold_insns, NULL);
+  /* The number of instructions that were simplified or eliminated.  */
+  int stats_fold_count = 0;
 
-  stats_fold_count = 0;
+  /* Fold mem offsets with DEFs that have a single USE.  */
+  stats_fold_count += fold_mem_offsets_single_use ();
 
-  basic_block bb;
-  rtx_insn *insn;
-  FOR_ALL_BB_FN (bb, fn)
+  /* Fold mem offsets with DEFs that have multiple USEs.  */
+  if (multi_use_mode || flag_expensive_optimizations)
     {
-      /* There is a conflict between this pass and RISCV's shorten-memrefs
-        pass.  For now disable folding if optimizing for size because
-        otherwise this cancels the effects of shorten-memrefs.  */
-      if (optimize_bb_for_size_p (bb))
-       continue;
-
-      fold_info_map fold_info;
+      if (dump_file)
+       fprintf (dump_file, "Starting multi-use analysis\n");
+      stats_fold_count += fold_mem_offsets_multi_use ();
+    }
 
-      bitmap_clear (&can_fold_insns);
-      bitmap_clear (&candidate_fold_insns);
-      bitmap_clear (&cannot_fold_insns);
+  statistics_counter_event (cfun, "Number of folded instructions",
+                           stats_fold_count);
 
-      FOR_BB_INSNS (bb, insn)
-       do_analysis (insn);
+  free_dominance_info (CDI_DOMINATORS);
+  cleanup_cfg (0);
 
-      FOR_BB_INSNS (bb, insn)
-       do_fold_info_calculation (insn, &fold_info);
+  delete crtl->ssa;
+  crtl->ssa = nullptr;
 
-      FOR_BB_INSNS (bb, insn)
-       if (fold_mem_info **info = fold_info.get (insn))
-         do_check_validity (insn, *info);
+  delete_trivially_dead_insns (get_insns (), max_reg_num ());
 
-      if (compute_validity_closure (&fold_info))
-       {
-         FOR_BB_INSNS (bb, insn)
-           if (fold_mem_info **info = fold_info.get (insn))
-             do_commit_offset (insn, *info);
+  return 0;
+}
 
-         FOR_BB_INSNS (bb, insn)
-           do_commit_insn (insn);
-       }
+namespace {
 
-      for (fold_info_map::iterator iter = fold_info.begin ();
-          iter != fold_info.end (); ++iter)
-       delete (*iter).second;
-    }
+const pass_data pass_data_fold_mem =
+{
+  RTL_PASS, /* type */
+  "fold_mem_offsets", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_FOLD_MEM_OFFSETS, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_df_finish, /* todo_flags_finish */
+};
 
-  statistics_counter_event (cfun, "Number of folded instructions",
-                           stats_fold_count);
+class pass_fold_mem_offsets : public rtl_opt_pass
+{
+public:
+  pass_fold_mem_offsets (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_fold_mem, ctxt)
+  {}
 
-  bitmap_release (&can_fold_insns);
-  bitmap_release (&candidate_fold_insns);
-  bitmap_release (&cannot_fold_insns);
+  /* opt_pass methods: */
+  bool gate (function *) final override
+    {
+      return flag_fold_mem_offsets && optimize >= 2;
+    }
 
-  return 0;
-}
+  unsigned int execute (function *fn) final override
+    {
+      return fold_mem_offsets (fn);
+    }
+}; // class pass_fold_mem_offsets
 
 } // anon namespace
 
-- 
2.49.0
