On 9/9/23 02:46, Manolis Tsamis wrote:
This is a new RTL pass that tries to optimize memory offset calculations
by moving them from add immediate instructions to the memory loads/stores.
For example it can transform this:

   addi t4,sp,16
   add  t2,a6,t4
   shl  t3,t2,1
   ld   a2,0(t3)
   addi a2,1
   sd   a2,8(t2)

into the following (one instruction less):

   add  t2,a6,sp
   shl  t3,t2,1
   ld   a2,32(t3)
   addi a2,1
   sd   a2,24(t2)

Although there are places where this is done already, this pass is more
powerful and can handle the more difficult cases that are currently not
optimized. Also, it runs late enough and can optimize away unnecessary
stack pointer calculations.

gcc/ChangeLog:

        * Makefile.in: Add fold-mem-offsets.o.
        * passes.def: Schedule a new pass.
        * tree-pass.h (make_pass_fold_mem_offsets): Declare.
        * common.opt: New options.
        * doc/invoke.texi: Document new option.
        * fold-mem-offsets.cc: New file.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/fold-mem-offsets-1.c: New test.
        * gcc.target/riscv/fold-mem-offsets-2.c: New test.
        * gcc.target/riscv/fold-mem-offsets-3.c: New test.

Signed-off-by: Manolis Tsamis <manolis.tsa...@vrull.eu>
---

Changes in v5:
         - Introduce new helper function fold_offsets_1.
         - Fix bug because constants could be partially propagated
           through instructions that weren't understood.
         - Introduce helper class fold_mem_info that stores f-m-o
           info for an instruction.
         - Calculate fold_offsets only once with do_fold_info_calculation.
         - Fix correctness issue by introducing compute_validity_closure.
         - Propagate in more cases for PLUS/MINUS with constant.

Changes in v4:
         - Add DF_EQ_NOTES flag to avoid incorrect state in notes.
         - Remove fold_mem_offsets_driver and enum fold_mem_phase.
         - Call recog when patching offsets in do_commit_offset.
         - Restore INSN_CODE after modifying insn in do_check_validity.

Changes in v3:
         - Added propagation for more codes:
           sub, neg, mul.
         - Added folding / elimination for sub and
           const int moves.
         - For the validity check of the generated addresses
           also test memory_address_addr_space_p.
         - Replaced GEN_INT with gen_int_mode.
         - Replaced some bitmap_head with auto_bitmap.
         - Refactor each phase into own function for readability.
         - Add dump details.
         - Replace rtx iteration with reg_mentioned_p.
         - Return early for codes that we can't propagate through.

Changes in v2:
         - Made the pass target-independant instead of RISCV specific.
         - Fixed a number of bugs.
         - Add code to handle more ADD patterns as found
           in other targets (x86, aarch64).
         - Improved naming and comments.
         - Fixed bitmap memory leak.



+
+/* Get the single reaching definition of an instruction inside a BB.
+   The definition is desired for REG used in INSN.
+   Return the definition insn or NULL if there's no definition with
+   the desired criteria.  */
+static rtx_insn*
+get_single_def_in_bb (rtx_insn *insn, rtx reg)
+{
+  df_ref use;
+  struct df_link *ref_chain, *ref_link;
+
+  FOR_EACH_INSN_USE (use, insn)
+    {
+      if (GET_CODE (DF_REF_REG (use)) == SUBREG)
+       return NULL;
+      if (REGNO (DF_REF_REG (use)) == REGNO (reg))
+       break;
+    }
+
+  if (!use)
+    return NULL;
+
+  ref_chain = DF_REF_CHAIN (use);
So what if there's two uses of REG in INSN? I don't think it's be common at all, but probably better safe and reject than sorry, right? Or is that case filtered out earlier?




+
+  rtx_insn* def = DF_REF_INSN (ref_chain->ref);
Formatting nit.  The '*' should be next to the variable, not the type.

+

+
+static HOST_WIDE_INT
+fold_offsets (rtx_insn* insn, rtx reg, bool analyze, bitmap foldable_insns);
+
+/*  Helper function for fold_offsets.
+
+    If DO_RECURSION is false and ANALYZE is true this function returns true iff
+    it understands the structure of INSN and knows how to propagate constants
+    through it.  In this case OFFSET_OUT and FOLDABLE_INSNS are unused.
+
+    If DO_RECURSION is true then it also calls fold_offsets for each recognised
+    part of INSN with the appropriate arguments.
+
+    If DO_RECURSION is true and ANALYZE is false then offset that would result
+    from folding is computed and is returned through the pointer OFFSET_OUT.
+    The instructions that can be folded are recorded in FOLDABLE_INSNS.
+*/
+static bool fold_offsets_1 (rtx_insn* insn, bool analyze, bool do_recursion,
+                           HOST_WIDE_INT *offset_out, bitmap foldable_insns)
Nit. Linkage and return type on separate line. That makes the function name start at the beginning of a line.



+
+/* Test if INSN is a memory load / store that can have an offset folded to it.
+   Return true iff INSN is such an instruction and return through MEM_OUT,
+   REG_OUT and OFFSET_OUT the RTX that has a MEM code, the register that is
+   used as a base address and the offset accordingly.
+   All of the out pointers may be NULL in which case they will be ignored.  */
+bool
+get_fold_mem_root (rtx_insn* insn, rtx *mem_out, rtx *reg_out,
+                  HOST_WIDE_INT *offset_out)
+{
+  rtx set = single_set (insn);
+  rtx mem = NULL_RTX;
+
+  if (set != NULL_RTX)
+    {
+      rtx src = SET_SRC (set);
+      rtx dest = SET_DEST (set);
+
+      /* Don't fold when we have unspec / volatile.  */
+      if (GET_CODE (src) == UNSPEC
+         || GET_CODE (src) == UNSPEC_VOLATILE
+         || GET_CODE (dest) == UNSPEC
+         || GET_CODE (dest) == UNSPEC_VOLATILE)
+       return false;
+
+      if (MEM_P (src))
+       mem = src;
+      else if (MEM_P (dest))
+       mem = dest;
+      else if ((GET_CODE (src) == SIGN_EXTEND
+               || GET_CODE (src) == ZERO_EXTEND)
+              && MEM_P (XEXP (src, 0)))
Note some architectures allow both a source and destination memory. It looks like your code will prefer the source operand in that case. That's fine, just pointing it out.



+
+static bool
+compute_validity_closure (fold_info_map *fold_info)
+{
+  /* Let's say we have an arbitrary chain of foldable instructions xN = xN + C
+     and memory operations rN that use xN as shown below.  If folding x1 in r1
+     turns out to be invalid for whatever reason then it's also invalid to fold
+     any of the other xN into any rN.  That means that we need the transitive
+     closure of validity to determine whether we can fold a xN instruction.
+
+     +--------------+    +-------------------+    +-------------------+
+     | r1 = mem[x1] |    | r2 = mem[x1 + x2] |    | r3 = mem[x2 + x3] |   ...
+     +--------------+    +-------------------+    +-------------------+
+           ^                ^       ^                ^       ^
+           |               /        |               /        |           ...
+           |              /         |              /         |
+     +-------------+      /   +-------------+      /   +-------------+
+     | x1 = x1 + 1 |-----+    | x2 = x2 + 1 |-----+    | x3 = x3 + 1 |--- ...
+     +-------------+          +-------------+          +-------------+
+           ^                        ^                        ^
+           |                        |                        |
+          ...                      ...                      ...
+  */
+
+  int max_iters = 5;
+  for (int i = 0; i < max_iters; i++)
+    {
+      bool made_changes = false;
+      for (fold_info_map::iterator iter = fold_info->begin ();
+          iter != fold_info->end (); ++iter)
+       {
+         fold_mem_info *info = (*iter).second;
+         if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
+           made_changes |= bitmap_ior_into (&cannot_fold_insns,
+                                            info->fold_insns);
+       }
+
+      if (!made_changes)
+       return true;
+    }
+
+  return false;
So how was the magic value of "5" determined here? In general we try not to have magic #s like that and instead find a better way to control iterations, falling back to a PARAM when all else fails.


+}
+
+  machine_mode mode = GET_MODE (XEXP (mem, 0));
+  XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, mode));
+  INSN_CODE (insn) = recog (PATTERN (insn), insn, 0);
+  df_insn_rescan (insn);
Don't we need to check if NEW_OFFSET is zero and if so generate simple register indirect addressing rather than indirect + displacement?

This looks *so* close ;-) But I think we need a few questions answered and a few minor adjustments.

On a positive note, m68k passed with this version of the f-m-o patch :-)

Jeff

Reply via email to