Re: [PATCH v5] Implement new RTL optimizations pass: fold-mem-offsets.

Jeff Law Fri, 29 Sep 2023 12:22:16 -0700



On 9/12/23 04:13, Manolis Tsamis wrote:

+
+/* Get the single reaching definition of an instruction inside a BB.
+   The definition is desired for REG used in INSN.
+   Return the definition insn or NULL if there's no definition with
+   the desired criteria.  */
+static rtx_insn*
+get_single_def_in_bb (rtx_insn *insn, rtx reg)
+{
+  df_ref use;
+  struct df_link *ref_chain, *ref_link;
+
+  FOR_EACH_INSN_USE (use, insn)
+    {
+      if (GET_CODE (DF_REF_REG (use)) == SUBREG)
+     return NULL;
+      if (REGNO (DF_REF_REG (use)) == REGNO (reg))
+     break;
+    }
+
+  if (!use)
+    return NULL;
+
+  ref_chain = DF_REF_CHAIN (use);

So what if there's two uses of REG in INSN?  I don't think it's be
common at all, but probably better safe and reject than sorry, right? Or
is that case filtered out earlier?

If the REG is the same won't the definitions be the same even if that
REG appears multiple times in INSN?

Yes.

fold_offsets_1 should be able to handle the folding with multiple uses
of REG just fine, for example add R1, R1 or add (ashift R1, 1), R1.
If there's no other issue here I assume we want to keep that as-is in
order to not reduce the propagation power (Which I assume is similar
to ree which uses the same logic).

OK. I was primarily concerned about the folding and rewriting aspects.It probably can only show up on targets with LEA like instructions, andeven on such targets it's probably rate.

+/* Test if INSN is a memory load / store that can have an offset folded to it.
+   Return true iff INSN is such an instruction and return through MEM_OUT,
+   REG_OUT and OFFSET_OUT the RTX that has a MEM code, the register that is
+   used as a base address and the offset accordingly.
+   All of the out pointers may be NULL in which case they will be ignored.  */
+bool
+get_fold_mem_root (rtx_insn* insn, rtx *mem_out, rtx *reg_out,
+                HOST_WIDE_INT *offset_out)
+{
+  rtx set = single_set (insn);
+  rtx mem = NULL_RTX;
+
+  if (set != NULL_RTX)
+    {
+      rtx src = SET_SRC (set);
+      rtx dest = SET_DEST (set);
+
+      /* Don't fold when we have unspec / volatile.  */
+      if (GET_CODE (src) == UNSPEC
+       || GET_CODE (src) == UNSPEC_VOLATILE
+       || GET_CODE (dest) == UNSPEC
+       || GET_CODE (dest) == UNSPEC_VOLATILE)
+     return false;
+
+      if (MEM_P (src))
+     mem = src;
+      else if (MEM_P (dest))
+     mem = dest;
+      else if ((GET_CODE (src) == SIGN_EXTEND
+             || GET_CODE (src) == ZERO_EXTEND)
+            && MEM_P (XEXP (src, 0)))

Note some architectures allow both a source and destination memory.  It
looks like your code will prefer the source operand in that case.
That's fine, just pointing it out.

Thanks for pointing that possibility out. I thought for a moment that
this would be a bug with multiple mentions of the address register.
but it should be fine due to:
/* Special case: A foldable memory store is not foldable if it
mentions DEST outside of the address calculation. */

ACK.

The other thing I keep pondering is autoincrement style addressing.Though I think at some point I convinced myself they weren't a problem.I think your checks only allow specific kinds of expressions for thememory address and I don't think {PRE,POST}_{INC,DEC.MODIFY} were in thelist of valid ops.

+
+  int max_iters = 5;
+  for (int i = 0; i < max_iters; i++)
+    {
+      bool made_changes = false;
+      for (fold_info_map::iterator iter = fold_info->begin ();
+        iter != fold_info->end (); ++iter)
+     {
+       fold_mem_info *info = (*iter).second;
+       if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
+         made_changes |= bitmap_ior_into (&cannot_fold_insns,
+                                          info->fold_insns);
+     }
+
+      if (!made_changes)
+     return true;
+    }
+
+  return false;

So how was the magic value of "5" determined here?  In general we try
not to have magic #s like that and instead find a better way to control
iterations, falling back to a PARAM when all else fails.

It's indeed just a magic value :)
In general even two iterations of this should be quite rare. One
iteration will handle the majority of interesting cases. I chose 5 to
limit the worst case execution time for degenerate cases.

I'll be happy to change that to something better but I couldn't
figure out how "falling back to a PARAM when all else fails"  should
work here. Is there a similar example somewhere else in the code that
I can look at?

If we expect two iterations to be quite rare, then I doubt a param isreally worth the effort. I might suggest using a loop like


for (pass = 0; pass < flag_expensive_optimizations + 1; pass++)

No magic numbers :-) We use similar loops elsewhere for this kind ofscenario.

If you ever need to create a param, the procedure is to add the optionto params.opt. If you look in that file you'll see that you define avariable name in the specification that you can then reference.


So for a simple example look for param_avg_loop_niter.

Jeff

Re: [PATCH v5] Implement new RTL optimizations pass: fold-mem-offsets.

Reply via email to