On Tue, Sep 12, 2023 at 3:47 AM Jeff Law <jeffreya...@gmail.com> wrote:
>
>
>
> On 9/9/23 02:46, Manolis Tsamis wrote:
> > This is a new RTL pass that tries to optimize memory offset calculations
> > by moving them from add immediate instructions to the memory loads/stores.
> > For example it can transform this:
> >
> >    addi t4,sp,16
> >    add  t2,a6,t4
> >    shl  t3,t2,1
> >    ld   a2,0(t3)
> >    addi a2,1
> >    sd   a2,8(t2)
> >
> > into the following (one instruction less):
> >
> >    add  t2,a6,sp
> >    shl  t3,t2,1
> >    ld   a2,32(t3)
> >    addi a2,1
> >    sd   a2,24(t2)
> >
> > Although there are places where this is done already, this pass is more
> > powerful and can handle the more difficult cases that are currently not
> > optimized. Also, it runs late enough and can optimize away unnecessary
> > stack pointer calculations.
> >
> > gcc/ChangeLog:
> >
> >       * Makefile.in: Add fold-mem-offsets.o.
> >       * passes.def: Schedule a new pass.
> >       * tree-pass.h (make_pass_fold_mem_offsets): Declare.
> >       * common.opt: New options.
> >       * doc/invoke.texi: Document new option.
> >       * fold-mem-offsets.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.target/riscv/fold-mem-offsets-1.c: New test.
> >       * gcc.target/riscv/fold-mem-offsets-2.c: New test.
> >       * gcc.target/riscv/fold-mem-offsets-3.c: New test.
> >
> > Signed-off-by: Manolis Tsamis <manolis.tsa...@vrull.eu>
> > ---
> >
> > Changes in v5:
> >          - Introduce new helper function fold_offsets_1.
> >          - Fix a bug where constants could be partially propagated
> >            through instructions that weren't understood.
> >          - Introduce helper class fold_mem_info that stores f-m-o
> >            info for an instruction.
> >          - Calculate fold_offsets only once with do_fold_info_calculation.
> >          - Fix correctness issue by introducing compute_validity_closure.
> >          - Propagate in more cases for PLUS/MINUS with constant.
> >
> > Changes in v4:
> >          - Add DF_EQ_NOTES flag to avoid incorrect state in notes.
> >          - Remove fold_mem_offsets_driver and enum fold_mem_phase.
> >          - Call recog when patching offsets in do_commit_offset.
> >          - Restore INSN_CODE after modifying insn in do_check_validity.
> >
> > Changes in v3:
> >          - Added propagation for more codes:
> >            sub, neg, mul.
> >          - Added folding / elimination for sub and
> >            const int moves.
> >          - For the validity check of the generated addresses
> >            also test memory_address_addr_space_p.
> >          - Replaced GEN_INT with gen_int_mode.
> >          - Replaced some bitmap_head with auto_bitmap.
> >          - Refactor each phase into own function for readability.
> >          - Add dump details.
> >          - Replace rtx iteration with reg_mentioned_p.
> >          - Return early for codes that we can't propagate through.
> >
> > Changes in v2:
> >          - Made the pass target-independent instead of RISC-V specific.
> >          - Fixed a number of bugs.
> >          - Add code to handle more ADD patterns as found
> >            in other targets (x86, aarch64).
> >          - Improved naming and comments.
> >          - Fixed bitmap memory leak.
> >
>
>
> > +
> > +/* Get the single reaching definition of an instruction inside a BB.
> > +   The definition is desired for REG used in INSN.
> > +   Return the definition insn or NULL if there's no definition with
> > +   the desired criteria.  */
> > +static rtx_insn*
> > +get_single_def_in_bb (rtx_insn *insn, rtx reg)
> > +{
> > +  df_ref use;
> > +  struct df_link *ref_chain, *ref_link;
> > +
> > +  FOR_EACH_INSN_USE (use, insn)
> > +    {
> > +      if (GET_CODE (DF_REF_REG (use)) == SUBREG)
> > +     return NULL;
> > +      if (REGNO (DF_REF_REG (use)) == REGNO (reg))
> > +     break;
> > +    }
> > +
> > +  if (!use)
> > +    return NULL;
> > +
> > +  ref_chain = DF_REF_CHAIN (use);
> So what if there's two uses of REG in INSN?  I don't think it'd be
> common at all, but probably better safe and reject than sorry, right? Or
> is that case filtered out earlier?
>
If the REG is the same, won't the reaching definition be the same even
if REG appears multiple times in INSN?
fold_offsets_1 should be able to handle folding with multiple uses of
REG just fine, for example add R1, R1 or add (ashift R1, 1), R1.
If there's no other issue here I'd prefer to keep this as-is, so as not
to reduce the propagation power (which I assume is similar to ree, which
uses the same logic).
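For what it's worth, here is a small standalone sketch (plain Python with
hypothetical names, not GCC code) of why repeated uses of the same REG are
harmless: each use just accumulates the same folded constant again.

```python
# Toy model of offset folding: REG's single reaching definition
# contributes constant C, and every use of REG in the expression
# accumulates its share of C.
def fold(expr, c):
    """expr is a nested tuple mimicking RTL; returns the foldable offset."""
    if expr == "R1":
        return c
    op, *args = expr
    if op == "plus":
        return fold(args[0], c) + fold(args[1], c)
    if op == "ashift":
        return fold(args[0], c) << args[1]
    return 0  # unknown code: nothing to fold

# add R1, R1 -- both uses share the same reaching definition,
# so the constant is simply counted twice.
print(fold(("plus", "R1", "R1"), 16))                 # 32
print(fold(("plus", ("ashift", "R1", 1), "R1"), 16))  # 48
```

The same reasoning applies whether REG occurs once, twice, or inside a
shifted subexpression, which is why rejecting multiple uses would only
lose propagation power.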

>
>
>
> > +
> > +  rtx_insn* def = DF_REF_INSN (ref_chain->ref);
> Formatting nit.  The '*' should be next to the variable, not the type.
>
Thanks, I fixed this in multiple places. I thought that
check_GNU_style would whine about this...

> > +
>
> > +
> > +static HOST_WIDE_INT
> > +fold_offsets (rtx_insn* insn, rtx reg, bool analyze, bitmap foldable_insns);
> > +
> > +/*  Helper function for fold_offsets.
> > +
> > +    If DO_RECURSION is false and ANALYZE is true this function returns
> > +    true iff it understands the structure of INSN and knows how to
> > +    propagate constants through it.  In this case OFFSET_OUT and
> > +    FOLDABLE_INSNS are unused.
> > +
> > +    If DO_RECURSION is true then it also calls fold_offsets for each
> > +    recognised part of INSN with the appropriate arguments.
> > +
> > +    If DO_RECURSION is true and ANALYZE is false then the offset that
> > +    would result from folding is computed and is returned through the
> > +    pointer OFFSET_OUT.  The instructions that can be folded are
> > +    recorded in FOLDABLE_INSNS.  */
> > +static bool fold_offsets_1 (rtx_insn* insn, bool analyze, bool do_recursion,
> > +                         HOST_WIDE_INT *offset_out, bitmap foldable_insns)
> Nit.  Linkage and return type on separate line.  That makes the function
> name start at the beginning of a line.
>
Thanks, I missed that. Fixed.

>
>
> > +
> > +/* Test if INSN is a memory load / store that can have an offset folded
> > +   to it.  Return true iff INSN is such an instruction and return through
> > +   MEM_OUT, REG_OUT and OFFSET_OUT the RTX that has a MEM code, the
> > +   register that is used as a base address and the offset accordingly.
> > +   All of the out pointers may be NULL in which case they will be
> > +   ignored.  */
> > +bool
> > +get_fold_mem_root (rtx_insn* insn, rtx *mem_out, rtx *reg_out,
> > +                HOST_WIDE_INT *offset_out)
> > +{
> > +  rtx set = single_set (insn);
> > +  rtx mem = NULL_RTX;
> > +
> > +  if (set != NULL_RTX)
> > +    {
> > +      rtx src = SET_SRC (set);
> > +      rtx dest = SET_DEST (set);
> > +
> > +      /* Don't fold when we have unspec / volatile.  */
> > +      if (GET_CODE (src) == UNSPEC
> > +       || GET_CODE (src) == UNSPEC_VOLATILE
> > +       || GET_CODE (dest) == UNSPEC
> > +       || GET_CODE (dest) == UNSPEC_VOLATILE)
> > +     return false;
> > +
> > +      if (MEM_P (src))
> > +     mem = src;
> > +      else if (MEM_P (dest))
> > +     mem = dest;
> > +      else if ((GET_CODE (src) == SIGN_EXTEND
> > +             || GET_CODE (src) == ZERO_EXTEND)
> > +            && MEM_P (XEXP (src, 0)))
> Note some architectures allow both a source and destination memory.  It
> looks like your code will prefer the source operand in that case.
> That's fine, just pointing it out.
>
Thanks for pointing that possibility out. I thought for a moment that
this could be a bug with multiple mentions of the address register,
but it should be fine thanks to this check:
/* Special case: A foldable memory store is not foldable if it
mentions DEST outside of the address calculation. */

>
>
> > +
> > +static bool
> > +compute_validity_closure (fold_info_map *fold_info)
> > +{
> > +  /* Let's say we have an arbitrary chain of foldable instructions
> > +     xN = xN + C and memory operations rN that use xN as shown below.
> > +     If folding x1 in r1 turns out to be invalid for whatever reason
> > +     then it's also invalid to fold any of the other xN into any rN.
> > +     That means that we need the transitive closure of validity to
> > +     determine whether we can fold a xN instruction.
> > +
> > +     +--------------+    +-------------------+    +-------------------+
> > +     | r1 = mem[x1] |    | r2 = mem[x1 + x2] |    | r3 = mem[x2 + x3] |  ...
> > +     +--------------+    +-------------------+    +-------------------+
> > +         ^                ^       ^                ^       ^
> > +         |               /        |               /        |           ...
> > +         |              /         |              /         |
> > +     +-------------+      /   +-------------+      /   +-------------+
> > +     | x1 = x1 + 1 |-----+    | x2 = x2 + 1 |-----+    | x3 = x3 + 1 |--- ...
> > +     +-------------+          +-------------+          +-------------+
> > +         ^                        ^                        ^
> > +         |                        |                        |
> > +        ...                      ...                      ...
> > +  */
> > +
> > +  int max_iters = 5;
> > +  for (int i = 0; i < max_iters; i++)
> > +    {
> > +      bool made_changes = false;
> > +      for (fold_info_map::iterator iter = fold_info->begin ();
> > +        iter != fold_info->end (); ++iter)
> > +     {
> > +       fold_mem_info *info = (*iter).second;
> > +       if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
> > +         made_changes |= bitmap_ior_into (&cannot_fold_insns,
> > +                                          info->fold_insns);
> > +     }
> > +
> > +      if (!made_changes)
> > +     return true;
> > +    }
> > +
> > +  return false;
> So how was the magic value of "5" determined here?  In general we try
> not to have magic #s like that and instead find a better way to control
> iterations, falling back to a PARAM when all else fails.
>
It's indeed just a magic value :)
In general even two iterations of this should be quite rare; a single
iteration handles the majority of interesting cases. I chose 5 to bound
the worst-case execution time for degenerate cases.

I'll be happy to change that to something better, but I couldn't figure
out how "falling back to a PARAM when all else fails" should work here.
Is there a similar example somewhere else in the code that I can look at?
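For reference, the iteration can be modelled outside GCC as a plain
fixed-point computation over sets. A standalone sketch (hypothetical
names, mirroring the quoted compute_validity_closure):

```python
def compute_validity_closure(fold_info, cannot_fold, max_iters=5):
    """fold_info maps each memory insn to the set of fold insns it relies on;
    cannot_fold is the set of insns already known to be invalid to fold.
    Returns True iff a fixed point was reached within max_iters."""
    for _ in range(max_iters):
        changed = False
        for deps in fold_info.values():
            # If any dependency is invalid, the whole group becomes invalid.
            if deps & cannot_fold and not deps <= cannot_fold:
                cannot_fold |= deps
                changed = True
        if not changed:
            return True   # fixed point reached
    return False          # gave up; caller treats everything as unfoldable

# The diagram's chain: invalidating x1 ripples through x2 to x3.
info = {"r1": {"x1"}, "r2": {"x1", "x2"}, "r3": {"x2", "x3"}}
bad = {"x1"}
print(compute_validity_closure(info, bad), sorted(bad))
```

Each pass can only grow cannot_fold, so the loop terminates once no new
insn is added; the iteration cap merely bounds degenerate chains.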

>
> > +}
> > +
> > +  machine_mode mode = GET_MODE (XEXP (mem, 0));
> > +  XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, mode));
> > +  INSN_CODE (insn) = recog (PATTERN (insn), insn, 0);
> > +  df_insn_rescan (insn);
> Don't we need to check if NEW_OFFSET is zero and if so generate simple
> register indirect addressing rather than indirect + displacement?
>
Indeed, I will do that. I don't think it will cause any issue with the
validation/commit functions.
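Sketched against the quoted do_commit_offset code above (a fragment in
GCC-internals style, not compilable standalone and not the final patch),
the adjustment might look like:

```
  /* With a zero offset, fall back to plain register-indirect addressing
     instead of emitting (plus reg (const_int 0)).  */
  machine_mode mode = GET_MODE (XEXP (mem, 0));
  XEXP (mem, 0) = (new_offset == 0)
                  ? reg
                  : gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, mode));
  INSN_CODE (insn) = recog (PATTERN (insn), insn, 0);
  df_insn_rescan (insn);
```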

> This looks *so* close ;-)  But I think we need a few questions answered
> and a few minor adjustments.
>
Definitely, all the comments have changed f-m-o for the better and are
much appreciated!

> On a positive note, m68k passed with this version of the f-m-o patch :-)
>
Great to know!

> Jeff
>

Manolis
