On Fri, Jul 9, 2021 at 4:39 AM Jeff Law <j...@tachyum.com> wrote:
>
>
>
> On 7/2/2021 10:13 AM, Jeff Law wrote:
> >
> > This is a minor missed optimization we found with our internal port.
> >
> > Given this code:
> >
> > typedef struct {short a; short b;} T;
> >
> > extern void g1();
> >
> > void f(T s)
> > {
> >         if (s.a < 0)
> >                 g1();
> > }
> >
> >
> > "s" is passed in a register, but it's still a BLKmode object because
> > the alignment of T is smaller than the alignment that an integer of
> > the same size would have (16 bits vs 32 bits).
> >
> >
> > Because "s" is BLKmode function.c is going to store it into a stack
> > slot and we'll load it from the that slot for each reference.  So on
> > the v850 (just to pick a port that likely has the same behavior we're
> > seeing) we have this RTL from CSE2:
> >
> >
> > (insn 2 4 3 2 (set (mem/c:SI (plus:SI (reg/f:SI 34 .fp)
> >                 (const_int -4 [0xfffffffffffffffc])) [2 S4 A32])
> >         (reg:SI 6 r6)) "j.c":6:1 7 {*movsi_internal}
> >      (expr_list:REG_DEAD (reg:SI 6 r6)
> >         (nil)))
> > (note 3 2 8 2 NOTE_INSN_FUNCTION_BEG)
> > (insn 8 3 9 2 (set (reg:HI 44 [ s.a ])
> >         (mem/c:HI (plus:SI (reg/f:SI 34 .fp)
> >                 (const_int -4 [0xfffffffffffffffc])) [1 s.a+0 S2
> > A32])) "j.c":7:5 3 {*movhi_internal}
> >      (nil))
> > (insn 9 8 10 2 (parallel [
> >             (set (reg:SI 45)
> >                 (ashift:SI (subreg:SI (reg:HI 44 [ s.a ]) 0)
> >                     (const_int 16 [0x10])))
> >             (clobber (reg:CC 32 psw))
> >         ]) "j.c":7:5 94 {ashlsi3_clobber_flags}
> >      (expr_list:REG_DEAD (reg:HI 44 [ s.a ])
> >         (expr_list:REG_UNUSED (reg:CC 32 psw)
> >             (nil))))
> > (insn 10 9 11 2 (parallel [
> >             (set (reg:SI 43)
> >                 (ashiftrt:SI (reg:SI 45)
> >                     (const_int 16 [0x10])))
> >             (clobber (reg:CC 32 psw))
> >         ]) "j.c":7:5 104 {ashrsi3_clobber_flags}
> >      (expr_list:REG_DEAD (reg:SI 45)
> >         (expr_list:REG_UNUSED (reg:CC 32 psw)
> >             (nil))))
> >
> >
> > Insn 2 is the store into the stack. insn 8 is the load for s.a in the
> > conditional.  DSE1 replaces the MEM in insn 8 with (reg 6) since (reg
> > 6) has the value we want.  After that the store at insn 2 is dead.
> > Sadly DSE never removes the store.
> >
> > The problem is RTL DSE considers a store with no MEM_EXPR as escaping,
> > which keeps the MEM live.  The lack of a MEM_EXPR is due to call to
> > change_address to twiddle the mode on the MEM for the store at insn
> > 2.  It should be safe to copy the MEM_EXPR (which should always be a
> > PARM_DECL) from the original memory to the memory returned by
> > change_address.  Doing so results in DSE1 removing the store at insn 2.
> >
> > It would be nice to remove the stack setup/teardown.   I'm not offhand
> > aware of mechanisms to remove the setup/teardown after we've already
> > allocated a slot, even if the slot is no longer used.
> >
> > Bootstrapped and regression tested on x86, though I don't think that's
> > a particularly useful test.  So I also ran it through my tester across
> > those pesky embedded targets without regressions as well.
> >
> > I didn't include a test simply because I didn't want to have an insane
> > target selector.  I guess if we really wanted a test we could look
> > after DSE1 is done and verify there aren't any MEMs left at all.
> > Willing to try that if the consensus is we want this tested.
> >
> > OK for the trunk?
> So Richi questioned if using adjust_address rather than change_address
> was sufficient to solve the problem.  It is.  I've bootstrapped &
> regression tested on x86_64 and regression tested this against the usual
> set of crosses in the tester.
>
> OK for the trunk now?

OK.

Thanks,
Richard.

> JEff
>

Reply via email to