On Fri, Jul 9, 2021 at 4:39 AM Jeff Law <j...@tachyum.com> wrote: > > > > On 7/2/2021 10:13 AM, Jeff Law wrote: > > > > This is a minor missed optimization we found with our internal port. > > > > Given this code: > > > > typedef struct {short a; short b;} T; > > > > extern void g1(); > > > > void f(T s) > > { > > if (s.a < 0) > > g1(); > > } > > > > > > "s" is passed in a register, but it's still a BLKmode object because > > the alignment of T is smaller than the alignment that an integer of > > the same size would have (16 bits vs 32 bits). > > > > > > Because "s" is BLKmode function.c is going to store it into a stack > > slot and we'll load it from the that slot for each reference. So on > > the v850 (just to pick a port that likely has the same behavior we're > > seeing) we have this RTL from CSE2: > > > > > > (insn 2 4 3 2 (set (mem/c:SI (plus:SI (reg/f:SI 34 .fp) > > (const_int -4 [0xfffffffffffffffc])) [2 S4 A32]) > > (reg:SI 6 r6)) "j.c":6:1 7 {*movsi_internal} > > (expr_list:REG_DEAD (reg:SI 6 r6) > > (nil))) > > (note 3 2 8 2 NOTE_INSN_FUNCTION_BEG) > > (insn 8 3 9 2 (set (reg:HI 44 [ s.a ]) > > (mem/c:HI (plus:SI (reg/f:SI 34 .fp) > > (const_int -4 [0xfffffffffffffffc])) [1 s.a+0 S2 > > A32])) "j.c":7:5 3 {*movhi_internal} > > (nil)) > > (insn 9 8 10 2 (parallel [ > > (set (reg:SI 45) > > (ashift:SI (subreg:SI (reg:HI 44 [ s.a ]) 0) > > (const_int 16 [0x10]))) > > (clobber (reg:CC 32 psw)) > > ]) "j.c":7:5 94 {ashlsi3_clobber_flags} > > (expr_list:REG_DEAD (reg:HI 44 [ s.a ]) > > (expr_list:REG_UNUSED (reg:CC 32 psw) > > (nil)))) > > (insn 10 9 11 2 (parallel [ > > (set (reg:SI 43) > > (ashiftrt:SI (reg:SI 45) > > (const_int 16 [0x10]))) > > (clobber (reg:CC 32 psw)) > > ]) "j.c":7:5 104 {ashrsi3_clobber_flags} > > (expr_list:REG_DEAD (reg:SI 45) > > (expr_list:REG_UNUSED (reg:CC 32 psw) > > (nil)))) > > > > > > Insn 2 is the store into the stack. insn 8 is the load for s.a in the > > conditional. DSE1 replaces the MEM in insn 8 with (reg 6) since (reg > > 6) has the value we want. After that the store at insn 2 is dead. > > Sadly DSE never removes the store. > > > > The problem is RTL DSE considers a store with no MEM_EXPR as escaping, > > which keeps the MEM live. The lack of a MEM_EXPR is due to call to > > change_address to twiddle the mode on the MEM for the store at insn > > 2. It should be safe to copy the MEM_EXPR (which should always be a > > PARM_DECL) from the original memory to the memory returned by > > change_address. Doing so results in DSE1 removing the store at insn 2. > > > > It would be nice to remove the stack setup/teardown. I'm not offhand > > aware of mechanisms to remove the setup/teardown after we've already > > allocated a slot, even if the slot is no longer used. > > > > Bootstrapped and regression tested on x86, though I don't think that's > > a particularly useful test. So I also ran it through my tester across > > those pesky embedded targets without regressions as well. > > > > I didn't include a test simply because I didn't want to have an insane > > target selector. I guess if we really wanted a test we could look > > after DSE1 is done and verify there aren't any MEMs left at all. > > Willing to try that if the consensus is we want this tested. > > > > OK for the trunk? > So Richi questioned if using adjust_address rather than change_address > was sufficient to solve the problem. It is. I've bootstrapped & > regression tested on x86_64 and regression tested this against the usual > set of crosses in the tester. > > OK for the trunk now?
OK. Thanks, Richard. > JEff >