https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883
--- Comment #7 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 18 Jun 2019, law at redhat dot com wrote:

> slow ()
> {
>   struct C D.25898;
>   struct C D.29462;
>
> ;;   basic block 2, loop depth 0, count 1073741824 (estimated locally), maybe hot
> ;;    prev block 0, next block 1, flags: (NEW, REACHABLE, VISITED)
> ;;    pred:       ENTRY [always]  count:1073741824 (estimated locally) (FALLTHRU,EXECUTABLE)
>   D.25898.a = {};
>   D.29462 = D.25898;
>   D.25898 ={v} {CLOBBER};
>   return D.29462;
> ;;    succ:       EXIT [always]  count:1073741824 (estimated locally)
>
> }
>
> Which still isn't sufficient to get good code.
>
> I'm not really sure what you want DSE to do here Richi :-)

I observed that

  D.26322 = {};
  D.26322.a = {};

looks like the later store is dead (a C testcase showing the actual layout might be nice here). Of course DSE doesn't work this way around, but trimming might be able to trim the second store instead of the first (to nothing)?

I also noticed that

  MEM[(struct C *)&D.26322 + 7B] = {};
  D.26322.a = {};

Here the first store starts at offset 7, which will result in unaligned and/or small stores. DSE doesn't seem to exploit the fact that we do not need to preserve the stores into the padding (in fact we do not expand that way, I think).

Given that -fno-tree-dse produces nearly optimal code (well, RTL manages to clean up all the useless stuff), some of the above might help.