On January 16, 2017 7:27:53 PM GMT+01:00, Jeff Law <l...@redhat.com> wrote: >On 01/16/2017 01:51 AM, Richard Biener wrote: >> On Sun, Jan 15, 2017 at 10:34 AM, Jeff Law <l...@redhat.com> wrote: >>> >>> At one time I know I had the max_size == size test in >valid_ao_ref_for_dse. >>> But it got lost at some point. This is what caused the Ada failure. >>> >>> Technically it'd be OK for the potentially dead store to have a >variable >>> size as long as the later stores covered the entire range of the >potentially >>> dead store. I doubt this happens enough to be worth checking. >>> >>> The ppc64 big endian failures were more interesting. We had this in >the IL: >>> >>> memmove (dst, src, 0) >>> >>> The trimming code assumes that there's at least one live byte in the >store, >>> which obviously isn't the case here. The net result is we compute >an >>> incorrect trim and the copy goes wild with incorrect addresses and >lengths. >>> This is trivial to fix by validating that the store has a nonzero >length. >>> >>> I was a bit curious how often this happened in practice because such >a call >>> is trivially dead. ~80 during a bootstrap and a few dozen in the >testsuite. >>> Given how trivial it is to detect and optimize, this patch includes >removal >>> of such calls. This hunk makes the check for zero size in >>> valid_ao_ref_for_dse redundant, but I'd like to keep the check -- if >we add >>> more builtin support without filtering zero size we'd regress again. >> >> Interesting - we do fold memset (..., 0) away so this means we either >> have an unfolded memset stmt in the IL before DSE. >It's actually exposed by fre3, both in the original test and in the >reduced testcase. In the reduced testcase we have this just prior to >FRE3: > >;; basic block 3, loop depth 0, count 0, freq 7326, maybe hot >;; prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED) >;; pred: 2 [73.3%] (TRUE_VALUE,EXECUTABLE) > _3 = MEM[(const struct vec *)_4].m_num; > if (_3 != 0) > goto <bb 4>; [36.64%] > else > goto <bb 5>; [63.36%] >;; succ: 4 [36.6%] (TRUE_VALUE,EXECUTABLE) >;; 5 [63.4%] (FALSE_VALUE,EXECUTABLE) > >;; basic block 4, loop depth 0, count 0, freq 2684, maybe hot >;; prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED) >;; pred: 3 [36.6%] (TRUE_VALUE,EXECUTABLE) > _6 = vec_av_set.m_vec; > _7 = _6->m_num; > _8 = _7 - _3; > _6->m_num = _8; > _9 = (long unsigned int) _8; > _10 = _9 * 4; > slot.2_11 = slot; > dest.3_12 = dest; > memmove (dest.3_12, slot.2_11, _10); >;; succ: 5 [100.0%] (FALLTHRU,EXECUTABLE) > > >_3 has the value _6->m_num. Thus _8 will have the value 0, which in >turn makes _10 have the value zero as seen in the .fre3 dump: > >;; basic block 4, loop depth 0, count 0, freq 2684, maybe hot >;; prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED) >;; pred: 3 [36.6%] (TRUE_VALUE,EXECUTABLE) > _4->m_num = 0; > slot.2_11 = slot; > dest.3_12 = dest; > memmove (dest.3_12, slot.2_11, 0); > >In the full test its similar. > >I don't know if you want to try and catch this in FRE though.
Ah, I think I have patches for this since a long time in my tree... We're folding calls in a restricted way for some historical reason. >If we detect in DCE (where it makes reasonable sense) rather than DSE, >then we detect the dead mem* about 17 passes earlier and the dead >argument setup about 20 passes earlier. In the testcase I looked at, I > >didn't see additional secondary optimizations enabled, but I could >imagine cases where it might. Seems like a gcc-8 thing though. I'll give it a quick look tomorrow. Richard. >Jeff