Re: [PATCH][PR tree-optimization/79090] Fix two minor DSE bugs

Richard Biener Mon, 16 Jan 2017 14:36:36 -0800

On January 16, 2017 7:27:53 PM GMT+01:00, Jeff Law <l...@redhat.com> wrote:
>On 01/16/2017 01:51 AM, Richard Biener wrote:
>> On Sun, Jan 15, 2017 at 10:34 AM, Jeff Law <l...@redhat.com> wrote:
>>>
>>> At one time I know I had the max_size == size test in
>valid_ao_ref_for_dse.
>>> But it got lost at some point.  This is what caused the Ada failure.
>>>
>>> Technically it'd be OK for the potentially dead store to have a
>variable
>>> size as long as the later stores covered the entire range of the
>potentially
>>> dead store.  I doubt this happens enough to be worth checking.
>>>
>>> The ppc64 big endian failures were more interesting.  We had this in
>the IL:
>>>
>>> memmove (dst, src, 0)
>>>
>>> The trimming code assumes that there's at least one live byte in the
>store,
>>> which obviously isn't the case here.  The net result is we compute
>an
>>> incorrect trim and the copy goes wild with incorrect addresses and
>lengths.
>>> This is trivial to fix by validating that the store has a nonzero
>length.
>>>
>>> I was a bit curious how often this happened in practice because such
>a call
>>> is trivially dead.  ~80 during a bootstrap and a few dozen in the
>testsuite.
>>> Given how trivial it is to detect and optimize, this patch includes
>removal
>>> of such calls.  This hunk makes the check for zero size in
>>> valid_ao_ref_for_dse redundant, but I'd like to keep the check -- if
>we add
>>> more builtin support without filtering zero size we'd regress again.
>>
>> Interesting - we do fold memset (..., 0) away so this means we either
>> have an unfolded memset stmt in the IL before DSE.
>It's actually exposed by fre3, both in the original test and in the 
>reduced testcase.  In the reduced testcase we have this just prior to
>FRE3:
>
>;;   basic block 3, loop depth 0, count 0, freq 7326, maybe hot
>;;    prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
>;;    pred:       2 [73.3%]  (TRUE_VALUE,EXECUTABLE)
>   _3 = MEM[(const struct vec *)_4].m_num;
>   if (_3 != 0)
>     goto <bb 4>; [36.64%]
>   else
>     goto <bb 5>; [63.36%]
>;;    succ:       4 [36.6%]  (TRUE_VALUE,EXECUTABLE)
>;;                5 [63.4%]  (FALSE_VALUE,EXECUTABLE)
>
>;;   basic block 4, loop depth 0, count 0, freq 2684, maybe hot
>;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
>;;    pred:       3 [36.6%]  (TRUE_VALUE,EXECUTABLE)
>   _6 = vec_av_set.m_vec;
>   _7 = _6->m_num;
>   _8 = _7 - _3;
>   _6->m_num = _8;
>   _9 = (long unsigned int) _8;
>   _10 = _9 * 4;
>   slot.2_11 = slot;
>   dest.3_12 = dest;
>   memmove (dest.3_12, slot.2_11, _10);
>;;    succ:       5 [100.0%]  (FALLTHRU,EXECUTABLE)
>
>
>_3 has the value _6->m_num.  Thus _8 will have the value 0, which in 
>turn makes _10 have the value zero as seen in the .fre3 dump:
>
>;;   basic block 4, loop depth 0, count 0, freq 2684, maybe hot
>;;    prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
>;;    pred:       3 [36.6%]  (TRUE_VALUE,EXECUTABLE)
>   _4->m_num = 0;
>   slot.2_11 = slot;
>   dest.3_12 = dest;
>   memmove (dest.3_12, slot.2_11, 0);
>
>In the full test its similar.
>
>I don't know if you want to try and catch this in FRE though.


Ah, I think I have patches for this since a long time in my tree...  We're 
folding calls in a restricted way for some historical reason.

>If we detect in DCE (where it makes reasonable sense) rather than DSE, 
>then we detect the dead mem* about 17 passes earlier and the dead 
>argument setup about 20 passes earlier.  In the testcase I looked at, I
>
>didn't see additional secondary optimizations enabled, but I could 
>imagine cases where it might.  Seems like a gcc-8 thing though.

I'll give it a quick look tomorrow.

Richard.

>Jeff

Re: [PATCH][PR tree-optimization/79090] Fix two minor DSE bugs

Reply via email to