Richard Biener <richard.guent...@gmail.com> writes:
> On Fri, Jul 20, 2018 at 12:22 PM Richard Sandiford
> <richard.sandif...@arm.com> wrote:
>>
>> We couldn't vectorise:
>>
>>   for (int j = 0; j < n; ++j)
>>     {
>>       for (int i = 0; i < 16; ++i)
>>         a[i] = (b[i] + c[i]) >> 1;
>>       a += step;
>>       b += step;
>>       c += step;
>>     }
>>
>> at -O3 because cunrolli unrolled the inner loop and SLP couldn't handle
>> AVG_FLOOR patterns (see also PR86504).  The problem was some overly
>> strict checking of pattern statements compared to normal statements
>> in vect_get_and_check_slp_defs:
>>
>>           switch (gimple_code (def_stmt))
>>             {
>>             case GIMPLE_PHI:
>>             case GIMPLE_ASSIGN:
>>               break;
>>
>>             default:
>>               if (dump_enabled_p ())
>>                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>                                  "unsupported defining stmt:\n");
>>               return -1;
>>             }
>>
>> The easy fix would have been to add GIMPLE_CALL to the switch,
>> but I don't think the switch is doing anything useful.  We only create
>> pattern statements that the rest of the vectoriser can handle, and code
>> in this function and elsewhere check whether SLP is possible.
>>
>> I'm also not sure why:
>>
>>           if (!first && !oprnd_info->first_pattern
>>               /* Allow different pattern state for the defs of the
>>                  first stmt in reduction chains.  */
>>               && (oprnd_info->first_dt != vect_reduction_def
>>
>> is necessary.  All that should matter is that the statements in the
>> node are "similar enough".  It turned out to be quite hard to find a
>> convincing example that used a mixture of pattern and non-pattern
>> statements, so bb-slp-pow-1.c is the best I could come up with.
>> But it does show that the combination of "xi * xi" statements and
>> "pow (xj, 2) -> xj * xj" patterns are handled correctly.
>>
>> The patch therefore just removes the whole if block.
>>
>> The loop also needed commutative swapping to be extended to at least
>> AVG_FLOOR.
>>
>> This gives +3.9% on 525.x264_r at -O3.
>>
>> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
>> and x86_64-linux-gnu.  OK to install?
>
> Always nice to see seemingly dead code go.  OK if you can still
> build SPEC with this change and pass a test run.
>
> At least I _do_ seem to remember having "issues" in this area...

Thanks.  I've now applied it after testing SPEC2006 and SPEC2017 on
aarch64-linux-gnu and x86_64-linux-gnu.  I also ran it through our
internal SVE benchmark set, which includes those two and various
other things.

Richard

Reply via email to