On Tue, Aug 9, 2016 at 4:43 PM, Bin.Cheng <[email protected]> wrote:
> On Sat, Aug 6, 2016 at 9:20 PM, Andreas Schwab <[email protected]> wrote:
>> On Mi, Jul 13 2016, "Bin.Cheng" <[email protected]> wrote:
>>
>>> Patch re-tested/applied on trunk as r238301.
>>
>> This breaks gcc.dg/vect/vect-117.c on powerpc.
> Hi Andreas,
> Sorry for the inconvenience, I will have a look.
It looks like the patch exposed another latent issue in the vectorizer.
Before the patch, the loop was vectorized under the following alias
check condition:
<bb 3>:
_7 = (int[5] *) ivtmp.88_168;
_20 = (unsigned int) n_15(D);
_12 = _20 > 3;
_180 = ivtmp.85_166 + 32;
_181 = (ssizetype) _180;
_182 = ivtmp.85_166 + 20;
_183 = (ssizetype) _182;
_32 = _181 <= _183;
_184 = ivtmp.85_166 + 36;
_185 = (ssizetype) _184;
_38 = (ssizetype) ivtmp.85_166;
_39 = _38 >= _185;
_40 = _32 | _39;
_41 = _12 & _40;
if (_41 != 0)
goto <bb 4>;
else
goto <bb 13>;
Note that the condition _40 = (_32 | _39) can never be true, so the
vectorized loop is never executed in practice. The patch only detects
at compile time that the condition is known to be false and bypasses it.
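As a sanity check, the dumped condition can be mirrored in plain C.
This is only a sketch: the function name alias_check_passes is made up
for illustration, BASE stands in for ivtmp.85_166, and it assumes the
additions do not wrap. The offsets 32, 20 and 36 are the ones from the
dump above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring _40 = _32 | _39 from the dump.
   BASE stands in for ivtmp.85_166; it is assumed to be small enough
   that BASE + 36 does not overflow.  */
bool
alias_check_passes (int64_t base)
{
  bool c32 = (base + 32) <= (base + 20);  /* _32 = _181 <= _183 */
  bool c39 = base >= (base + 36);         /* _39 = _38 >= _185 */
  return c32 || c39;                      /* _40 = _32 | _39 */
}
```

Both comparisons reduce to constants (32 <= 20 and 0 >= 36), so the
result does not depend on BASE as long as nothing wraps.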
After investigation, I believe the root cause is in the handling of
unaligned accesses, specifically that the vectorizer tries to optimize
the unaligned load using the dr_explicit_realign_optimized scheme on
powerpc.
The loop itself should be vectorized successfully:
static int a[N][N] = {{ 1, 2, 3, 4, 5},
                      { 6, 7, 8, 9,10},
                      {11,12,13,14,15},
                      {16,17,18,19,20},
                      {21,22,23,24,25}};

volatile int foo;

__attribute__ ((noinline))
int main1 (int A[N][N], int n)
{
  int i,j;

  /* vectorizable */
  for (i = 1; i < N; i++)
  {
    for (j = 0; j < n; j++)
    {
      A[i][j] = A[i-1][j] + A[i][j];
    }
  }

  return 0;
}
However, the vectorizer tries to realign A[i-1][j] using realign_load,
so the memory actually accessed in each iteration differs from the
original address. The function vect_vfa_segment_size takes this into
account and generates an alias check condition that is always false.
IMO, the unaligned access support scheme should be improved in a way
that allows more vectorization. In other words, it should fall back to
dr_unaligned_supported to enable vectorization, rather than choosing
dr_explicit_realign_optimized, which blocks vectorization.
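To illustrate the point, here is a rough sketch of how the segment
length feeds into the overlap test. The names segments_disjoint and
ROW_BYTES are mine, not the vectorizer's. The 32-byte load segment and
the 16-byte store segment are read off the dump (offsets 32 and
36 - 20); the 16-byte load segment for the non-realigned case is only my
assumption about what dr_unaligned_supported would yield.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch of a runtime alias check: the byte ranges
   [load, load + load_seg) and [store, store + store_seg) are treated
   as independent if one ends before the other starts.  */
bool
segments_disjoint (int64_t load, int64_t load_seg,
                   int64_t store, int64_t store_seg)
{
  return load + load_seg <= store || store + store_seg <= load;
}

/* A[i-1][j] and A[i][j] are one row apart: 5 ints, i.e. 20 bytes
   (assuming a 4-byte int).  */
enum { ROW_BYTES = 20 };
```

With the 32-byte load segment, segments_disjoint (0, 32, ROW_BYTES, 16)
is false, matching the dump. With the assumed 16-byte load segment,
segments_disjoint (0, 16, ROW_BYTES, 16) is true, so the check would
pass and the vectorized loop could run.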
I will file a PR for this.
Thanks,
bin
>
> Thanks,
> bin
>>
>> Andreas.
>>
>> --
>> Andreas Schwab, [email protected]
>> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
>> "And now for something completely different."