Re: [pushed] aarch64: Adjust tests after fix for PR102659

Richard Sandiford via Gcc-patches Thu, 03 Feb 2022 07:01:50 -0800

Richard Biener <richard.guent...@gmail.com> writes:
> On Thu, Feb 3, 2022 at 11:52 AM Richard Sandiford via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> After the fix for PR102659, the vectoriser can no longer group
>> conditional accesses of the form:
>>
>>   for (int i = 0; i < n; ++i)
>>     if (...)
>>       ...a[i * 2] + a[i * 2 + 1]...;
>>
>> on LP64 targets.  It has to treat them as two independent
>> gathers instead.
>
> Hmm, that's unfortunate.  Can you file an enhancement bugreport?


OK, filed as PR104368.

> How does using intptr_t help?  i * 2 can still overflow with large n,
> so can it with 'int' on ILP32.  So I guess this is the old issue
> of transforming (uint64)(i * 2 + 1) to (uint64)(i*2) + 1UL?

That does happen, but I'm not sure that it's the main problem.
SCEV analysis seems to fail for the a[i * 2] access too.

With ints the &a[i * 2] calculation is:

  _45 = (unsigned int) i_26;
  _46 = _45 * 2;
  _5 = (int) _46;
  _6 = (long unsigned int) _5;
  _7 = _6 * 4;
  _48 = _47 + _7;

and the &a[i * 2 + 1] calculation is:

  _10 = _6 + 1;
  _11 = _10 * 4;
  _51 = _11 + _47;
 
With intptr_t the &a[i * 2] calculation is:

  i.0_1 = (long unsigned int) i_23;
  _5 = i.0_1 * 8;
  _40 = _39 + _5;

and the &a[i * 2 + 1] calculation is:

  _8 = _5 + 4;
  _43 = _8 + _39;

which looks correct.
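
For reference, the two source variants being compared might look something like the sketch below. This is only an illustration: the function names, the `cond` array standing in for the elided "if (...)" condition, and the reduction into `sum` are all assumptions, not the actual testcase. The only intended difference between the two functions is the type of the induction variable.

```c
#include <stdint.h>

/* Hypothetical reduction of the loop shape under discussion.  The index
   i * 2 is computed in (32-bit) int and only then widened for addressing. */
int sum_pairs_int(const int *a, const int *cond, int n)
{
  int sum = 0;
  for (int i = 0; i < n; ++i)
    if (cond[i])                      /* stand-in for the elided condition */
      sum += a[i * 2] + a[i * 2 + 1];
  return sum;
}

/* Same loop with a pointer-width induction variable, so i * 2 is already
   computed at the width used for addressing. */
int sum_pairs_intptr(const int *a, const int *cond, intptr_t n)
{
  int sum = 0;
  for (intptr_t i = 0; i < n; ++i)
    if (cond[i])                      /* stand-in for the elided condition */
      sum += a[i * 2] + a[i * 2 + 1];
  return sum;
}
```

Both functions compute the same result for in-range inputs; they differ only in the index arithmetic the vectoriser has to analyse.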

If the intptr_t i * 2 wraps then a &a[(uintptr_t)i * 2] IV will still
behave correctly, so the {a, +, 8} SCEV still seems accurate.  The int
i * 2 would instead wrap at 32 bits, so &a[(unsigned)i * 2] isn't
linear in any meaningful sense.
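
To make the difference concrete, here is a rough sketch that mirrors the two offset calculations in plain C. It assumes LP64 with 4-byte int, uses int64_t in place of intptr_t, and relies on GCC's modular behaviour for the unsigned-to-signed conversion; the function names are made up for the example.

```c
#include <stdint.h>

/* Byte offset of a[i * 2] when the index is computed in 32 bits and then
   sign-extended, mirroring the _45 ... _7 sequence in the int version. */
uint64_t offset_int_index(int32_t i)
{
  uint32_t doubled = (uint32_t) i * 2;      /* wraps modulo 2^32 */
  int32_t as_int = (int32_t) doubled;       /* implementation-defined; GCC wraps */
  return (uint64_t) (int64_t) as_int * 4;   /* sign-extend, then scale by 4 */
}

/* Byte offset with a 64-bit index, mirroring i.0_1 * 8 in the intptr_t
   version.  Any wrapping happens modulo 2^64, the same width as the
   address arithmetic. */
uint64_t offset_intptr_index(int64_t i)
{
  return (uint64_t) i * 8;
}
```

With these definitions, offset_intptr_index (i + 1) - offset_intptr_index (i) is 8 modulo 2^64 for every i, which is what the {a, +, 8} SCEV describes.  For the 32-bit version the step is 8 for small i, but between i = 0x3fffffff and i = 0x40000000 the offset jumps from 0x1fffffff8 to 0xfffffffe00000000, so no single stride describes it.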

I don't know if the wrapping intptr_t SCEV leads to well-formed gimple
though.  Are pointer IVs assumed not to overflow?  If so, I guess we
might still be introducing UB for some intptr_t cases (although not
this one AFAICT, since any wrapping cases would be UB in the source too).

Thanks,
Richard
