https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
pass_pre_slp_scalar_cleanup invokes another copy of FRE, and I think that is
where this goes wrong.
The .USUBC calls emitted by bitintlower1 are:
  _50 = .USUBC (0, _47, _48);
  _61 = .USUBC (0, _60, _51);
  _65 = .USUBC (0, 0, _49);
  _36 = .USUBC (_32, _33, _34);
  _40 = .USUBC (0, 0, _37);
where the first two process the even and odd limbs of y from the first
.SUB_OVERFLOW; their second operands are loaded with
  _47 = VIEW_CONVERT_EXPR<unsigned long[8]>(y)[_45];
and
  _59 = _45 + 1;
  _60 = VIEW_CONVERT_EXPR<unsigned long[8]>(y)[_59];
where _45 is an IV going from 0 to 6 in steps of 2, and y has only y[7]
non-zero; all lower limbs are zero.  The third .USUBC is the final processing
of the first .SUB_OVERFLOW, and the remaining two come from the second
.SUB_OVERFLOW, which we can ignore for now.
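For readers not familiar with .USUBC, here is a rough C model of a single
call, assuming 64-bit limbs and assuming the operands are (minuend,
subtrahend, incoming borrow).  The name usubc and the out-parameter for the
borrow are hypothetical; the real internal function returns the result and
the outgoing borrow together.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one .USUBC step: computes a - b - borrow_in on a
   64-bit limb and stores the outgoing borrow (0 or 1) in *borrow_out.  */
static uint64_t
usubc (uint64_t a, uint64_t b, uint64_t borrow_in, uint64_t *borrow_out)
{
  assert (borrow_in <= 1);
  uint64_t d = a - b;              /* wraps around if a < b */
  uint64_t b1 = a < b;             /* borrow from a - b */
  uint64_t r = d - borrow_in;      /* wraps around if d < borrow_in */
  uint64_t b2 = d < borrow_in;     /* borrow from subtracting borrow_in */
  *borrow_out = b1 | b2;           /* at most one of the two can be set */
  return r;
}
```

With that reading, .USUBC (0, _47, _48) is 0 - _47 - _48, so the value of the
second operand directly affects both the limb and the borrow passed on.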
Now, in cunroll we can see some jump threading from earlier passes:
  _50 = .USUBC (0, _47, _48);
  _16 = .USUBC (0, _17, _51);
  _74 = .USUBC (0, _73, _51);
  _61 = .USUBC (0, _60, _51);
  _65 = .USUBC (0, 0, _125);
but the last operands (_48 vs. _51) still clearly identify where each call
comes from.
The pre-slp fre4 then seems to have replaced the second argument of three of
these calls (_16, _74 and _61) with 0:
  _50 = .USUBC (0, _47, _48);
  _16 = .USUBC (0, 0, _51);
  _74 = .USUBC (0, 0, _51);
  _61 = .USUBC (0, 0, _51);
  _65 = .USUBC (0, 0, _125);
It would be correct to replace _47 with 0, because _45 iterates over the
indexes 0, 2, 4 and 6 into the array and the array is known to be 0 there
due to
  __builtin_memset (&y, 0, 56);
but that is not the case for VIEW_CONVERT_EXPR<unsigned long[8]>(y)[7], which
the 56-byte memset does not cover.
  _16 = .USUBC (0, _17, _51);
is guarded by _45 <= 3 (i.e. _45 is 0 or 2), so the _17 -> 0 replacement is ok.
  _74 = .USUBC (0, _73, _51);
is guarded by _45 == 4, and VIEW_CONVERT_EXPR<unsigned long[8]>(y)[5] is also
known to be 0, so _73 -> 0 is ok as well.
But in
  _59 = _45 + 1;
  _60 = VIEW_CONVERT_EXPR<unsigned long[8]>(y)[_59];
  _61 = .USUBC (0, _60, _51);
either we don't know anything about _45, in which case we need to do the load,
or we know that _45 is 6 and _59 is 7, and
VIEW_CONVERT_EXPR<unsigned long[8]>(y)[7] is _84, not 0.
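
To make the effect concrete, here is a small C sketch of the limb loop,
assuming 64-bit limbs and modelling each .USUBC (0, x, borrow) as
0 - x - borrow.  The names sub_from_zero, negate_limbs and top_limb are
hypothetical, and the value 1 stored in y[7] is just a stand-in for _84.
The fold_odd_to_zero flag mimics what fre4 did to the odd-limb operand.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { NLIMBS = 8 };

/* Models .USUBC (0, b, *borrow): returns 0 - b - *borrow and updates
   *borrow to the outgoing borrow.  */
static uint64_t
sub_from_zero (uint64_t b, uint64_t *borrow)
{
  assert (*borrow <= 1);
  uint64_t r = 0 - b - *borrow;
  *borrow = (b != 0) | (*borrow != 0);
  return r;
}

/* Computes r = 0 - y, two limbs per iteration like the even/odd .USUBC
   pair.  If fold_odd_to_zero is nonzero, the odd-limb load is replaced
   by 0 for every iteration, including the last one.  */
static void
negate_limbs (const uint64_t y[NLIMBS], uint64_t r[NLIMBS],
              int fold_odd_to_zero)
{
  uint64_t borrow = 0;
  for (int i = 0; i <= 6; i += 2)
    {
      uint64_t odd = fold_odd_to_zero ? 0 : y[i + 1];
      r[i] = sub_from_zero (y[i], &borrow);
      r[i + 1] = sub_from_zero (odd, &borrow);
    }
}

/* Sets up y as in the bug (limbs 0..6 cleared by a 56-byte memset, y[7]
   non-zero) and returns the top limb of 0 - y.  */
static uint64_t
top_limb (int fold_odd_to_zero)
{
  uint64_t y[NLIMBS], r[NLIMBS];
  memset (y, 0, 56);   /* clears limbs 0..6 only */
  y[7] = 1;            /* hypothetical stand-in for _84 */
  negate_limbs (y, r, fold_odd_to_zero);
  return r[7];
}
```

In this model top_limb (0) yields an all-ones top limb while top_limb (1)
yields 0: folding the odd-limb load to 0 is harmless for _45 of 0, 2 and 4 but
changes the result once _45 is 6.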
