https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
pass_pre_slp_scalar_cleanup invokes another copy of FRE and I think this goes
wrong in there.
The .USUBC calls emitted by bitintlower1 are:
  _50 = .USUBC (0, _47, _48);
  _61 = .USUBC (0, _60, _51);
  _65 = .USUBC (0, 0, _49);
  _36 = .USUBC (_32, _33, _34);
  _40 = .USUBC (0, 0, _37);
where the first two process even and odd limbs of y from the first
.SUB_OVERFLOW; their second operands are initialized with
  _47 = VIEW_CONVERT_EXPR<unsigned long[8]>(y)[_45];
and
  _59 = _45 + 1;
  _60 = VIEW_CONVERT_EXPR<unsigned long[8]>(y)[_59];
where _45 is an IV going from 0 to 6 in steps of 2 and y has just y[7]
non-zero, all lower limbs zero.
The third .USUBC is the final processing of the first .SUB_OVERFLOW and the
remaining two are from the second .SUB_OVERFLOW; we can ignore those for now.

Now, in cunroll we can see some jump threading from earlier passes:
  _50 = .USUBC (0, _47, _48);
  _16 = .USUBC (0, _17, _51);
  _74 = .USUBC (0, _73, _51);
  _61 = .USUBC (0, _60, _51);
  _65 = .USUBC (0, 0, _125);
but the _48 vs. _51 last operands clearly identify where each call is coming
from.

Now the pre-slp fre4 seems to have replaced the second arguments of the 3
calls with 0s:
  _50 = .USUBC (0, _47, _48);
  _16 = .USUBC (0, 0, _51);
  _74 = .USUBC (0, 0, _51);
  _61 = .USUBC (0, 0, _51);
  _65 = .USUBC (0, 0, _125);
While it would be correct to replace _47 with 0, because _45 iterates over
the 0, 2, 4 and 6 indexes into the array and the array is known to be 0 there
due to
  __builtin_memset (&y, 0, 56);
that is not the case for VIEW_CONVERT_EXPR<unsigned long[8]>(y)[7].
  _16 = .USUBC (0, _17, _51);
is guarded on _45 <= 3 (aka 0 or 2) and so the _17 -> 0 replacement is ok.
  _74 = .USUBC (0, _73, _51);
is guarded on _45 == 4 and VIEW_CONVERT_EXPR<unsigned long[8]>(y)[5] is also
known to be 0, so _73 -> 0 is ok as well.
But in
  _59 = _45 + 1;
  _60 = VIEW_CONVERT_EXPR<unsigned long[8]>(y)[_59];
  _61 = .USUBC (0, _60, _51);
either we don't know anything, in which case we need to load, or we know that
_45 is 6 and _59 is 7, and VIEW_CONVERT_EXPR<unsigned long[8]>(y)[7] is _84,
not 0.
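
The argument above can be checked with a small model. The sketch below is not GCC code and does not reproduce the testcase. It assumes 64-bit limbs, assumes that .USUBC (a, b, c) computes a - b - c with a borrow out, and uses a made-up value for y[7] (the dump only names it _84). The helper names (usubc, negate_limbs, and the three load functions) are hypothetical. It compares three ways of supplying the odd limb: the real load, the substitution fre4 appears to have made, and a substitution restricted to the limbs covered by the 56-byte memset.

```python
MASK = (1 << 64) - 1  # assumption: one limb is 64 bits wide


def usubc(a, b, borrow_in):
    """Model of one subtract-with-borrow step: (a - b - borrow_in) mod 2**64
    plus the borrow out. This is an assumed reading of .USUBC, not GCC code."""
    diff = a - b - borrow_in
    return diff & MASK, 1 if diff < 0 else 0


def negate_limbs(y, odd_limb_load):
    """Compute 0 - y two limbs per iteration, mirroring the loop shape in the
    dump. odd_limb_load(y, i) supplies the odd limb so that different
    substitutions can be compared."""
    out = []
    borrow = 0
    for i in range(0, 8, 2):  # plays the role of the IV _45: 0, 2, 4, 6
        even, borrow = usubc(0, y[i], borrow)
        odd, borrow = usubc(0, odd_limb_load(y, i + 1), borrow)
        out += [even, odd]
    return out, borrow


def real_load(y, i):
    # Always read the limb from memory.
    return y[i]


def folded_to_zero(y, i):
    # What the bad replacement amounts to: every odd limb becomes 0.
    return 0


def folded_when_known_zero(y, i):
    # memset (&y, 0, 56) clears 56 / 8 = 7 limbs, so only indexes 0..6
    # are known to be zero; limb 7 must still be loaded.
    return 0 if i < 7 else y[i]


Y7 = 0x1234               # hypothetical stand-in for the unknown value _84
y = [0] * 7 + [Y7]        # only y[7] is non-zero, as in the report

correct, correct_borrow = negate_limbs(y, real_load)
wrong, wrong_borrow = negate_limbs(y, folded_to_zero)
guarded, guarded_borrow = negate_limbs(y, folded_when_known_zero)

print("limbs 0..6 agree:", correct[:7] == wrong[:7])
print("limb 7 correct  :", hex(correct[7]))
print("limb 7 folded   :", hex(wrong[7]))
print("guarded folding matches:", guarded == correct)
```

In this model the unconditional substitution agrees with the real load on limbs 1, 3 and 5 and differs only on limb 7, while the guarded substitution matches the real load everywhere. That is the same split the comment draws between the _17 and _73 replacements and the _60 replacement.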