https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97744

--- Comment #4 from Kewen Lin <linkw at gcc dot gnu.org> ---
The additional fre4 pass run triggers this: disabling fre4 makes the test pass
(disabling dse3 alone does not, so dse3 is unrelated). Further narrowing down
shows that fre4 on the function MG3XDEMO is responsible. Comparing the
optimized dump files, some differences in FMA computation order caught my
attention, e.g.:

original:
  <bb 33> [local count: 4095816269717]:
  # D__I_lsm0.2159_1324 = PHI <_269(33), _1322(32)>
  # D_lsm0.2160_1321 = PHI <_382(33), _1313(32)>
  # D_lsm0.2162_1312 = PHI <_418(33), _1304(32)>
  # doloop.2199_1211 = PHI <doloop.2199_1210(33), doloop.2199_1209(32)>
  # ivtmp.2221_1207 = PHI <ivtmp.2221_1206(33), ivtmp.2263_1126(32)>
  # ivtmp.2232_1201 = PHI <ivtmp.2232_1200(33), 0(32)>
  _827 = ivtmp.2266_1109 + 24;
  _826 = (real(kind=8) *) _827;

  ...
  _293 = MEM <real(kind=8)> [(real(kind=8)[0:D.3022] *)_816 + ivtmp.2232_1201 * 1];
  _294 = _288 + _293;
  _295 = ((_294));
  _296 = _295 * 2.5e-1;
  _15 = .FMA (_263, 5.0e-1, _296);
  _815 = ivtmp.2270_1092 + 32;
  _814 = (real(kind=8) *) _815;
  ...
  _362 = _418 + D_lsm0.2162_1312;
  _363 = ((_362));
  _12 = .FMA (_363, 6.25e-2, _15);
  _370 = .FMA (_337, 1.25e-1, _12);
  _1163 = (void *) ivtmp.2221_1207;
  MEM <real(kind=8)> [(real(kind=8)[0:D.3025] *)_1163 + 16B] = _370;

vs. with culprit commit:

  <bb 33> [local count: 4095816269717]:
  # D__I_lsm0.2159_1324 = PHI <_269(33), _1322(32)>
  # D_lsm0.2160_1321 = PHI <_382(33), _1313(32)>
  # D_lsm0.2162_1312 = PHI <_418(33), _1304(32)>
  # doloop.2200_11 = PHI <doloop.2200_1333(33), doloop.2200_1329(32)>
  # ivtmp.2222_213 = PHI <ivtmp.2222_426(33), ivtmp.2264_1206(32)>
  # ivtmp.2233_1325 = PHI <ivtmp.2233_1413(33), 0(32)>
  _952 = ivtmp.2267_1189 + 24;
  _951 = (real(kind=8) *) _952;
  ...
  _293 = MEM <real(kind=8)> [(real(kind=8)[0:D.3022] *)_941 + ivtmp.2233_1325 * 1];
  _365 = _290 + _293;
  _294 = _365 + D__I_lsm0.2159_1324;
  _295 = ((_294));
...
  _362 = _418 + D_lsm0.2162_1312;
  _363 = ((_362));
  _364 = _363 * 6.25e-2;
  _558 = .FMA (_263, 5.0e-1, _364);
  _12 = .FMA (_295, 2.5e-1, _558);
  _370 = .FMA (_337, 1.25e-1, _12);
  _1265 = (void *) ivtmp.2222_213;
  MEM <real(kind=8)> [(real(kind=8)[0:D.3025] *)_1265 + 16B] = _370;

So I tried disabling the widening_mul pass, which makes the test pass, and then
narrowed it down further to convert_mult_to_fma.
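
To illustrate why the accumulation order alone can change the result, here is a
minimal Python sketch. It is not taken from mgrid: the inputs x, y and z are
hypothetical stand-ins for the values scaled by 5.0e-1, 2.5e-1 and 6.25e-2 in
the dumps, and the fma helper is an assumed emulation (exact rational
arithmetic followed by one rounding), not a GCC or libm routine. It mirrors
only the order of the last few FMA statements, not the full dumps.

```python
from fractions import Fraction


def fma(a, b, c):
    """Emulate a fused multiply-add: a*b + c with a single final rounding."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Hypothetical inputs, chosen so that the rounding difference is visible.
x, y, z = 2.0, 2.0**-51, -16.0

# Order as in the original dump: the x and y terms first, the z term last.
t = fma(x, 0.5, y * 0.25)
original_order = fma(z, 0.0625, t)

# Order as in the dump with the culprit commit: x and z first, y last.
t = fma(x, 0.5, z * 0.0625)
new_order = fma(y, 0.25, t)

print(original_order, new_order)
```

With these inputs the first order loses the small y term when it is added to
1.0, while the second order cancels the large terms first and keeps it, so the
two results differ even though both compute the same mathematical sum.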

So the commit changes the FMA computation order, which triggers the final
comparison failure.  Bisecting over the individual FMA transforms points to the
one in PSINV, which MG3XDEMO calls several times:

  _896 = _8 * _902;
  _240 = _637 + _896;

vs.

  _240 = .FMA (_8, _902, _637);
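
The two forms above are not bit-identical in general, because the separate
multiply rounds the product before the add, while the FMA rounds only once.
A minimal Python sketch of that effect, with hypothetical values a, b and c
standing in for _8, _902 and _637, and an assumed fma emulation built on exact
rational arithmetic:

```python
from fractions import Fraction


def fma(a, b, c):
    """Emulate a fused multiply-add: a*b + c with a single final rounding."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Hypothetical operands: the exact product a*b is 1 - 2**-54, which is not
# representable as a double and rounds to 1.0.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

unfused = a * b + c    # product rounded to 1.0 first, then 1.0 + (-1.0)
fused = fma(a, b, c)   # exact product kept, rounded once at the end

print(unfused, fused)
```

Here the unfused form yields 0.0 and the fused form yields -2**-54, a
difference of the size that a tight absolute tolerance can flag.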

Apart from the options Seurer mentioned that make it pass, the option
-ffp-contract=off, which disables the FMA optimization (see
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html), also makes the test
pass.

Besides, a search through existing bugs shows that mgrid is well known for
having too small an absolute tolerance, e.g. PR35418.
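
As a rough illustration of how a tight absolute tolerance reacts to a one-ulp
change, here is a small Python sketch. The reference value and both tolerances
are hypothetical and are not the ones the benchmark actually uses.

```python
import math

# Hypothetical reference value; doubles in [1024, 2048) are spaced 2**-42 apart.
reference = 1234.5678
computed = reference + 2.0**-42   # exactly one ulp above the reference

# Absolute-only check with a (hypothetical) very small tolerance.
passes_abs = math.isclose(computed, reference, rel_tol=0.0, abs_tol=1e-15)

# Relative-only check with a (hypothetical) modest tolerance.
passes_rel = math.isclose(computed, reference, rel_tol=1e-12, abs_tol=0.0)

print(passes_abs, passes_rel)
```

A single-ulp difference of about 2.3e-13 fails the absolute check but passes
the relative one, which is the kind of sensitivity that FMA reordering exposes.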
