https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124180

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
I'll note that we turn

return (unsigned short)((v)>(6.5535e+4) ? (6.5535e+4) : ((0.0f)>(v) ? (0.0f) :
(v)));

into

  return <retval> = (float) v > 6.5535e+4 ? 65535 : (float) v < 0.0 ? 0 :
(OUT_T) v;

during early folding which is a valid optimization but which makes the
float to int conversion conditionally evaluated.

I'll also note that if-conversion cannot really recover from this, the
guard prevents the exception from the unclamped value.  So if-conversion
would need to reverse the above transform.  With GCC 15 we turn the code
into

  v_30 = _4 + 5.0e-1;
  _53 = v_30 > 6.5535e+4;
  _50 = v_30 < 0.0;
  _31 = (short unsigned int) v_30;
  _ifc__107 = _50 ? 0 : _31;
  _32 = _53 ? 65535 : _ifc__107;

which is really incorrect since (short unsigned int) v is evaluated
unconditionally.

To fully recover from the correctness fix on capable hardware implementing
a masked float to int conversion is necessary (and possible).

To mitigate the vectorization regression, also for non-mask capable hardware,
the early simplification of the conditional expressions would have to be
inhibited.  This is

tree
fold_unary_loc (location_t loc, enum tree_code code, tree type, tree op0)
{
...
      else if (TREE_CODE (arg0) == COND_EXPR)
        {
          tree arg01 = TREE_OPERAND (arg0, 1);
          tree arg02 = TREE_OPERAND (arg0, 2);
          if (! VOID_TYPE_P (TREE_TYPE (arg01)))
            arg01 = fold_build1_loc (loc, code, type,
                                 fold_convert_loc (loc,
                                                   TREE_TYPE (op0), arg01));
          if (! VOID_TYPE_P (TREE_TYPE (arg02)))
            arg02 = fold_build1_loc (loc, code, type,
                                 fold_convert_loc (loc,
                                                   TREE_TYPE (op0), arg02));
          tem = fold_build3_loc (loc, COND_EXPR, type, TREE_OPERAND (arg0, 0),
                             arg01, arg02);

          /* If this was a conversion, and all we did was to move into
             inside the COND_EXPR, bring it back out.  But leave it if
             it is a conversion from integer to integer and the
             result precision is no wider than a word since such a
             conversion is cheap and may be optimized away by combine,
             while it couldn't if it were outside the COND_EXPR.  Then return
             so we don't get into an infinite recursion loop taking the
             conversion out and then back in.  */

          if ((CONVERT_EXPR_CODE_P (code)
...

if we inhibit that and also disable PRE (which does a similarly problematic
transform) then we still fail to if-convert this when no masking is available
since phiopt only collapses the innermost clamp part:

  v_35 = _5 + 5.0e-1;
  if (v_35 > 6.5535e+4)
    goto <bb 8>; [50.00%]
  else
    goto <bb 7>; [50.00%]

  <bb 7> [local count: 477815112]:
  _36 = v_35 < 0.0;
  _37 = _36 ? 0.0 : v_35;

  <bb 8> [local count: 955630224]:
  # iftmp.4_38 = PHI <6.5535e+4(6), _37(7)>
  _39 = (short unsigned int) iftmp.4_38;

PRE would anticipate the float to int conversion on the constant edge.
I think phiopt has the issue that the source, as written, at least
has the (min) > (a) FP comparison conditional, and that also raises
FP exceptions.

In the end it's badly written source, of course.  Using isgreater here
avoids all phiopt (we seem to refuse doing the COND_EXPR max for that),
but we then can ifconvert

  if (v_34 u<= 6.5535e+4)
    goto <bb 4>; [50.00%]
  else
    goto <bb 20>; [50.00%]

  <bb 20> [local count: 477815112]:
  goto <bb 6>; [100.00%]

  <bb 4> [local count: 477815112]:
  if (v_34 u>= 0.0)
    goto <bb 21>; [50.00%]
  else
    goto <bb 5>; [50.00%]

  <bb 21> [local count: 238907556]:
  goto <bb 6>; [100.00%]

  <bb 5> [local count: 238907556]:

  <bb 6> [local count: 955630226]:
  # iftmp.4_35 = PHI <6.5535e+4(20), v_34(21), 0.0(5)>
  _36 = (short unsigned int) iftmp.4_35;

to

  v_31 = _5 + 5.0e-1;
  _52 = v_31 u<= 6.5535e+4;
  _50 = v_31 u>= 0.0;
  _ifc__100 = _50 ? v_31 : 0.0;
  iftmp.4_32 = _52 ? _ifc__100 : 6.5535e+4;
  _33 = (short unsigned int) iftmp.4_32;


but really the original source was written w/o NaNs and w/o IEEE exceptions in
mind, so using -fno-trapping-math and probably also -ffinite-math-only would
be appropriate for it.  The testcase comes from
github.com/AcademySoftwareFoundation/OpenColorIO

Reply via email to