https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124180
--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
I'll note that we turn
return (unsigned short)((v)>(6.5535e+4) ? (6.5535e+4) : ((0.0f)>(v) ? (0.0f) :
(v)));
into
return <retval> = (float) v > 6.5535e+4 ? 65535 : (float) v < 0.0 ? 0 :
(OUT_T) v;
during early folding which is a valid optimization but which makes the
float to int conversion conditionally evaluated.
I'll also note that if-conversion cannot really recover from this, the
guard prevents the exception from the unclamped value. So if-conversion
would need to reverse the above transform. With GCC 15 we turn the code
into
v_30 = _4 + 5.0e-1;
_53 = v_30 > 6.5535e+4;
_50 = v_30 < 0.0;
_31 = (short unsigned int) v_30;
_ifc__107 = _50 ? 0 : _31;
_32 = _53 ? 65535 : _ifc__107;
which is really incorrect since (short unsigned int) v is evaluated
unconditionally.
To fully recover from the correctness fix on capable hardware implementing
a masked float to int conversion is necessary (and possible).
To mitigate the vectorization regression, also for non-mask capable hardware,
the early simplification of the conditional expressions would have to be
inhibited. This is
tree
fold_unary_loc (location_t loc, enum tree_code code, tree type, tree op0)
{
...
else if (TREE_CODE (arg0) == COND_EXPR)
{
tree arg01 = TREE_OPERAND (arg0, 1);
tree arg02 = TREE_OPERAND (arg0, 2);
if (! VOID_TYPE_P (TREE_TYPE (arg01)))
arg01 = fold_build1_loc (loc, code, type,
fold_convert_loc (loc,
TREE_TYPE (op0), arg01));
if (! VOID_TYPE_P (TREE_TYPE (arg02)))
arg02 = fold_build1_loc (loc, code, type,
fold_convert_loc (loc,
TREE_TYPE (op0), arg02));
tem = fold_build3_loc (loc, COND_EXPR, type, TREE_OPERAND (arg0, 0),
arg01, arg02);
/* If this was a conversion, and all we did was to move into
inside the COND_EXPR, bring it back out. But leave it if
it is a conversion from integer to integer and the
result precision is no wider than a word since such a
conversion is cheap and may be optimized away by combine,
while it couldn't if it were outside the COND_EXPR. Then return
so we don't get into an infinite recursion loop taking the
conversion out and then back in. */
if ((CONVERT_EXPR_CODE_P (code)
...
if we inhibit that and also disable PRE (which does a similarly problematic
transform) then we still fail to if-convert this when no masking is available
since phiopt only collapses the innermost clamp part:
v_35 = _5 + 5.0e-1;
if (v_35 > 6.5535e+4)
goto <bb 8>; [50.00%]
else
goto <bb 7>; [50.00%]
<bb 7> [local count: 477815112]:
_36 = v_35 < 0.0;
_37 = _36 ? 0.0 : v_35;
<bb 8> [local count: 955630224]:
# iftmp.4_38 = PHI <6.5535e+4(6), _37(7)>
_39 = (short unsigned int) iftmp.4_38;
PRE would anticipate the float to int conversion on the constant edge.
I think phiopt has the issue that the source, as written, at least
has the (min) > (a) FP comparison conditional, and that also raises
FP exceptions.
In the end it's badly written source, of course. Using isgreater here
avoids all phiopt (we seem to refuse doing the COND_EXPR max for that),
but we then can ifconvert
if (v_34 u<= 6.5535e+4)
goto <bb 4>; [50.00%]
else
goto <bb 20>; [50.00%]
<bb 20> [local count: 477815112]:
goto <bb 6>; [100.00%]
<bb 4> [local count: 477815112]:
if (v_34 u>= 0.0)
goto <bb 21>; [50.00%]
else
goto <bb 5>; [50.00%]
<bb 21> [local count: 238907556]:
goto <bb 6>; [100.00%]
<bb 5> [local count: 238907556]:
<bb 6> [local count: 955630226]:
# iftmp.4_35 = PHI <6.5535e+4(20), v_34(21), 0.0(5)>
_36 = (short unsigned int) iftmp.4_35;
to
v_31 = _5 + 5.0e-1;
_52 = v_31 u<= 6.5535e+4;
_50 = v_31 u>= 0.0;
_ifc__100 = _50 ? v_31 : 0.0;
iftmp.4_32 = _52 ? _ifc__100 : 6.5535e+4;
_33 = (short unsigned int) iftmp.4_32;
but really the original source was written w/o NaNs and w/o IEEE exceptions in
mind, so using -fno-trapping-math and probably also -ffinite-math-only would
be appropriate for it. The testcase comes from
github.com/AcademySoftwareFoundation/OpenColorIO