https://gcc.gnu.org/bugzilla/show_bug.cgi?id=22326
--- Comment #9 from luoxhu at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #6)
> (In reply to luoxhu from comment #4)
> > float foo(float f, float x, float y) {
> > return (fabs(f)*x+y);
> > }
> >
> > The input of fabs is of float type, so using fabsf is enough here. I drafted a
> > patch to avoid the double promotion when generating GIMPLE if fabs can be
> > replaced by fabsf because argument[0] is of float type.
>
> What about adding something to match.pd for:
> ABS<(float_convert)f> into (float_convert)ABS<f>
> This is only valid when promoting, not when reducing the precision.
Thanks, this is already implemented in fold-const.c, though not in match.pd and
not really via fabsf. The front end always converts the argument of fabs to
double first. There are three kinds of cases for this issue:
1) "return fabs(x);"
tree
fold_unary_loc (location_t loc, enum tree_code code, tree type, tree op0)
{
  ...
    case ABS_EXPR:
      /* Convert fabs((double)float) into (double)fabsf(float).  */
      if (TREE_CODE (arg0) == NOP_EXPR
          && TREE_CODE (type) == REAL_TYPE)
        {
          tree targ0 = strip_float_extensions (arg0);
          if (targ0 != arg0)
            return fold_convert_loc (loc, type,
                                     fold_build1_loc (loc, ABS_EXPR,
                                                      TREE_TYPE (targ0),
                                                      targ0));
        }
      return NULL_TREE;
  ...
}
This code turns "(float)fabs((double)x)" into "(float)(double)(float)fabs(x)",
and match.pd then removes the redundant conversions.
2) "return fabs(x)*y;"
The front end first generates the expression "(float) (fabs((double) x) * (double) y)".
fold_unary_loc in fold-const.c then converts fabs((double)float) into
(double)fabsf(float), giving "(float)((double)fabs(x) * (double)y)". Finally,
match.pd converts (outertype)((innertype0)a+(innertype1)b) into
((newtype)a+(newtype)b), which removes the conversion to double.
3) "return fabs(x)*y + z;"
The front end produces: (float) ((fabs((double) x) * (double) y) + (double) z)
So what we need here is a match.pd pattern for the MULT and PLUS, as follows.
Any comments?
+(simplify
+ (convert (plus (mult (convert@3 (abs @0)) (convert@4 @1)) (convert@5 @2)))
+ (if (flag_unsafe_math_optimizations
+      && types_match (type, float_type_node)
+      && types_match (TREE_TYPE (@0), float_type_node)
+      && types_match (TREE_TYPE (@1), float_type_node)
+      && types_match (TREE_TYPE (@2), float_type_node)
+      && element_precision (TREE_TYPE (@3)) > element_precision (TREE_TYPE (@0))
+      && element_precision (TREE_TYPE (@4)) > element_precision (TREE_TYPE (@1))
+      && element_precision (TREE_TYPE (@5)) > element_precision (TREE_TYPE (@2))
+      && ! HONOR_NANS (type)
+      && ! HONOR_INFINITIES (type))
+  (plus (mult (abs @0) @1) @2)))
+
Cases 1) and 2) do not generate a conversion to double; only 3) still emits an
frsp (the PowerPC round-to-single instruction) in fast-math mode, and the
pattern above would remove it.
PS: convert_to_real_1 does not seem closely related here. It converts
(float)sqrt((double)x), where x is a float, into sqrtf(x), but it does so with a
recursive call to convert_to_real_1 and build_call_expr with a new
mathfn_built_in, so I suppose it would be a bit complicated to move to match.pd?
The optimization should only be enabled in fast-math mode; is
flag_unsafe_math_optimizations enough to guard it?