On 09/12/2016 10:19 AM, Tamar Christina wrote:
This patch adds an optimized route to the fpclassify builtin
for floating point numbers which are similar to IEEE-754 in format.
The goal is to make it faster by:
1. Trying to determine the most common case first
(e.g. the float is a Normal number) and then the
rest. The amount of code generated at -O2 are
about the same ± 1 instruction, but the code
is much better.
2. Using integer operation in the optimized path.
At a high level, the optimized path uses integer operations
to perform the following:
if (exponent bits aren't all set or unset)
else if (no bits are set on the number after masking out
sign bits then)
else if (exponent has no bits set)
else if (mantissa has no bits set)
In case the optimization can't be applied the old
implementation is used as a fall-back.
A limitation with this new approach is that the exponent
of the floating point has to fit in 31 bits and the floating
point has to have an IEEE like format and values for NaN and INF
(e.g. for NaN and INF all bits of the exp must be set).
To determine this IEEE likeness a new boolean was added to real_format.
Regression tests ran on aarch64-none-linux and arm-none-linux-gnueabi
and no regression. x86 uses it's own implementation other than
the fpclassify builtin.
As an example, Aarch64 now generates for classification of doubles:
fmov x1, d0
mov w0, 7
sbfx x2, x1, 52, 11
add w3, w2, 1
tst w3, 0x07FE
mov w0, 13
tst x1, 0x7fffffffffffffff
mov w0, 11
tbz x2, 0, .L1
tst x1, 0xfffffffffffff
mov w0, 3
mov w1, 5
csel w0, w0, w1, ne
No new tests as there are existing tests to test functionality.
glibc benchmarks ran against the builtin and this shows a 31.3%
Ok for trunk?
PS. I don't have commit rights so if OK can someone apply the patch for me.
2016-08-25 Tamar Christina <tamar.christ...@arm.com>
Wilco Dijkstra <wilco.dijks...@arm.com>
* gcc/builtins.c (fold_builtin_fpclassify): Added optimized version.
* gcc/real.h (real_format): Added is_ieee_compatible field.
* gcc/real.c (ieee_single_format): Set is_ieee_compatible flag.
diff --git a/gcc/builtins.c b/gcc/builtins.c
@@ -7947,10 +7947,8 @@ static tree
fold_builtin_fpclassify (location_t loc, tree *args, int nargs)
tree fp_nan, fp_infinite, fp_normal, fp_subnormal, fp_zero,
- arg, type, res, tmp;
+ arg, type, res;
- REAL_VALUE_TYPE r;
- char buf;
/* Verify the required arguments in the original call. */
if (nargs != 6
@@ -7970,14 +7968,143 @@ fold_builtin_fpclassify (location_t loc, tree *args,
arg = args;
type = TREE_TYPE (arg);
mode = TYPE_MODE (type);
- arg = builtin_save_expr (fold_build1_loc (loc, ABS_EXPR, type, arg));
+ const real_format *format = REAL_MODE_FORMAT (mode);
+ For IEEE 754 types:
+ fpclassify (x) ->
+ !((exp + 1) & (exp_mask & ~1)) // exponent bits not all set or unset
+ ? (x & sign_mask == 0 ? FP_ZERO :
+ (exp & exp_mask == exp_mask
+ ? (mantisa == 0 ? FP_INFINITE : FP_NAN) :
+ fpclassify (x) ->
+ isnan (x) ? FP_NAN :
+ (fabs (x) == Inf ? FP_INFINITE :
+ (fabs (x) >= DBL_MIN ? FP_NORMAL :
+ (x == 0 ? FP_ZERO : FP_SUBNORMAL))).
+ /* Check if the number that is being classified is close enough to IEEE 754
+ format to be able to go in the early exit code. */
+ if (format->is_binary_ieee_compatible)
+ gcc_assert (format->b == 2);
+ const tree int_type = integer_type_node;
+ const int exp_bits = (GET_MODE_SIZE (mode) * BITS_PER_UNIT) - format->p;
+ const int exp_mask = (1 << exp_bits) - 1;
+ tree exp, specials, exp_bitfield,
+ const_arg0, const_arg1, const0, const1,
+ not_sign_mask, zero_check, mantissa_mask,
+ mantissa_any_set, exp_lsb_set, mask_check;
+ tree int_arg_type, int_arg;
Style nit. Just use
tree exp, specials, exp_bitfield;
tree const_arg0, const_arg1, etc etc.
Doesn't this make ARG addressable? Which in turn means ARG won't be
exposed to the gimple/ssa optimizers. Or is it the case that when
fpclassify is used its argument is already in memory (and thus addressable?)
+ /* Re-interpret the float as an unsigned integer type
+ with equal precision. */
+ int_arg_type = build_nonstandard_integer_type (TYPE_PRECISION (type), 0);
+ int_arg = fold_build1_loc (loc, INDIRECT_REF, int_arg_type,
+ fold_build1_loc (loc, NOP_EXPR,
+ build_pointer_type (int_arg_type),
+ fold_build1_loc (loc, ADDR_EXPR,
+ build_pointer_type (type), arg)));
Formatting nits. When you have to wrap a call, the arguments are
formatted like this
+ /* ~(1 << location_sign_bit).
+ This creates a mask that can be used to mask out the sign bit. */
+ not_sign_mask = fold_build1_loc (loc, BIT_NOT_EXPR, int_arg_type,
+ fold_build2_loc (loc, LSHIFT_EXPR, int_arg_type,
+ build_int_cst (int_arg_type, format->signbit_rw)));
foo (arg, arg, arg, ...
Given you've got calls to fold_build2_loc, build_int_cst, etc embedded
inside other calls to fold_build2_loc, I'd just create some temporaries
to hold the results of the inner calls. That'll clean up the formatting
So this implies you're running on generic, not gimple, right? Otherwise
you can't generate these kinds of expressions.
+ exp, const1));
+ /* Combine the values together. */
+ specials = fold_build3_loc (loc, COND_EXPR, int_type, zero_check,
+ fold_build3_loc (loc, COND_EXPR, int_type, exp_lsb_set,
+ fold_build3_loc (loc, COND_EXPR, int_type, mantissa_any_set,
+ HONOR_NANS (mode) ? fp_nan : fp_normal,
+ HONOR_INFINITIES (mode) ? fp_infinite : fp_normal),
I think Joseph has already commented on the contents of the initializer
and a few more cases were we can use the optimized paths.
diff --git a/gcc/real.h b/gcc/real.h
@@ -161,6 +161,15 @@ struct real_format
+ /* This flag indicates whether the format can be used in the optimized
+ code paths for the __builtin_fpclassify function and friends.
+ The format has to have the same NaN and INF representation as normal
+ IEEE floats (e.g. exp must have all bits set), most significant bit must
+ sign bit, followed by exp bits of at most 32 bits. Lastly the floating
+ point number must be representable as an integer. The base of the number
+ also must be base 2. */
+ bool is_binary_ieee_compatible;
const char *name;
However, I do have a general question. There are some targets which
have FPUs that are basically IEEE, but don't support certain IEEE
features like NaNs, denorms, etc.
Presumably all that's needed is for those targets to define a hook to
describe which checks will always be false and you can check the hook's
return value. Right?
Can you please include some tests to verify you're getting the initial
code generation you want? Ideally there'd be execution tests too where
you generate one of the special nodes, then call the __builtin and
verify that you get the expected results back. The latter in particular
are key since it'll allow us to catch problems much earlier across the
wide variety of targets GCC supports.
I think you already had plans to post an updated patch. Please include
the fixes noted above in that update.
And just to be clear, I like where this is going, I just think we're
going to need a couple iterations to iron everything out.