From: gcc-patches-ow...@gcc.gnu.org <gcc-patches-ow...@gcc.gnu.org> on behalf
of Tamar Christina <tamar.christ...@arm.com>
Sent: Friday, September 30, 2016 2:22:35 PM
To: GCC Patches
Cc: nd; Richard Earnshaw; Wilco Dijkstra; ja...@redhat.com; Joseph Myers;
Michael Meissner; rguent...@suse.de; Moritz Klammler; Andrew Pinski;
Subject: [PATCHv2][GCC] Optimise the fpclassify builtin to perform integer
operations when possible
This is v2 of the patch which adds an optimized route to the fpclassify builtin
for floating point numbers which are similar to IEEE-754 in format.
I have addressed most comments from everyone except for two things:
1) Providing a back-end hook to override the functionality. While certainly
possible, the current fpclassify doesn't provide this either, so I'd like to
treat it as an enhancement rather than an issue.
2) Doing it in a lowering phase. If the general consensus is that this is the
path the patch must take then I'd be happy to reconsider. However, at this
point this patch does not seem to produce worse code than what there was before.
The goal is to make it faster by:
1. Trying to determine the most common case first
(e.g. the float is a Normal number) and then the
rest. The amount of code generated at -O2 is
about the same (+/- 1 instruction), but the code
is much better.
2. Using integer operations in the optimized path.
At a high level, the optimized path uses integer operations
to perform the following:
  if (exponent bits aren't all set or unset)
    return Normal
  else if (no bits are set on the number after masking out
           the sign bit)
    return Zero
  else if (exponent has no bits set)
    return Subnormal
  else if (mantissa has no bits set)
    return Infinite
  else
    return NaN
In case the optimization can't be applied the old
implementation is used as a fall-back.
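The order of checks above can be sketched in plain C for an IEEE-754
binary64 double. This is only an illustration under stated assumptions: the
function name classify_double_bits and the mask constants are made up for
this example, and the patch itself builds the equivalent checks inside
fold_builtin_fpclassify rather than in user-level C like this.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of the patch): classify a binary64 value
   using integer operations only, most common case first.  */
int
classify_double_bits (double x)
{
  uint64_t bits;

  /* The sketch assumes double is a 64-bit IEEE-754 binary64 value.  */
  assert (sizeof bits == sizeof x);
  memcpy (&bits, &x, sizeof bits);

  const uint64_t sign_mask = UINT64_C (0x8000000000000000);
  const uint64_t exp_mask  = UINT64_C (0x7ff0000000000000);
  const uint64_t mant_mask = UINT64_C (0x000fffffffffffff);
  uint64_t exp = bits & exp_mask;

  /* Exponent bits are neither all set nor all unset.  */
  if (exp != 0 && exp != exp_mask)
    return FP_NORMAL;
  /* Nothing is set once the sign bit is masked out.  */
  if ((bits & ~sign_mask) == 0)
    return FP_ZERO;
  /* Exponent is zero but the mantissa is not.  */
  if (exp == 0)
    return FP_SUBNORMAL;
  /* Exponent is all ones here; the mantissa separates Inf from NaN.  */
  if ((bits & mant_mask) == 0)
    return FP_INFINITE;
  return FP_NAN;
}
```

Calling classify_double_bits (1.0) takes the first branch only, which is the
point of testing for the Normal case first.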
A limitation of this new approach is that the exponent
of the floating-point format has to fit in 31 bits and the
format has to have an IEEE-like layout and values for NaN and INF
(e.g. for NaN and INF all bits of the exponent must be set).
To determine this IEEE likeness a new boolean was added to real_format.
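As a rough sketch of how such a gate might look, the check below combines
the new boolean with the exponent-width limit. The struct fp_format_desc and
the function can_use_integer_fpclassify are hypothetical stand-ins invented
for this example; they are not GCC's real_format or any function in the
patch, and reading "fit in 31 bits" as a limit on the exponent field width
is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a floating-point format
   description.  This is NOT the layout of GCC's struct real_format.  */
struct fp_format_desc
{
  int exponent_bits;        /* width of the exponent field, in bits */
  bool is_ieee_compatible;  /* NaN and Inf use an all-ones exponent, etc. */
};

/* Return true when the integer fast path may be used; otherwise the
   caller would fall back to the old implementation.  */
bool
can_use_integer_fpclassify (const struct fp_format_desc *fmt)
{
  assert (fmt != 0);
  return fmt->is_ieee_compatible && fmt->exponent_bits <= 31;
}
```

A descriptor for IEEE double would carry exponent_bits = 11 and the flag
set, so it passes; a format without IEEE-style NaN and Inf encodings would
leave the flag clear and take the fall-back.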
As an example, AArch64 now generates the following for classification of doubles:
	fmov	x1, d0
	mov	w0, 7
	sbfx	x2, x1, 52, 11
	add	w3, w2, 1
	tst	w3, 0x07FE
	mov	w0, 13
	tst	x1, 0x7fffffffffffffff
	mov	w0, 11
	tbz	x2, 0, .L1
	tst	x1, 0xfffffffffffff
	mov	w0, 3
	mov	w1, 5
	csel	w0, w0, w1, ne
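The add/tst pair in the listing appears to test "exponent is neither all
zeros nor all ones" with a single mask: adding 1 to the 11-bit exponent maps
0 to 1 and 0x7FF to 0x800, and neither result has any bit in 0x7FE set. The
C sketch below illustrates that reading; the function names are invented for
this example, and it uses an unsigned exponent field where the listing uses
a sign-extended one (the mask only inspects the low 11 bits, so the result
is the same).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when the 11-bit exponent field EXP11 is
   neither all zeros nor all ones, i.e. the value is a Normal number.  */
bool
exponent_is_normal (uint32_t exp11)
{
  return ((exp11 + 1u) & 0x7FEu) != 0;
}

/* Exhaustively compare the mask trick against the two direct
   comparisons for every possible 11-bit exponent value.  */
void
check_exponent_trick (void)
{
  for (uint32_t e = 0; e <= 0x7FFu; e++)
    assert (exponent_is_normal (e) == (e != 0 && e != 0x7FFu));
}
```

The benefit is one add and one tst instead of two compares, which keeps the
common Normal case short.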
No new tests as there are existing tests to test functionality.
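For readers unfamiliar with the builtin, a functional check of this kind can
be sketched as below. __builtin_fpclassify takes the five result values in
the order NaN, infinite, normal, subnormal, zero, followed by the value to
classify. The constants 3, 5, 7, 11 and 13 are an assumption read off the
AArch64 listing above, and the function names are made up for this example;
this is not one of the existing testsuite files.

```c
#include <assert.h>

/* Illustrative wrapper: the five integers are the values returned for
   NaN, infinite, normal, subnormal and zero, in that order.  */
int
classify_with_builtin (double x)
{
  return __builtin_fpclassify (3, 5, 7, 11, 13, x);
}

/* Check one representative input per class.  */
void
check_builtin_mapping (void)
{
  assert (classify_with_builtin (__builtin_nan ("")) == 3);
  assert (classify_with_builtin (__builtin_inf ()) == 5);
  assert (classify_with_builtin (1.0) == 7);
  assert (classify_with_builtin (0x1p-1074) == 11);  /* smallest subnormal */
  assert (classify_with_builtin (-0.0) == 13);
}
```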
The glibc benchmarks were run against the builtin and show a 42.5%
performance gain on AArch64.
Regression tests were run on aarch64-none-linux and arm-none-linux-gnueabi
with no regressions. x86 also has no regressions and modest gains (3%).
Ok for trunk?
2016-08-25  Tamar Christina  <tamar.christ...@arm.com>
            Wilco Dijkstra  <wilco.dijks...@arm.com>
        * gcc/builtins.c (fold_builtin_fpclassify): Added optimized version.
        * gcc/real.h (real_format): Added is_ieee_compatible field.
        * gcc/real.c (ieee_single_format): Set is_ieee_compatible flag.
2016-09-27  Tamar Christina  <tamar.christ...@arm.com>
        * gcc.target/aarch64/builtin-fpclassify.c: New codegen test.