Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

Michael Meissner Tue, 20 Sep 2016 14:03:45 -0700

On Tue, Sep 20, 2016 at 01:19:07PM +0100, Tamar Christina wrote:
> On 19/09/16 23:16, Michael Meissner wrote:
> >On Mon, Sep 12, 2016 at 04:19:32PM +0000, Tamar Christina wrote:
> >>Hi All,
> >>
> >>This patch adds an optimized route to the fpclassify builtin
> >>for floating point numbers which are similar to IEEE-754 in format.
> >>
> >>The goal is to make it faster by:
> >>1. Trying to determine the most common case first
> >>    (e.g. the float is a Normal number) and then the
> >>    rest. The amount of code generated at -O2 is
> >>    about the same +/- 1 instruction, but the code
> >>    is much better.
> >>2. Using integer operations in the optimized path.
> >>
> >>At a high level, the optimized path uses integer operations
> >>to perform the following:
> >>
> >>   if (exponent bits aren't all set or unset)
> >>      return Normal;
> >>   else if (no bits are set on the number after masking out
> >>        the sign bit)
> >>      return Zero;
> >>   else if (exponent has no bits set)
> >>      return Subnormal;
> >>   else if (mantissa has no bits set)
> >>      return Infinite;
> >>   else
> >>      return NaN;
> >I haven't looked at fpclassify.  I assume we can define a backend insn to do
> >the right thing?  One of the things we've noticed over the years with the
> >PowerPC is that it can be rather expensive to move things from the floating
> >point/vector unit to the integer registers and vice versa.  This is
> >particularly true if you have to do the transfer via the memory unit via
> >stores and loads of different sizes.
> >
> Hmm, what do you mean with the right thing? Do you mean never to use the
> integer version?


On the forthcoming PowerPC with ISA 3.0 (power9), we have different ways to do
classification within the floating point unit.

For example, there is the XSTSTDCDP instruction that can set a condition code
register to indicate whether the value is 0, NaN, Infinity, or Denormal.  We
might come up with a clever set of tests to use 4 of these instructions to
return the appropriate FP_<xxx>.

Even if we want to do it by looking at the exponent, ISA 3.0 defines
instructions like XSXEXPDP, which extracts the exponent from a double precision
value and returns it in a GPR.
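
For reference, the machine independent integer path quoted above could be
sketched in plain C roughly as follows.  This is only an illustration, not the
code in the patch: it assumes IEEE-754 binary64 doubles, the function name
classify_bits is made up here, and the memcpy stands in for whatever FP to GPR
move the target ends up using.

```c
#include <math.h>     /* FP_NORMAL, FP_ZERO, FP_SUBNORMAL, FP_INFINITE, FP_NAN */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration only; assumes IEEE-754 binary64.  */
static int
classify_bits (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);   /* the FP -> integer transfer */

  uint64_t exp = (bits >> 52) & 0x7ff;            /* 11 exponent bits */
  uint64_t mant = bits & 0x000fffffffffffffULL;   /* 52 mantissa bits */

  if (exp != 0 && exp != 0x7ff)      /* exponent neither all 0s nor all 1s */
    return FP_NORMAL;
  if ((bits & 0x7fffffffffffffffULL) == 0)   /* nothing set but the sign */
    return FP_ZERO;
  if (exp == 0)
    return FP_SUBNORMAL;
  if (mant == 0)                     /* exponent all 1s, mantissa empty */
    return FP_INFINITE;
  return FP_NAN;
}
```

The common Normal case takes a single compare-and-branch pair, but every call
pays for the transfer out of the floating point unit first, which is the cost
I am worried about.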

> If so then no, it currently determines it based on the format.
> I could potentially add a hook to allow backends to opt-in/out if
> there's a concern this might be slower.

It would be better to have an fpclassify<mode>2 pattern, and if it isn't
defined, then do the machine independent processing.  That is the way it is
done elsewhere.

> Though is the move that much slower that it negates the benefits we
> should get from not having to do 4 branches in the normal case?

It depends.  We have a lot of other stuff for ISA 3.0 on our plates, and
truthfully, we won't be able to answer the question about performance until we
get real hardware, but I would prefer not to be locked into an existing
implementation.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
