AArch64 offers the instructions frsqrte and frsqrts, for rsqrt estimation and
a Newton-Raphson step, respectively.
There are ARMv8 implementations where this is faster than using fdiv and fsqrt.
The patch runs three steps for double and two steps for float to achieve the
needed precision.
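
For readers unfamiliar with the scheme, here is a minimal sketch of the
refinement in portable C. It is not the code from the patch: rsqrt_step and
rsqrt_refine are hypothetical names, rsqrt_step only mimics the arithmetic of
frsqrts, (3 - a * b) / 2, and the coarse starting value x0 that frsqrte would
supply is passed in by the caller. Feeding d and x * x to the step is also an
assumption about how the operands are arranged.

```c
#include <assert.h>

/* Stand-in for frsqrts: returns (3 - a * b) / 2. */
static double rsqrt_step(double a, double b)
{
    return (3.0 - a * b) / 2.0;
}

/* Refine x0, a coarse estimate of 1/sqrt(d) (the role frsqrte plays),
   with the given number of Newton-Raphson steps:
   x = x * (3 - d * x * x) / 2.  */
static double rsqrt_refine(double d, double x0, int steps)
{
    assert(d > 0.0 && steps >= 0);
    double x = x0;
    for (int i = 0; i < steps; i++)
        x = x * rsqrt_step(d, x * x);
    return x;
}
```

Each step roughly doubles the number of correct bits, so a starting estimate
good to about 8 bits needs two steps to cover the 24-bit float significand and
three steps to cover the 53-bit double significand.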

There is one caveat and open question.
Since -ffast-math enables flush to zero, intermediate values between
approximation steps will be flushed to zero if they are denormal.
For example, this happens in the case of rsqrt (DBL_MAX) and rsqrtf (FLT_MAX).
The test cases pass, but it is unclear to me whether this is expected behavior
with -ffast-math.
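
To make the caveat concrete, the following sketch emulates flush to zero in
plain C for the DBL_MAX case. It rests on assumptions: the helper names
flush_denormal and rsqrt_step_ftz are hypothetical, the estimate is taken to
be 2^-512 (close to 1/sqrt(DBL_MAX)), and the step is assumed to square the
estimate before multiplying by the input, which may differ from the operand
order the patch uses.

```c
#include <assert.h>
#include <float.h>

/* Emulate flush to zero: anything smaller in magnitude than the
   smallest normal double becomes 0. */
static double flush_denormal(double v)
{
    return (v < DBL_MIN && v > -DBL_MIN) ? 0.0 : v;
}

/* One Newton-Raphson step x * (3 - d * x * x) / 2, where the
   intermediate x * x is optionally flushed to zero. */
static double rsqrt_step_ftz(double d, double x, int ftz)
{
    assert(d > 0.0);
    double sq = x * x;            /* 2^-1024 for x = 2^-512: denormal */
    if (ftz)
        sq = flush_denormal(sq);  /* becomes 0.0 */
    return x * (3.0 - d * sq) / 2.0;
}
```

With x = 0x1p-512 the square is 2^-1024, which lies below DBL_MIN. Once it is
flushed, the step multiplies the estimate by 1.5 instead of leaving it near
2^-512, so the refinement moves away from the correct result.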

The patch applies to commit:
svn+ssh://gcc.gnu.org/svn/gcc/trunk@224470

Please consider including this patch.
Thank you and best regards,
Benedikt Huber

Benedikt Huber (1):
  2015-06-15  Benedikt Huber  <benedikt.hu...@theobroma-systems.com>

 gcc/ChangeLog                            |   9 +++
 gcc/config/aarch64/aarch64-builtins.c    |  60 ++++++++++++++++
 gcc/config/aarch64/aarch64-protos.h      |   2 +
 gcc/config/aarch64/aarch64-simd.md       |  27 ++++++++
 gcc/config/aarch64/aarch64.c             |  63 +++++++++++++++++
 gcc/config/aarch64/aarch64.md            |   3 +
 gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113 +++++++++++++++++++++++++++++++
 7 files changed, 277 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c

-- 
1.9.1
