[Mesa-dev] [RFC 0/3] More precise rcp and rsq for fp64 on gk110

Boyan Ding Sun, 05 Mar 2017 07:35:43 -0800

Nvidia's ISA only provides rcp and rsq instructions that operate on the
upper 32 bits of a double-precision number, so extra steps must be taken
to achieve the required precision. This series implements more precise
algorithms using Newton-Raphson steps. Edge cases such as NaN and
denormals are fully taken into account. More details are covered in the
comments in the assembly code.


I tested my implementation with some manually picked values that cover
every case, as well as many randomly generated numbers. I didn't see a
difference of more than 2 ulp from the CPU implementation in a test of
more than 650 million random values for each of the two algorithms.
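
For reference, a difference in ulp between two doubles can be measured
roughly as in the sketch below. This is not the test harness used for the
numbers above; ordered_bits(), ulp_distance() and within_ulps() are
hypothetical helper names, and NaN inputs are assumed to be filtered out
by the caller.

```c
#include <stdint.h>
#include <string.h>

/* Map a double's bit pattern onto an ordered integer line so that
 * adjacent representable values differ by exactly 1 and -0.0 == +0.0. */
static int64_t ordered_bits(double x)
{
    int64_t i;
    memcpy(&i, &x, sizeof i);
    /* Negative doubles have the sign bit set; mirror them below zero. */
    return i < 0 ? INT64_MIN - i : i;
}

/* Number of representable doubles between a and b (NaN not handled). */
static uint64_t ulp_distance(double a, double b)
{
    int64_t ia = ordered_bits(a);
    int64_t ib = ordered_bits(b);
    /* Subtract as unsigned so opposite-sign extremes cannot overflow. */
    return ia > ib ? (uint64_t)ia - (uint64_t)ib
                   : (uint64_t)ib - (uint64_t)ia;
}

/* Returns 1 if the GPU result is within max_ulp of the CPU reference. */
static int within_ulps(double gpu, double cpu, uint64_t max_ulp)
{
    return ulp_distance(gpu, cpu) <= max_ulp;
}
```

A check in the spirit of the test described above would then be
within_ulps(gpu_result, 1.0 / x, 2) for rcp, and the same against
1.0 / sqrt(x) for rsq.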

The implementation is only available on gk110 for two reasons: it is
the only platform on which I can test, and I think it is easier to
maintain on one platform while a lot of changes might still take place.
Ideally, it should be ported to all platforms once it is considered
stable enough.


Boyan Ding (3):
  gk110/ir: Add rcp f64 implementation
  gk110/ir: Add rsq f64 implementation
  gk110/ir: Use the new rcp/rsq f64 in library

 src/gallium/drivers/nouveau/codegen/lib/gk110.asm  | 219 ++++++++++++++++++++-
 .../drivers/nouveau/codegen/lib/gk110.asm.h        | 134 ++++++++++++-
 .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp      |  32 +++
 .../nouveau/codegen/nv50_ir_lowering_nvc0.h        |   1 +
 4 files changed, 382 insertions(+), 4 deletions(-)

-- 
2.12.0

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
