Nvidia's ISA only provides rcp and rsq on the upper 32 bits of a double-precision number, so extra steps must be taken to achieve the required precision. This series implements more precise algorithms using Newton-Raphson steps. Edge cases such as NaN and denorms are fully taken into account. More details are covered in the comments in the assembly code.
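To illustrate the general idea (this is not the gk110 assembly from the series), here is a minimal Python sketch of refining a coarse reciprocal or reciprocal square root with Newton-Raphson steps. The helper names (upper32, rcp_seed, rsq_seed, rcp_f64, rsq_f64, ulp_diff) are hypothetical, the seed functions only emulate a low-precision hardware result by truncating to the upper 32 bits, and the step count of two is an assumption. NaN, infinity, zero and denorm handling are deliberately left out.

```python
import math
import struct


def upper32(x: float) -> float:
    """Keep sign, exponent and top 20 mantissa bits; zero the lower 32 bits."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return struct.unpack("<d", struct.pack("<Q", bits & 0xFFFFFFFF00000000))[0]


def rcp_seed(x: float) -> float:
    """Emulated coarse reciprocal: input and result both limited to the upper 32 bits."""
    return upper32(1.0 / upper32(x))


def rsq_seed(x: float) -> float:
    """Emulated coarse reciprocal square root, limited the same way."""
    return upper32(1.0 / math.sqrt(upper32(x)))


def rcp_f64(x: float, steps: int = 2) -> float:
    """Refine 1/x for positive, normal x. Each step roughly doubles the correct bits."""
    y = rcp_seed(x)
    for _ in range(steps):
        e = 1.0 - x * y      # residual of the current estimate
        y = y + y * e        # same as y * (2 - x*y)
    return y


def rsq_f64(x: float, steps: int = 2) -> float:
    """Refine 1/sqrt(x) for positive, normal x."""
    y = rsq_seed(x)
    for _ in range(steps):
        e = 1.0 - x * y * y  # residual of the current estimate
        y = y + y * (0.5 * e)  # same as y * (1.5 - 0.5*x*y*y)
    return y


def ulp_diff(a: float, b: float) -> int:
    """Distance in units in the last place; valid for positive finite doubles only."""
    ia = struct.unpack("<q", struct.pack("<d", a))[0]
    ib = struct.unpack("<q", struct.pack("<d", b))[0]
    return abs(ia - ib)
```

For example, ulp_diff(rcp_f64(3.0), 1.0 / 3.0) measures how far the refined result is from the CPU's division. The seed carries roughly 19 to 20 correct bits, so two steps are enough to get past the 53-bit significand in this sketch; a real implementation would likely use fused multiply-add for the residual and needs separate paths for the edge cases.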
I tested my implementation with some manually picked values that cover every case, and with many randomly generated numbers. In a test of more than 650 million random values for each of the two algorithms, I didn't see a difference of more than 2 ulp from the CPU implementation.

The implementation is only available on gk110 for two reasons: it is the only platform on which I can test, and I think it is easier to maintain on one platform while a lot of changes might still take place. Ideally, it should be ported to all platforms once it is considered stable enough.

Boyan Ding (3):
  gk110/ir: Add rcp f64 implementation
  gk110/ir: Add rcp f64 implementation
  gk110/ir: Use the new rcp/rsq f64 in library

 src/gallium/drivers/nouveau/codegen/lib/gk110.asm  | 219 ++++++++++++++++++++-
 .../drivers/nouveau/codegen/lib/gk110.asm.h        | 134 ++++++++++++-
 .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp      |  32 +++
 .../nouveau/codegen/nv50_ir_lowering_nvc0.h        |   1 +
 4 files changed, 382 insertions(+), 4 deletions(-)

-- 
2.12.0

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev