This is the second version of the fp64 precision series, incorporating fixes suggested by Ilia.
The first patch should be functionally equivalent to the previous version; the changes mostly focus on code cleanup and rewording comments. The second patch fixes a case where the original patch generated inaccurate rsq results for some small normal inputs. The third patch is unchanged.

I ran more tests on these two algorithms, comparing their results with the CPU implementation. For rcp, I never saw a difference of more than 1ulp. For rsq, some cases (~500ppm) differed by 2ulp. However, analysis with mpfr shows that in all of those cases the CPU and GPU results each had a 1ulp error, in opposite directions. So the precision should now satisfy the requirement.

The assembly uses an instruction format that has not yet been merged into the upstream envytools assembler. I'll get that merged soon.

Boyan Ding (3):
  gk110/ir: Add rcp f64 implementation
  gk110/ir: Add rsq f64 implementation
  gk110/ir: Use the new rcp/rsq in library

 src/gallium/drivers/nouveau/codegen/lib/gk110.asm  | 219 ++++++++++++++++++++-
 .../drivers/nouveau/codegen/lib/gk110.asm.h        | 127 +++++++++++-
 .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp      |  32 +++
 .../nouveau/codegen/nv50_ir_lowering_nvc0.h        |   1 +
 4 files changed, 375 insertions(+), 4 deletions(-)

-- 
2.12.0

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
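To make the ulp figures in the test results concrete, here is a minimal Python sketch of counting the distance between two doubles in ulps, assuming finite inputs and Python 3.9+ (for `math.nextafter`). The helpers `ordered_bits` and `ulp_distance` are hypothetical illustrations, not part of this series or its actual test setup. The second half shows why two results that are each 1ulp away from the reference, in opposite directions, show up as a 2ulp difference when compared against each other. For simplicity, a representable double stands in for the exact (mpfr) value.

```python
import math
import struct

_MAGNITUDE_MASK = 0x7FFF_FFFF_FFFF_FFFF


def ordered_bits(x: float) -> int:
    """Map a finite double to an integer whose ordering matches the doubles.

    Adjacent representable doubles map to integers that differ by exactly 1,
    and +0.0 / -0.0 both map to 0.
    """
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    magnitude = bits & _MAGNITUDE_MASK
    # Negative doubles grow in magnitude as they decrease, so negate them.
    return -magnitude if bits >> 63 else magnitude


def ulp_distance(a: float, b: float) -> int:
    """Count how many representable doubles separate a and b."""
    if math.isnan(a) or math.isnan(b):
        raise ValueError("ulp distance is undefined for NaN")
    return abs(ordered_bits(a) - ordered_bits(b))


# Hypothetical example: both results are 1ulp from the reference value,
# one below it and one above it, so they are 2ulp apart from each other.
reference = 0.75                                  # stands in for the exact value
below = math.nextafter(reference, -math.inf)      # e.g. one implementation
above = math.nextafter(reference, math.inf)       # e.g. the other implementation

print(ulp_distance(below, reference))  # 1
print(ulp_distance(above, reference))  # 1
print(ulp_distance(below, above))      # 2
```

The reference 0.75 is chosen away from a power of two, so the ulp size is the same on both sides of it.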