On 6/18/07, tbp <[EMAIL PROTECTED]> wrote:
Until now, the contract was: you have to deal with (and contain) NaN and infinities. Fair enough; even if tricky, that remained manageable. But if I can't expect a mere division by 0, or sqrt of 0 (quite common with FTZ/DAZ on), to give me respectively an infinity and 0, and instead get a NaN (which I can't filter, you remember?) because of the NR round, that's pure madness.
The attached patch should fix these troubles at the cost of 2 extra clocks. The trick is to limit the rsqrt result to just below infinity, which keeps 0.0*(inf-) -> 0.0.

Uros.

Index: i386.c
===================================================================
--- i386.c      (revision 125790)
+++ i386.c      (working copy)
@@ -22590,7 +22590,7 @@ void ix86_emit_swdivsf (rtx res, rtx a,
 void ix86_emit_swsqrtsf (rtx res, rtx a, enum machine_mode mode,
                          bool recip)
 {
-  rtx x0, e0, e1, e2, e3, three, half;
+  rtx x0, e0, e1, e2, e3, three, half, bignum;
 
   x0 = gen_reg_rtx (mode);
   e0 = gen_reg_rtx (mode);
@@ -22600,15 +22600,18 @@ void ix86_emit_swsqrtsf (rtx res, rtx a,
 
   three = CONST_DOUBLE_FROM_REAL_VALUE (dconst3, SFmode);
   half = CONST_DOUBLE_FROM_REAL_VALUE (dconsthalf, SFmode);
+  bignum = gen_lowpart (SFmode, GEN_INT (0x7f7fffff));
 
   if (VECTOR_MODE_P (mode))
     {
       three = ix86_build_const_vector (SFmode, true, three);
       half = ix86_build_const_vector (SFmode, true, half);
+      bignum = ix86_build_const_vector (SFmode, true, bignum);
     }
 
   three = force_reg (mode, three);
   half = force_reg (mode, half);
+  bignum = force_reg (mode, bignum);
 
   /* sqrt(a) = 0.5 * a * rsqrtss(a) * (3.0 - a * rsqrtss(a) * rsqrtss(a))
      1.0 / sqrt(a) = 0.5 * rsqrtss(a) * (3.0 - a * rsqrtss(a) * rsqrtss(a)) */
@@ -22617,6 +22620,9 @@ void ix86_emit_swsqrtsf (rtx res, rtx a,
   emit_insn (gen_rtx_SET (VOIDmode, x0,
                           gen_rtx_UNSPEC (mode, gen_rtvec (1, a),
                                           UNSPEC_RSQRT)));
+  emit_insn (gen_rtx_SET (VOIDmode, x0,
+                          gen_rtx_SMIN (mode, x0, bignum)));
+
   /* e0 = x0 * a */
   emit_insn (gen_rtx_SET (VOIDmode, e0,
                           gen_rtx_MULT (mode, x0, a)));