On 03/10/16 10:52, Wilco Dijkstra wrote:
> Hi Evandro,
>
>> I have however encountered precision issues with DF, namely some benchmarks
>> in the SPECfp CPU2000 suite would fail to validate.
> Accuracy is not an issue, the computation is extremely accurate. The issue is
> that your patch doesn't support sqrt(0.0) - it returns NaN rather than zero,
> and that causes the miscompares you're seeing. So support for the zero case
> should be added.
>
> This would be a better expansion, supporting zero, and with lower latency
> than the current sequence:

Now that I think of it, frsqrts returns 1.5 for the zero case, so we only need
to fix up the estimated sqrt value before the final multiply. Since an
FCSEL/VAND can be hidden completely behind the latency of frsqrts, both the
scalar and vector cases could do this:

	frsqrte	s1, s0
	fmul	s2, s1, s1
	frsqrts	s2, s0, s2
	fcmp	s0, 0.0
	fmul	s1, s1, s2
	fmul	s2, s1, s1
	fmul	s1, s0, s1
	frsqrts	s2, s0, s2
	fcsel	s1, s0, s1, eq
	fmul	s0, s1, s2

Wilco
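
For readers who want to follow the zero case step by step, here is a rough
numerical model of the sequence above in Python. It is only a sketch. The
helper names (frsqrte, frsqrts, sqrt_via_rsqrt) and the fixup flag are made up
for illustration. The estimate is modelled as 1/sqrt(d) rounded to 8
significant bits, which is an assumption about precision and not the real
hardware lookup table. The step is computed unfused in double precision. It is
meant for finite d >= 0 only.

```python
import math

def frsqrte(d):
    """Model of the reciprocal square root estimate (assumed ~8 bits)."""
    if d == 0.0:
        return math.inf                  # estimate for zero is +infinity
    r = 1.0 / math.sqrt(d)
    m, e = math.frexp(r)                 # r = m * 2**e, with 0.5 <= m < 1
    return math.ldexp(round(m * 256) / 256, e)

def frsqrts(a, b):
    """Model of the step instruction: (3 - a*b) / 2."""
    # Zero times infinity is special-cased to 1.5 instead of NaN.
    if (a == 0.0 and math.isinf(b)) or (math.isinf(a) and b == 0.0):
        return 1.5
    return (3.0 - a * b) / 2.0

def sqrt_via_rsqrt(d, fixup=True):
    """Mirror the ten-instruction sequence, one line per data operation."""
    x = frsqrte(d)               # frsqrte s1, s0
    t = x * x                    # fmul    s2, s1, s1
    step = frsqrts(d, t)         # frsqrts s2, s0, s2
    x = x * step                 # fmul    s1, s1, s2
    t = x * x                    # fmul    s2, s1, s1
    s = d * x                    # fmul    s1, s0, s1  (0 * inf = NaN for d == 0)
    step = frsqrts(d, t)         # frsqrts s2, s0, s2  (1.5 for d == 0)
    if fixup and d == 0.0:       # fcmp s0, 0.0 / fcsel s1, s0, s1, eq
        s = d
    return s * step              # fmul    s0, s1, s2

if __name__ == "__main__":
    for d in (0.0, 0.25, 2.0, 1e6):
        print(d, sqrt_via_rsqrt(d), sqrt_via_rsqrt(d, fixup=False))
```

With fixup=False the model reproduces the NaN for sqrt(0.0): the sqrt estimate
is 0 * inf. With the select in place that estimate is replaced by the input
zero, and the final multiply by 1.5 yields zero.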