On 03/10/16 10:52, Wilco Dijkstra wrote:
> Hi Evandro,
>
>> I have however encountered precision issues with DF, namely some benchmarks 
>> in the SPECfp CPU2000 suite would fail to validate.
> Accuracy is not an issue, the computation is extremely accurate. The issue is 
> that your patch doesn't support sqrt(0.0) - it returns NaN rather than zero, 
> and that causes the miscompares you're seeing. So support for the zero case 
> should be added.
>
> This would be a better expansion, supporting zero, and with lower latency 
> than the current sequence:

Now that I think of it, frsqrts returns 1.5 for the zero case, so we only
need to fix up the estimated sqrt value before the final multiply. Since an
FCSEL/VAND can be hidden completely behind the latency of frsqrts, both the
scalar and vector cases could do this:

    frsqrte  s1, s0
    fmul     s2, s1, s1
    frsqrts  s2, s0, s2
    fcmp     s0, 0.0
    fmul     s1, s1, s2
    fmul     s2, s1, s1
    fmul     s1, s0, s1
    frsqrts  s2, s0, s2
    fcsel    s1, s0, s1, eq
    fmul     s0, s1, s2
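
To check the zero handling on paper, here is a rough C model of the scalar
sequence above. It is only a sketch: model_frsqrte and model_frsqrts are
hypothetical stand-ins, not the architected instructions. The estimate is
approximated with a magic-constant guess plus one refinement (an assumption,
giving roughly the accuracy of the hardware estimate), and only zero and
positive normal inputs are modelled.

```c
#include <math.h>     /* INFINITY, isinf */
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for FRSQRTE: rough 1/sqrt(x), +inf for zero.
   Only zero and positive normal inputs are modelled. */
static float model_frsqrte(float x)
{
    if (x == 0.0f)
        return INFINITY;
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);      /* magic-constant first guess */
    float y;
    memcpy(&y, &bits, sizeof y);
    return y * (1.5f - 0.5f * x * y * y);  /* one step, roughly 9 good bits */
}

/* Hypothetical stand-in for FRSQRTS: (3 - a*b) / 2, with 0 * inf giving 1.5. */
static float model_frsqrts(float a, float b)
{
    if ((a == 0.0f && isinf(b)) || (isinf(a) && b == 0.0f))
        return 1.5f;
    /* The product of two floats is exact in double, approximating the
       fused operation. */
    return (float)((3.0 - (double)a * (double)b) * 0.5);
}

/* One statement per instruction of the sequence. */
static float sqrt_sequence(float s0)
{
    float s1 = model_frsqrte(s0);      /* frsqrte s1, s0         */
    float s2 = s1 * s1;                /* fmul    s2, s1, s1     */
    s2 = model_frsqrts(s0, s2);        /* frsqrts s2, s0, s2     */
    int eq = (s0 == 0.0f);             /* fcmp    s0, 0.0        */
    s1 = s1 * s2;                      /* fmul    s1, s1, s2     */
    s2 = s1 * s1;                      /* fmul    s2, s1, s1     */
    s1 = s0 * s1;                      /* fmul    s1, s0, s1 (NaN if s0 == 0) */
    s2 = model_frsqrts(s0, s2);        /* frsqrts s2, s0, s2 (1.5 if s0 == 0) */
    s1 = eq ? s0 : s1;                 /* fcsel   s1, s0, s1, eq */
    return s1 * s2;                    /* fmul    s0, s1, s2     */
}
```

In this model, a zero input makes the NaN from 0 * inf get replaced by the
fcsel, so the final multiply is 0 * 1.5 and the result is zero.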

Wilco


