On 03/10/16 13:10, Wilco Dijkstra wrote:
     frsqrte  s1, s0
     fmul     s2, s1, s1
     frsqrts  s2, s0, s2
     fcmp     s0, 0.0
     fmul     s1, s1, s2
     fmul     s2, s1, s1
     fmul     s1, s0, s1
     frsqrts  s2, s0, s2
     fcsel    s1, s0, s1, eq
     fmul     s0, s1, s2

That's what I had in mind too, but built around the approximation for x^(-1/2) and using a mask to handle zero in the vector case, thus:

        fcmne   v3.4s, v0.4s, #0.0
        frsqrte v1.4s, v0.4s
        fmul    v2.4s, v1.4s, v1.4s
        frsqrts v2.4s, v0.4s, v2.4s
        fmul    v1.4s, v1.4s, v2.4s
        fmul    v2.4s, v1.4s, v1.4s
        frsqrts v2.4s, v0.4s, v2.4s
        fmul    v1.4s, v1.4s, v2.4s
        and     v1.16b, v1.16b, v3.16b
        fmul    v0.4s, v1.4s, v0.4s
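
For anyone who wants to check the arithmetic off-target, here is a rough scalar C sketch of what one lane of the sequence above does. It is only an illustration under a few assumptions: the function names are made up, and rsqrt_estimate() is a bit-trick stand-in for FRSQRTE, not the table lookup the instruction really uses, so the intermediate values will differ from hardware. The step function follows the FRSQRTS definition, (3 - a*b)/2.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for FRSQRTE: a crude initial guess at 1/sqrt(x).
   The real instruction uses a lookup table; this bit trick is only here
   so the sketch runs on any host. */
static float rsqrt_estimate(float x)
{
    uint32_t bits;
    float y;
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);
    return y;
}

/* Mirrors FRSQRTS: returns (3 - a*b) / 2, the Newton-Raphson factor. */
static float rsqrt_step(float a, float b)
{
    return (3.0f - a * b) * 0.5f;
}

/* One lane of the vector sequence: sqrt(x) computed as x * (1/sqrt(x)),
   with the reciprocal estimate forced to zero when x == 0. */
float approx_sqrt(float x)
{
    uint32_t mask = (x != 0.0f) ? 0xffffffffu : 0u;  /* fcmne */
    uint32_t ebits;
    float e = rsqrt_estimate(x);                     /* frsqrte */

    e *= rsqrt_step(x, e * e);                       /* fmul, frsqrts, fmul */
    e *= rsqrt_step(x, e * e);                       /* fmul, frsqrts, fmul */

    /* For x == 0 the estimate is garbage (Inf or NaN) by now; clearing
       its bits gives +0.0, so the final multiply yields 0 and not NaN. */
    memcpy(&ebits, &e, sizeof ebits);
    ebits &= mask;                                   /* and */
    memcpy(&e, &ebits, sizeof e);

    return e * x;                                    /* fmul */
}

/* Small self-check covering the zero case and an ordinary input. */
void self_check(void)
{
    assert(approx_sqrt(0.0f) == 0.0f);
    assert(fabsf(approx_sqrt(4.0f) - 2.0f) < 1e-3f);
}
```

The point of the mask is that the zero check costs one compare and one AND, both independent of the multiply chain until the very end, and no per-lane select is needed.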


Thanks,

--
Evandro Menezes
