On 03/10/16 13:10, Wilco Dijkstra wrote:
    frsqrte s1, s0
    fmul    s2, s1, s1
    frsqrts s2, s0, s2
    fcmp    s0, 0.0
    fmul    s1, s1, s2
    fmul    s2, s1, s1
    fmul    s1, s0, s1
    frsqrts s2, s0, s2
    fcsel   s1, s0, s1, eq
    fmul    s0, s1, s2

That's what I had in mind too, but built around the approximation for x^(-1/2) and using masks for the vector cases, thusly:

    fcmne   v3.4s, v0.4s, #0.0
    frsqrte v1.4s, v0.4s
    fmul    v2.4s, v1.4s, v1.4s
    frsqrts v2.4s, v0.4s, v2.4s
    fmul    v1.4s, v1.4s, v2.4s
    fmul    v2.4s, v1.4s, v1.4s
    frsqrts v2.4s, v0.4s, v2.4s
    fmul    v1.4s, v1.4s, v2.4s
    and     v1.4s, v3.4s
    fmul    v0.4s, v1.4s, v0.4s

Thanks,

-- 
Evandro Menezes