On Thu, 19 Apr 2012 08:34:01 -0600, Brian Paul <bri...@vmware.com> wrote:
> On 04/18/2012 07:40 PM, Eric Anholt wrote:
> > +   float det = (+ m[0][0] * adj[0][0]
> > +                + m[0][1] * adj[1][0]
> > +                + m[0][2] * adj[2][0]
> > +                + m[0][3] * adj[3][0]);
> > +
> > +   return adj / det;
>
> Would something like this be more efficient:
>
>     vec4 inv_det4 = vec4(1.0 / det);
>     adj[0] *= inv_det4;
>     adj[1] *= inv_det4;
>     adj[2] *= inv_det4;
>     adj[3] *= inv_det4;
>
>     return adj;
>
> Actually, I just tried it and it saves 3 RCP instructions (in TGSI).

For hardware without a native DIV, yeah. For hardware with one, no. The right fix IMO is to write a CSE (common subexpression elimination) pass, so that you get the optimized path whether you have DIVs or not.
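
To make the two forms concrete, here is a rough sketch in plain C rather than GLSL, just so it can be compiled and checked on a CPU. The `mat4` struct and the function names are made up for illustration and are not from the patch. The first function models `adj / det` as written; the second models the hand-hoisted reciprocal, which is roughly what a CSE pass would end up with on hardware that lowers each DIV to RCP + MUL.

```c
#include <math.h>

/* Hypothetical stand-in for a GLSL mat4: 4 columns of 4 floats. */
typedef struct { float m[4][4]; } mat4;

/* Form 1: models "adj / det" as written, one divide per component. */
static mat4 scale_by_div(mat4 adj, float det)
{
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            adj.m[c][r] /= det;
    return adj;
}

/* Form 2: models the hand-hoisted version, one reciprocal computed
 * up front and then one multiply per component. */
static mat4 scale_by_rcp(mat4 adj, float det)
{
    float inv_det = 1.0f / det;

    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            adj.m[c][r] *= inv_det;
    return adj;
}

/* Largest absolute per-component difference between two matrices. */
static float max_abs_diff(mat4 a, mat4 b)
{
    float worst = 0.0f;

    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++) {
            float d = fabsf(a.m[c][r] - b.m[c][r]);
            if (d > worst)
                worst = d;
        }
    return worst;
}
```

Calling `max_abs_diff(scale_by_div(adj, det), scale_by_rcp(adj, det))` shows how far apart the two results are. They agree exactly when `det` is a power of two, but in general the reciprocal form can differ from a true divide in the last bit, since it rounds twice.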
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev