On 05/18/2012 03:54 PM, Roland Scheidegger wrote:
> Looks OK, though I wonder if we really need our own assembly here. In particular, if the compiler decides to use SSE, we really shouldn't use the FP stack for converting floats to ints. fistp is twice as slow as the SSE conversion on newer CPUs, and it might additionally involve moving values from xmm registers to the FP stack. I suspect something like lroundf() would generate better code than the manual assembly (and far better than the C code), at least if things are compiled to use SSE2 (the same is of course true for the other functions like ceil etc.). But I guess that's not available everywhere...
For now, I'm just trying to fix the issue at hand. If anyone wants to look into using lroundf() and SSE code, that's great. I'm really not up on what's the fastest solution on various CPUs.
-Brian
