Hi everybody,

in my opinion it's a really hard task to reach a speedup of about 10% in
Q3, since most functions that are easy to optimize with the 3DNow! or SSE
instructions don't take more than 10% or 12% of CPU time.

And there is another problem. Even if the results Josh posted look very
impressive (we get speedups of up to 6x), they are rather theoretical. In
the benchmark all vertices reside in the L1 cache, so the movs are quite
fast. In reality most functions are only about 1.5 to 2 times faster
than the C functions because of cache misses.

Since the SSE instructions process 4 floats at once, most specialized
transforms in xform_tmp.h will be impossible (or really hard) to code.

I believe a better idea would be to introduce fastpaths with big loops
that transform and clip a vertex in one function, so that we don't get
cache misses. That's the way Keith was going with the fxfastpath. I am
working on a big-loop 3DNow! routine, but still need some time (I'll be
out of town for the coming weeks). Perhaps it is possible to introduce
something similar for other cards, and perhaps for software rendering,
too (Keith??)
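
To make the big-loop idea concrete, here is a minimal sketch in plain C.
It is not the fxfastpath code; the function name, the flag names and the
flat array layout are all made up for illustration. Each vertex is
transformed by a column-major 4x4 matrix and clip-tested right away,
while its coordinates are still in registers, instead of being written
out by one pass and read back by a second one.

```c
#include <stddef.h>

/* Hypothetical clip flags; the names are illustrative only. */
enum {
    CLIP_RIGHT  = 1 << 0,  /* x >  w */
    CLIP_LEFT   = 1 << 1,  /* x < -w */
    CLIP_TOP    = 1 << 2,  /* y >  w */
    CLIP_BOTTOM = 1 << 3,  /* y < -w */
    CLIP_FAR    = 1 << 4,  /* z >  w */
    CLIP_NEAR   = 1 << 5   /* z < -w */
};

/* Transform n vertices (4 floats each) by the column-major matrix m and
 * clip-test each result in the same loop iteration.
 * Returns the OR of all per-vertex clip masks. */
static unsigned char transform_and_clip(const float m[16],
                                        const float *in, float *out,
                                        unsigned char *clipmask, size_t n)
{
    unsigned char ormask = 0;

    for (size_t i = 0; i < n; i++) {
        const float x = in[4 * i + 0], y = in[4 * i + 1];
        const float z = in[4 * i + 2], w = in[4 * i + 3];

        const float cx = m[0] * x + m[4] * y + m[8]  * z + m[12] * w;
        const float cy = m[1] * x + m[5] * y + m[9]  * z + m[13] * w;
        const float cz = m[2] * x + m[6] * y + m[10] * z + m[14] * w;
        const float cw = m[3] * x + m[7] * y + m[11] * z + m[15] * w;

        /* Clip test against -w <= x, y, z <= w on the fresh values. */
        unsigned char mask = 0;
        if (cx >  cw) mask |= CLIP_RIGHT;
        if (cx < -cw) mask |= CLIP_LEFT;
        if (cy >  cw) mask |= CLIP_TOP;
        if (cy < -cw) mask |= CLIP_BOTTOM;
        if (cz >  cw) mask |= CLIP_FAR;
        if (cz < -cw) mask |= CLIP_NEAR;

        out[4 * i + 0] = cx;
        out[4 * i + 1] = cy;
        out[4 * i + 2] = cz;
        out[4 * i + 3] = cw;
        clipmask[i] = mask;
        ormask |= mask;
    }
    return ormask;
}
```

A caller can look at the returned OR mask first: if it is zero, no vertex
in the batch needs clipping at all.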

In this full_setup function we transform the vertex with a full 4x4
matrix, so the SSE instructions make sense there. The cliptest could be
done well with them (with a little acrobatics), too.
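
As a rough illustration of the 4x4 case, here is a sketch using the SSE
intrinsics from <xmmintrin.h>. It assumes an x86 compiler with SSE
enabled and a column-major matrix; the function name and the bit layout
of the returned mask are invented for this example and are not taken
from full_setup. The "acrobatics" are the shuffle that broadcasts w to
all four lanes and the movemask that turns the compare results into
clip bits.

```c
#include <xmmintrin.h>

/* Transform one vertex v by the column-major 4x4 matrix m, store the
 * result in out, and return a 6-bit clip mask (hypothetical layout):
 *   bit 0: x >  w   bit 1: y >  w   bit 2: z >  w
 *   bit 3: x < -w   bit 4: y < -w   bit 5: z < -w */
static int xform_cliptest_sse(const float m[16], const float v[4],
                              float out[4])
{
    /* Each column of the matrix fills one SSE register. */
    const __m128 c0 = _mm_loadu_ps(m + 0);
    const __m128 c1 = _mm_loadu_ps(m + 4);
    const __m128 c2 = _mm_loadu_ps(m + 8);
    const __m128 c3 = _mm_loadu_ps(m + 12);

    /* r = c0*x + c1*y + c2*z + c3*w, four floats per instruction. */
    __m128 r = _mm_mul_ps(c0, _mm_set1_ps(v[0]));
    r = _mm_add_ps(r, _mm_mul_ps(c1, _mm_set1_ps(v[1])));
    r = _mm_add_ps(r, _mm_mul_ps(c2, _mm_set1_ps(v[2])));
    r = _mm_add_ps(r, _mm_mul_ps(c3, _mm_set1_ps(v[3])));
    _mm_storeu_ps(out, r);

    /* Broadcast the transformed w to all lanes, then compare. */
    const __m128 w     = _mm_shuffle_ps(r, r, _MM_SHUFFLE(3, 3, 3, 3));
    const __m128 neg_w = _mm_sub_ps(_mm_setzero_ps(), w);

    /* movemask collects one bit per lane; lane 3 (w itself) is dropped. */
    const int hi = _mm_movemask_ps(_mm_cmpgt_ps(r, w))     & 7;
    const int lo = _mm_movemask_ps(_mm_cmplt_ps(r, neg_w)) & 7;

    return hi | (lo << 3);
}
```

The unaligned loads keep the sketch safe for any float array; with
16-byte aligned data the aligned load and store variants could be used
instead.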


- Holger


