Keith Whitwell wrote:

> 
> Q3test is a big application.  The matmul routines don't even register in the
> profiling results, transform_v16 is probably only about 8% of the time spent in
> mesa, which seems to take about half of the cpu (i.e. quake itself uses half, mesa
> accounts for the other half).  No mesa function takes more than about 8%.  You have
> to improve the performance of several of them to make a noticeable difference, or
> come up with a change which removes a step altogether or some other system-wide
> improvement.
> 
so the jump from 51.4 to 51.6 fps is not too bad..
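
a rough sanity check on those numbers, using amdahl's law. this is only a sketch: it assumes transform_v16 really is 8% of mesa's time and mesa is half the frame, so the routine is about 4% of the total. the helper names are made up for illustration and are not part of mesa.

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction `f` of the run time
 * is made `s` times faster.  Hypothetical helper, not part of Mesa. */
static double amdahl_speedup(double f, double s)
{
    assert(f >= 0.0 && f <= 1.0);
    assert(s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}

/* Estimated frame rate after speeding up one routine.
 * Assumes the frame rate scales directly with total cpu time. */
static double estimate_fps(double base_fps, double f, double s)
{
    return base_fps * amdahl_speedup(f, s);
}
```

with base_fps = 51.4 and f = 0.5 * 0.08 = 0.04, a routine that is twice as fast gives estimate_fps(51.4, 0.04, 2.0), which is roughly 1 fps more, and even an infinitely fast one cannot add much more than 2 fps. that ceiling is why several functions have to improve together.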

> Now that you've done a transform_v16, benchmark it against the standard x86
> version, and the C version.  It will be easier to see if you're making a
> difference this way.  If you are, look at the project routines (eg, in
> src/FX/X86, and fxfasttmp.h).  Holger's 3dnow versions of these were a big
> improvement on the C.

with all the data in cache, a transform loop takes (in cycles):
  c:        49
  asm:      35
  asm+simd: 21
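
for reference, the kind of loop being timed looks roughly like the plain c sketch below. this is not the actual mesa transform_v16 code: the function name, the flat 4-floats-per-vertex layout and the column-major matrix (the usual opengl convention) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Transform `count` vertices of 4 floats each by a 4x4 matrix `m`.
 * `m` is column-major (m[0..3] is the first column); `in` and `out`
 * each hold 4 * count floats.  Illustrative only, not Mesa's routine. */
static void transform_points4(float *out, const float m[16],
                              const float *in, size_t count)
{
    assert(out != NULL && in != NULL && m != NULL);
    for (size_t i = 0; i < count; i++) {
        const float *v = in + 4 * i;
        float *r = out + 4 * i;
        const float x = v[0], y = v[1], z = v[2], w = v[3];

        r[0] = m[0] * x + m[4] * y + m[8]  * z + m[12] * w;
        r[1] = m[1] * x + m[5] * y + m[9]  * z + m[13] * w;
        r[2] = m[2] * x + m[6] * y + m[10] * z + m[14] * w;
        r[3] = m[3] * x + m[7] * y + m[11] * z + m[15] * w;
    }
}
```

each vertex costs 16 multiplies and 12 adds, and the four output components are independent of each other, which is the part a simd version can do in parallel.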

> 
> One nice thing about simd, even if x86 simd *isn't* much faster than normal x86
> floating point, is the prefetch instructions (I assume sse has one).  Use of these
> should make a real difference.
> 
well.. 0.2 fps
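
a sketch of what prefetching in a vertex loop can look like in c. this is not the code that was measured: it uses the gcc/clang builtin __builtin_prefetch in place of a hand-written sse prefetch instruction, and the function name, the vertex layout and the prefetch distance are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
/* Hint: read access (0), keep in all cache levels (3). */
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 3)
#else
#define PREFETCH_READ(p) ((void)0)  /* no-op on other compilers */
#endif

/* How many vertices ahead to prefetch.  A guess; tune by measurement. */
#define PREFETCH_AHEAD 8

/* Same 4x4 column-major transform as a plain C loop, but asks the cpu
 * to start loading a later vertex while the current one is computed. */
static void transform_points4_prefetch(float *out, const float m[16],
                                       const float *in, size_t count)
{
    assert(out != NULL && in != NULL && m != NULL);
    for (size_t i = 0; i < count; i++) {
        const float *v = in + 4 * i;
        float *r = out + 4 * i;

        /* Only prefetch addresses that are still inside the array. */
        if (i + PREFETCH_AHEAD < count)
            PREFETCH_READ(in + 4 * (i + PREFETCH_AHEAD));

        r[0] = m[0] * v[0] + m[4] * v[1] + m[8]  * v[2] + m[12] * v[3];
        r[1] = m[1] * v[0] + m[5] * v[1] + m[9]  * v[2] + m[13] * v[3];
        r[2] = m[2] * v[0] + m[6] * v[1] + m[10] * v[2] + m[14] * v[3];
        r[3] = m[3] * v[0] + m[7] * v[1] + m[11] * v[2] + m[15] * v[3];
    }
}
```

a prefetch only helps when the data is not already in cache, so a benchmark that keeps everything in cache will not show any gain from it.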

> Keith
> 
> _______________________________________________
> Mesa-dev maillist  -  [EMAIL PROTECTED]
> http://lists.mesa3d.org/mailman/listinfo/mesa-dev

-- 
ralf willenbacher ([EMAIL PROTECTED])

