Keith Whitwell wrote:
>
> Q3test is a big application. The matmul routines don't even register in the
> profiling results, transform_v16 is probably only about 8% of the time spent in
> mesa, which seems to take about half of the cpu (i.e. quake itself uses half, mesa
> counts for the other half). No mesa function takes more than about 8%. You have
> to improve the performance of several of them to make a noticeable difference, or
> come up with a change which removes a step altogether or some other system-wide
> improvement.
>
So the jump from 51.4 to 51.6 fps is not too bad.
> Now that you've done a transform_v16, benchmark it against the standard x86
> version, and the C version. It will be easier to see if you're making a
> difference this way. If you are, look at the project routines (eg, in
> src/FX/X86, and fxfasttmp.h). Holger's 3dnow versions of these were a big
> improvement on the C.
With all the data in cache, a transform loop takes (in cycles):
  49  C
  35  asm
  21  asm + SIMD
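
For reference, a minimal sketch of the kind of plain C loop being compared here. This is not Mesa's actual transform_v16; the name transform_points4, the flat x,y,z,w layout and the column-major matrix are assumptions made only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C baseline, NOT Mesa's transform_v16.
 * Transforms `count` points (x,y,z,w stored contiguously) by a
 * column-major 4x4 matrix m. `out` may alias `in`, because each
 * point is fully read before it is written. */
static void transform_points4(float *out, const float *in,
                              const float m[16], size_t count)
{
    size_t i;

    assert(out != NULL && in != NULL && m != NULL);

    for (i = 0; i < count; i++) {
        const float x = in[4 * i + 0];
        const float y = in[4 * i + 1];
        const float z = in[4 * i + 2];
        const float w = in[4 * i + 3];

        out[4 * i + 0] = m[0] * x + m[4] * y + m[8]  * z + m[12] * w;
        out[4 * i + 1] = m[1] * x + m[5] * y + m[9]  * z + m[13] * w;
        out[4 * i + 2] = m[2] * x + m[6] * y + m[10] * z + m[14] * w;
        out[4 * i + 3] = m[3] * x + m[7] * y + m[11] * z + m[15] * w;
    }
}
```

Each output component is four multiplies and three adds, which is the work an asm or SIMD version would try to schedule better or do four-wide.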
>
> One nice thing about simd, even if x86 simd *isn't* much faster than normal x86
> floating point, is the prefetch instructions (I assume SSE has one). Use of these
> should make a real difference.
>
Well... 0.2 fps.
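
To make the prefetch idea concrete, here is a rough sketch in C rather than asm. It assumes GCC or Clang, whose __builtin_prefetch is only a hint and on x86 with SSE usually becomes one of the prefetch instructions. The function name, the PREFETCH_READ macro and the PREFETCH_AHEAD distance are made up for illustration, and the distance is an untuned guess.

```c
#include <assert.h>
#include <stddef.h>

/* __builtin_prefetch is a GCC/Clang builtin; elsewhere this is a no-op. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 3)
#else
#define PREFETCH_READ(p) ((void)0)
#endif

/* How many points ahead to prefetch. Untuned guess. */
#define PREFETCH_AHEAD 8

/* Hypothetical routine, not Mesa code: column-major 4x4 matrix times
 * `count` contiguous x,y,z,w points, hinting at input a few points
 * ahead of the one being transformed. */
static void transform_points4_prefetch(float *out, const float *in,
                                       const float m[16], size_t count)
{
    size_t i;

    assert(out != NULL && in != NULL && m != NULL);

    for (i = 0; i < count; i++) {
        const float *p = in + 4 * i;
        const float x = p[0], y = p[1], z = p[2], w = p[3];

        /* Only hint at addresses that are still inside the array. */
        if (i + PREFETCH_AHEAD < count)
            PREFETCH_READ(in + 4 * (i + PREFETCH_AHEAD));

        out[4 * i + 0] = m[0] * x + m[4] * y + m[8]  * z + m[12] * w;
        out[4 * i + 1] = m[1] * x + m[5] * y + m[9]  * z + m[13] * w;
        out[4 * i + 2] = m[2] * x + m[6] * y + m[10] * z + m[14] * w;
        out[4 * i + 3] = m[3] * x + m[7] * y + m[11] * z + m[15] * w;
    }
}
```

Prefetching cannot help when everything is already in cache, as in the cycle counts above, so any gain would only show up on data that really has to come from memory.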
> Keith
>
> _______________________________________________
> Mesa-dev maillist - [EMAIL PROTECTED]
> http://lists.mesa3d.org/mailman/listinfo/mesa-dev
--
ralf willenbacher ([EMAIL PROTECTED])