Keith Whitwell wrote:
> Brian Paul wrote:
>>  src/mesa/main/context.c            |    8 ++++----
>>  src/mesa/shader/slang/slang_emit.c |   23 +++++++++++++++++++----
>>  src/mesa/tnl/t_vb_arbprogram.c     |    5 ++++-
>>  3 files changed, 27 insertions(+), 9 deletions(-)
>>
>> New commits:
>> diff-tree 64e8088667d000a70beff93e8c300ac0bd261a60 (from 3dfcd48469b63c601010ea43e0d5e9ea1dc5dfab)
>> Author: Brian <[EMAIL PROTECTED]>
>> Date:   Mon Apr 16 10:36:28 2007 -0600
>>
>>     Use generic program limits instead of NV-specific ones to init program
>>     constants.
>>
>>     Previously, this limited us to 12 temp regs for vertex programs.  Many vertex
>>     shaders could exceed that.  This forces us to stop using t_vb_arbprogram.c
>>     for now because of its particular register indexing scheme.  Need to increase
>>     bits allocated for register indexing, etc.
>
> That code is utterly dead - feel free to remove it.

The demise of the SSE path is a pity, though. It was an order of magnitude faster than t_vb_arbprogram, and of course it still is compared to the new code. Granted, nothing prevents anyone from implementing an SSE backend...

Out of curiosity, I did some quick profiling (a single timedemo run of doom3) to see where the time is actually spent. This was compiled with -O1. There were lots of visual glitches, due to the trouble with ftransform/position-invariant programs not being invariant when using -ffast-math, but that shouldn't make a difference for the profile.

CPU: AMD64 processors, speed 2002.84 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name   app name     symbol name
2847855  29.7411  r200_dri.so  r200_dri.so  fetch_vector4
1303158  13.6093  r200_dri.so  r200_dri.so  _mesa_execute_program
1138472  11.8894  r200_dri.so  r200_dri.so  store_vector4
903745    9.4381  r200_dri.so  r200_dri.so  run_vp
577698    6.0331  doom.x86     doom.x86     (no symbols)

So, it maybe shouldn't come as news, but it's not the actual math which is really slow: that's only the 14% in _mesa_execute_program above. The real killer is the fetch / store of values, which is 51% in this example (fetch_vector4 plus store_vector4, and counting run_vp too, which most likely spends its time just on another round of copying input/output values around).

Roland