Keith Whitwell wrote:
> Brian Paul wrote:
>>  src/mesa/main/context.c            |    8 ++++----
>>  src/mesa/shader/slang/slang_emit.c |   23 +++++++++++++++++++----
>>  src/mesa/tnl/t_vb_arbprogram.c     |    5 ++++-
>>  3 files changed, 27 insertions(+), 9 deletions(-)
>>
>> New commits:
>> diff-tree 64e8088667d000a70beff93e8c300ac0bd261a60 (from 3dfcd48469b63c601010ea43e0d5e9ea1dc5dfab)
>> Author: Brian <[EMAIL PROTECTED]>
>> Date:   Mon Apr 16 10:36:28 2007 -0600
>>
>>     Use generic program limits instead of NV-specific ones to init program constants.
>>     
>>     Previously, this limited us to 12 temp regs for vertex programs.  Many vertex
>>     shaders could exceed that.  This forces us to stop using t_vb_arbprogram.c
>>     for now because of its particular register indexing scheme.  Need to increase
>>     bits allocated for register indexing, etc.
> 
> That code is utterly dead - feel free to remove it.

The demise of the SSE path is a pity, though. It was an order of
magnitude faster than t_vb_arbprogram, and of course it still is
compared to the new code. Granted, nothing prevents anyone from
implementing an SSE backend...

Out of curiosity, I did some quick profiling (a single timedemo run of
doom3) to see where the time is actually spent. This was compiled with
-O1. There were lots of visual glitches, because ftransform/position
invariant programs are not invariant when using -ffast-math, but that
shouldn't make a difference for the profile.

CPU: AMD64 processors, speed 2002.84 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
unit mask of 0x00 (No unit mask) count 100000
samples  %        image name    app name         symbol name
2847855  29.7411  r200_dri.so   r200_dri.so      fetch_vector4
1303158  13.6093  r200_dri.so   r200_dri.so      _mesa_execute_program
1138472  11.8894  r200_dri.so   r200_dri.so      store_vector4
903745    9.4381  r200_dri.so   r200_dri.so      run_vp
577698    6.0331  doom.x86      doom.x86         (no symbols)

So, it maybe shouldn't come as news, but the actual math is not what is
really slow: that's only the 14% in _mesa_execute_program above. The
real killer is the fetch / store of values, which is 51% in this example
(if you count run_vp too, which most likely spends its time on just
another round of copying input/output values around).
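
For anyone who wants to check the arithmetic behind those two figures,
here is a minimal sketch in Python. The percentages are copied from the
oprofile listing above. The grouping of symbols into "data movement" and
"math" (the names MOVE and MATH, and the helper share()) is an assumption
made for illustration, following the reading given in the paragraph
above, and is not something oprofile reports itself.

```python
# Percentages copied from the oprofile listing (r200_dri.so symbols only).
profile = {
    "fetch_vector4": 29.7411,
    "_mesa_execute_program": 13.6093,
    "store_vector4": 11.8894,
    "run_vp": 9.4381,
}

# Assumed grouping (hypothetical names): run_vp is counted as data
# movement because it most likely just copies input/output values.
MOVE = ("fetch_vector4", "store_vector4", "run_vp")
MATH = ("_mesa_execute_program",)


def share(symbols, table=profile):
    """Return the summed percentage of samples for the given symbols."""
    return sum(table[name] for name in symbols)


if __name__ == "__main__":
    print("data movement: %.1f%%" % share(MOVE))
    print("actual math:   %.1f%%" % share(MATH))
```

Dropping run_vp from MOVE gives the share of fetch_vector4 and
store_vector4 alone, which is still about three times the math share.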

Roland

_______________________________________________
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev
