Kendall,

The percentage improvement you quote is largely due to the introduction
of the CVA extension, and its use in quake3. I am told that Mesa's
performance with Q3 on 3dfx hardware is comparable to what you can
expect from 3dfx's drivers on Windows. That aside, there is certainly
still room for improvement in our code.

Performance is relative. Mesa, I am certain, remains slow for
applications which require lighting, fog, texgen, eval, and other
'high-level' operations. If you want to discuss performance, you have
to narrow things down a little. If you want to get something done, I
suggest you narrow it down all the way to the name of a function, and
then suggest an improvement. Best of all, submit a patch to this list.
But, regarding your comments on the pipeline, if you look at the newest
CVS code, you'll notice the pipeline is now a data structure, and is
built and modified at runtime.
If you look at the code on the 'experimental-1' branch, you'll see in
dd.h some functions like:

   GLuint (*RegisterPipelineStages)( struct gl_pipeline_stage *out,
                                     const struct gl_pipeline_stage *in,
                                     GLuint nr );
   /* Register new pipeline stages, or modify existing ones.  See also
    * the OptimizePipeline() functions.
    */

   void (*OptimizePrecalcPipeline)( GLcontext *ctx,
                                    struct gl_pipeline *pipe );
   void (*OptimizeImmediatePipeline)( GLcontext *ctx,
                                      struct gl_pipeline *pipe );
   /* Check to see if a fast path exists for this combination of
    * stages in the precalc and immediate (elt) pipelines.  This
    * allows the introduction of fast, combined pipeline stages which
    * do not populate the VB data structures in the way required for
    * inclusion as a general purpose pipeline stage.
    */

These allow drivers to provide code performing exactly the type of
operations you describe, and in the case of the 'Optimize' routines, to
break out of the 'expected' data rules, including typing and placement
of the data. Thus it is possible to code up maximally efficient
fast-paths for any particular rendering scenario that takes your fancy,
and apply them at runtime.
These functions have been discussed before on this list, particularly in
the context of hardware transformation and lighting.
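
To make the idea concrete, here is a rough sketch of a pipeline held as
a data structure, plus an 'optimize' hook that swaps a known combination
of stages for one fused stage. Everything in it is made up for
illustration: toy_vb, toy_stage, toy_pipeline, the three stage functions
and the fused one are hypothetical names, and none of it is the actual
gl_pipeline_stage layout or the code on the 'experimental-1' branch.

```c
#include <assert.h>
#include <string.h>

#define MAX_STAGES 8

/* Hypothetical stand-in for a vertex buffer: one coordinate plus
 * some bookkeeping.  Not Mesa's VB structure. */
struct toy_vb {
    float x;        /* a single coordinate */
    int   clipped;  /* set by the clip test */
    int   calls;    /* how many stage functions have run */
};

/* Hypothetical pipeline stage: a name and a function pointer. */
struct toy_stage {
    const char *name;
    void (*run)(struct toy_vb *vb);
};

/* The pipeline itself is just an array that can be edited at runtime. */
struct toy_pipeline {
    struct toy_stage stages[MAX_STAGES];
    unsigned nr;
};

/* Three general purpose stages, each reading and writing the shared VB. */
static void stage_transform(struct toy_vb *vb) { vb->x *= 2.0f; vb->calls++; }
static void stage_cliptest(struct toy_vb *vb)
{
    vb->clipped = (vb->x > 1.0f || vb->x < -1.0f);
    vb->calls++;
}
static void stage_setup(struct toy_vb *vb)
{
    if (!vb->clipped)
        vb->x = (vb->x + 1.0f) * 320.0f;   /* toy clip -> window mapping */
    vb->calls++;
}

/* One fused stage doing the same work in a single call. */
static void stage_fused(struct toy_vb *vb)
{
    vb->x *= 2.0f;
    vb->clipped = (vb->x > 1.0f || vb->x < -1.0f);
    if (!vb->clipped)
        vb->x = (vb->x + 1.0f) * 320.0f;
    vb->calls++;
}

static void pipeline_append(struct toy_pipeline *p, const char *name,
                            void (*run)(struct toy_vb *))
{
    assert(p->nr < MAX_STAGES);
    p->stages[p->nr].name = name;
    p->stages[p->nr].run = run;
    p->nr++;
}

static void build_generic_pipeline(struct toy_pipeline *p)
{
    p->nr = 0;
    pipeline_append(p, "transform", stage_transform);
    pipeline_append(p, "cliptest", stage_cliptest);
    pipeline_append(p, "setup", stage_setup);
}

/* Hypothetical driver hook: if the pipeline is exactly the combination
 * we have a fast path for, replace it.  Returns 1 if it did so. */
static int optimize_pipeline(struct toy_pipeline *p)
{
    if (p->nr == 3 &&
        strcmp(p->stages[0].name, "transform") == 0 &&
        strcmp(p->stages[1].name, "cliptest") == 0 &&
        strcmp(p->stages[2].name, "setup") == 0) {
        p->nr = 0;
        pipeline_append(p, "fused", stage_fused);
        return 1;
    }
    return 0;
}

static void run_pipeline(const struct toy_pipeline *p, struct toy_vb *vb)
{
    unsigned i;
    for (i = 0; i < p->nr; i++)
        p->stages[i].run(vb);
}
```

The point of the sketch is only that the caller of run_pipeline() never
changes: the hook inspects the stage list once and, when it recognises
it, installs a replacement that is free to skip the intermediate
results the separate stages would have left behind.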
Anyway, before you engage in more hand-waving, let me point out the
existence of a tool for measuring where time is spent in a running
program - the profiler. Here is profiling output for quake3 running the
current 'experimental-1' code:

   %    cumulative    self
  time    seconds   seconds  name
 12.03       4.96      4.96  fx_tri_view_clip_RGBA_TMU0
  9.84       9.02      4.06  fx_tri_view_clip_RGBA_TMU0_TMU1
  7.30      12.03      3.01  fxsetupXYZWRGBAT0
  6.38      14.66      2.63  gl_x86_cliptest_points4
  6.01      17.14      2.48  gl_3dnow_transform_points3_general_raw
  5.53      19.42      2.28  fxsetupXYZWRGBAT0T1
  3.23      20.75      1.33  render_vb_triangles_fx_smooth_indirect_view_clipped
  3.10      22.03      1.28  gl_BindTexture
  2.64      23.12      1.09  gl_prepare_arrays_cva
  2.42      24.12      1.00  fxsetupXYWRGBAT0
  1.84      24.88      0.76  gl_update_state
  1.70      25.58      0.70  cliptest_points3_raw
  1.45      26.18      0.60  fxsetupRGBAT0

Waving my own hands, I would say fxsetup routines are about 70% real
work, 30% copying. The setup routines consume a disproportionate amount
of time because they are still 'C' coded, and will probably remain that
way because Glide3 will be here RSN. The excellent x86 and 3dnow vector
processing routines have essentially no fat on the bone.
It is clear that we do spend quite a bit of time in fx setup. We also
spend quite a bit of time in the clipping code. These two are
intimately related - the interaction between clipping (in clip space)
and setup (clip and window -> device space) is the biggest 'knot' left
in the pipeline, and is what I am currently addressing.
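
For anyone who wants to put rough numbers on 'quite a bit', the sketch
below adds up the '% time' column of the profile above for every
function whose name contains a given substring. Grouping by substring
is a crude assumption of mine (it lumps the fx_tri_view_clip_* triangle
routines in with the cliptest routines, for instance), and prof_row,
profile and sum_pct_matching are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the flat profile quoted above. */
struct prof_row {
    const char *name;
    double pct;      /* '% time' column */
    double self_s;   /* 'self seconds' column */
};

static const struct prof_row profile[] = {
    { "fx_tri_view_clip_RGBA_TMU0",                          12.03, 4.96 },
    { "fx_tri_view_clip_RGBA_TMU0_TMU1",                      9.84, 4.06 },
    { "fxsetupXYZWRGBAT0",                                    7.30, 3.01 },
    { "gl_x86_cliptest_points4",                              6.38, 2.63 },
    { "gl_3dnow_transform_points3_general_raw",               6.01, 2.48 },
    { "fxsetupXYZWRGBAT0T1",                                  5.53, 2.28 },
    { "render_vb_triangles_fx_smooth_indirect_view_clipped",  3.23, 1.33 },
    { "gl_BindTexture",                                       3.10, 1.28 },
    { "gl_prepare_arrays_cva",                                2.64, 1.09 },
    { "fxsetupXYWRGBAT0",                                     2.42, 1.00 },
    { "gl_update_state",                                      1.84, 0.76 },
    { "cliptest_points3_raw",                                 1.70, 0.70 },
    { "fxsetupRGBAT0",                                        1.45, 0.60 },
};

/* Sum the '% time' of every row whose name contains 'needle'. */
static double sum_pct_matching(const char *needle)
{
    double total = 0.0;
    size_t i;

    assert(needle != NULL);
    for (i = 0; i < sizeof(profile) / sizeof(profile[0]); i++) {
        if (strstr(profile[i].name, needle) != NULL)
            total += profile[i].pct;
    }
    return total;
}
```

On the figures quoted above, sum_pct_matching("fxsetup") comes to about
16.7% and sum_pct_matching("clip") to about 33.2%, counting only the
functions that made it into the listing.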
Keith