some thoughts about tex env optimization on r200

Roland Scheidegger Wed, 03 Aug 2005 08:20:38 -0700

Ian Romanick wrote (in the GL_ARB_texture_env_crossbar on r200 thread):
> Other optimizations are possible, but I never explored them.  Most of
>  the ones that I could think of are probably unlikely in practice.
> Doing things like replacing {T1 + T2}, {P + P}, {P + T3} with {T1 +
> T2}*2, {P + T3}, or replacing {T1 * T2}, {P + T0} with {T1 * T2 + T0}
> are possible, but probably not worth the effort.
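Ian's last example, folding a multiply stage followed by "previous + something" into a single multiply-add stage, can be pictured as a tiny peephole pass. This is a hypothetical sketch only: `struct stage`, the `OP_*`/`ARG_*` names and `fold_mul_add()` are made up for illustration and are not the r200 driver's data structures. A real pass would also have to check scale, clamp and color/alpha-split settings before merging stages.

```c
#include <stddef.h>

/* Hypothetical, simplified tex env stage: result = a OP b
 * (for OP_MAD: result = a * b + c). ARG_PREV is the previous stage's result. */
enum op  { OP_ADD, OP_MUL, OP_MAD };
enum arg { ARG_T0, ARG_T1, ARG_T2, ARG_T3, ARG_PREV, ARG_NONE };

struct stage {
    enum op  op;
    enum arg a, b, c;   /* c is only used by OP_MAD */
};

/* Rewrite {X * Y}, {P + Z} into {X * Y + Z} in place.
 * Returns the new number of stages. */
size_t fold_mul_add(struct stage *s, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n && s[i].op == OP_MUL && s[i + 1].op == OP_ADD) {
            const struct stage *next = &s[i + 1];
            enum arg other = ARG_NONE;
            /* The ADD must consume the MUL result exactly once. */
            if (next->a == ARG_PREV && next->b != ARG_PREV)
                other = next->b;
            else if (next->b == ARG_PREV && next->a != ARG_PREV)
                other = next->a;
            if (other != ARG_NONE) {
                struct stage mad = { OP_MAD, s[i].a, s[i].b, other };
                s[out++] = mad;
                i++;            /* the ADD stage has been absorbed */
                continue;
            }
        }
        s[out++] = s[i];        /* no match: keep the stage unchanged */
    }
    return out;
}
```

Applied to Ian's {T1 * T2}, {P + T0} this yields one stage {T1 * T2 + T0}; the {T1 + T2}, {P + P} case would need a separate rule that turns {P + P} into a scale-by-2 on the preceding stage.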

I thought about whether it would really be worthwhile to optimize for shorter "shader programs", and figured I'd need some performance figures up front (there's no point implementing some super-advanced optimizer if it turns out the shader won't run any faster anyway...).


So, I hacked the driver a bit and gathered some numbers.

What I did was basically always enable (and emit) a fixed number of pixel shader stages, i.e. change the R200_TEX_BLEND_x_ENABLE bits. The test was Quake III, demo four, at 1024x768x32 with lightmaps (so most things are dual-textured and some remain single-textured), HyperZ and compressed textures. "Enable 4 tex env", for instance, means pixel shader stages 0-3 are always enabled. With 2 or more stages enabled the rendering output is always correct; with 1 the lightmaps are missing and everything is too bright; with 0 there are a couple more errors.
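The hack described above amounts to forcing a contiguous run of blend-stage enable bits. A minimal sketch, assuming (hypothetically) that the per-stage enable bits are contiguous and start at bit `BLEND_ENABLE_SHIFT` of a control register; that shift, `NUM_BLEND_STAGES` and the helper names are placeholders for illustration, not the real `R200_TEX_BLEND_x_ENABLE` definitions from the driver headers.

```c
#include <stdint.h>

/* ASSUMPTION: placeholder layout, one enable bit per blend stage,
 * contiguous from BLEND_ENABLE_SHIFT. Check the real register header. */
#define BLEND_ENABLE_SHIFT 12u
#define NUM_BLEND_STAGES   6u   /* stages 0..5, as used in these tests */

/* Mask with blend stages 0..n-1 enabled ("enable n tex env"). */
uint32_t blend_enable_first_n(unsigned n)
{
    if (n > NUM_BLEND_STAGES)
        n = NUM_BLEND_STAGES;
    return ((1u << n) - 1u) << BLEND_ENABLE_SHIFT;
}

/* Clear every blend-enable bit in the register value, then force 0..n-1 on.
 * All other bits of the register are left untouched. */
uint32_t force_blend_stages(uint32_t reg, unsigned n)
{
    uint32_t all = blend_enable_first_n(NUM_BLEND_STAGES);
    return (reg & ~all) | blend_enable_first_n(n);
}
```

Under this assumed layout, `force_blend_stages(reg, 4)` corresponds to the "enable 4 tex env" runs, and clearing all blend bits before OR-ing in `(1u << 5) << BLEND_ENABLE_SHIFT` to the "texenv 5 only" run.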

normal code: 149 fps
enable 0 tex env: 150 fps
enable 1 tex env: 150 fps
enable 2 tex env: 146 fps
enable 3 tex env: 122 fps
enable 4 tex env: 97 fps
enable 5 tex env: 79 fps
enable 6 tex env: 66 fps

And, to see whether it matters WHICH stages are enabled, with only texblend stage 5 enabled:

texenv 5 only: 67 fps

The same with vertex lighting (so everything is single-textured):
normal code: 227 fps
0 tex env: 227 fps
1 tex env: 227 fps
2 tex env: 211 fps
3 tex env: 167 fps
4 tex env: 127 fps
5 tex env: 103 fps
6 tex env: 87 fps
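Because fps values don't add up, one way to read the two Quake III tables is to convert them to frame time and look at the extra milliseconds each additional enabled stage costs. This is a small sketch of that arithmetic; the frame-time framing and the function names are my own illustration, while the numbers are copied from the tables above (index = number of always-enabled blend stages, 0..6).

```c
/* fps from the Quake III tables, indexed by "enable N tex env", N = 0..6 */
static const double q3_lightmap[7] = {150, 150, 146, 122, 97, 79, 66};
static const double q3_vertex[7]   = {227, 227, 211, 167, 127, 103, 87};

/* Frame time in milliseconds for a given frame rate. */
double frame_ms(double fps)
{
    return 1000.0 / fps;
}

/* Extra frame time (ms) caused by enabling stage k-1 on top of stages 0..k-2,
 * i.e. going from "enable k-1" to "enable k". Valid for k = 1..6. */
double stage_cost_ms(const double *fps, int k)
{
    if (k < 1 || k > 6)
        return 0.0;
    return frame_ms(fps[k]) - frame_ms(fps[k - 1]);
}
```

Looping `stage_cost_ms()` over k = 1..6 for both arrays makes it easy to compare whether each additional stage adds a roughly constant amount of time once the "free" stages are used up.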

The numbers were interesting but not quite conclusive, so here are a couple more with the Mesa multiarb demo (the demo always enables textures in order):

multiarb 2 textures:
0 tex env: 230 fps
2 tex env: 227 fps
3 tex env: 210 fps
4 tex env: 191 fps
6 tex env: 162 fps

multiarb 3 textures:
0 tex env: 209 fps
2 tex env: 210 fps
3 tex env: 200 fps
4 tex env: 191 fps
6 tex env: 162 fps

multiarb 4 textures:
0 tex env: 191 fps
2 tex env: 191 fps
3 tex env: 191 fps
4 tex env: 187 fps
6 tex env: 162 fps

And finally, some tests to see whether it makes a difference which texture sampling stages (as opposed to the blending stages) are enabled, using a modified multiarb (*) with GL_REPLACE and the same texture for the 1st and 4th texture units. The "hacked" result means the 4th texture mapping unit was used, but the driver was hacked to use the 1st blending stage instead of the 4th.

multiarb (*) 1st tex:
1 tex env: 255 fps
4 tex env: 191 fps
normal: 257 fps

multiarb (*) 4th tex:
1 tex env: 255 fps
4 tex env: 191 fps
normal: 191 fps
hacked: 257 fps

So, conclusions: long shader programs can indeed have a (sometimes drastic, see the Quake III results) performance impact. However, it looks like you get roughly as many instructions for free as you use texturing units. Since with standard GL you have as many texture blending stages as texture mapping units in use, optimizing doesn't really seem worthwhile (unless you don't actually need some texture lookups for the final result and could disable texture sampling for that unit). Also, the results MAY differ on an r200 (as opposed to the rv250 I used), since it can sample 2 textures per clock instead of 1 (though if I'm not mistaken it is restricted to bilinear, otherwise it needs 2 cycles, whereas the rv250 can do trilinear in one cycle), but it has the same arithmetic throughput as the rv250, meaning it might not hide the arithmetic instructions as well.

An optimization which would have benefits, however, would be to always use the pixel shader stages in order: the time the chip needs to perform the calculations does not seem to depend on the number of stages enabled at all, only on the highest stage enabled. This is clearly not the case for texture sampling; there it doesn't seem to matter which units are used (again, with r200 the results may differ, as its 2 texturing units seem to be arranged somewhat pair-wise). I'm not too sure such code is common, though (I believe most apps use the texturing units in order).
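The in-order optimization can be sketched as packing the enabled blend stages down to 0..k-1 while keeping their order, so the highest enabled stage (which the measurements suggest drives the arithmetic cost) becomes as low as possible. This is a hypothetical helper, not existing driver code: `highest_stage()`, `compact_stages()` and `MAX_STAGES` are illustrative names, and a real implementation would also have to rewrite each moved stage's register state and any "previous stage" references. The texture unit each stage samples would stay the same, as in the "hacked" multiarb run.

```c
#include <stdint.h>

#define MAX_STAGES 6   /* assumption: blend stages 0..5, as in the tests */

/* Index of the highest enabled stage in mask, or -1 if none is enabled.
 * Per the measurements, arithmetic cost appears to track this value. */
int highest_stage(uint32_t mask)
{
    int hi = -1;
    for (int i = 0; i < MAX_STAGES; i++)
        if (mask & (1u << i))
            hi = i;
    return hi;
}

/* Pack enabled stages into 0..k-1, preserving their relative order.
 * remap[old] receives the new stage index, or -1 for disabled stages.
 * Returns the packed enable mask. */
uint32_t compact_stages(uint32_t mask, int remap[MAX_STAGES])
{
    int next = 0;
    for (int i = 0; i < MAX_STAGES; i++)
        remap[i] = (mask & (1u << i)) ? next++ : -1;
    return (1u << next) - 1u;
}
```

For example, a program that only uses blend stage 3 (mask `1u << 3`) would be moved to stage 0, which is essentially what the "hacked" multiarb result did by hand.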

On a somewhat unrelated note, I was surprised to see a much larger performance difference in Quake III than in multiarb (since multiarb does basically nothing but texturing, whereas Quake III also uses the z-buffer etc.).


Roland


--
_______________________________________________
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel