On Tue, 06 May 2008 15:10:37 +0200
Roland Scheidegger <[EMAIL PROTECTED]> wrote:

> Markus Amsler wrote:
> > With the attached patch I get 10-15% better frame rates in WoW with pixel
> > shaders enabled on my rv370. It's mostly because the global "terrain"
> > shader uses 4 indirections with this patch; without it, it gets optimized
> > to 1. I'm not sure exactly what goes on. My best guess is that the more
> > indirections, the more silicon gets used (better parallelizable).
> > If that's the cause, we should try to use as many indirections as fit into
> > the hardware (perhaps even adding indirections). The easiest
> > way would be to translate the shader multiple times with increasing
> > optimization until it fits into the hardware.
> 
> I don't think this makes sense in general. If you have more
> indirections, you have more phases, and everything I know about this
> hardware (which is admittedly not that much) would indicate additional
> phases will have a performance cost. Though maybe due to how scheduling
> works on this chip, if you have a lot of texture instructions in one
> phase, the chip might not be able to hide texture fetch latencies by
> doing some ALU work. Some more investigation would be needed.
> 
> Roland

Texture indirection optimization is useful not only for speed but also
because right now we are naive about indirections. For instance, we fail to
compile the mplayer shader program, whereas if we were a bit cleverer we
could fit it within the indirection limit.
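To illustrate what "fitting within the indirection limit" means, here is a minimal C sketch of an indirection counter. It assumes a simplified model in which a texture instruction that reads a temp written earlier in the current node must wait for that value and therefore opens a new node. The `struct insn` layout, `count_indirections`, `fits_indirection_limit` and `HYPOTHETICAL_MAX_INDIRECTIONS` (set to 4, an assumed R300-class value) are all hypothetical illustrations, not the r300 driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_TEMPS 32
/* Hypothetical limit; R300-class hardware is assumed to allow 4 nodes. */
#define HYPOTHETICAL_MAX_INDIRECTIONS 4

enum op_kind { OP_ALU, OP_TEX };

/* Hypothetical, simplified instruction: one destination temp and up to
 * three source temps (-1 = unused). Shader inputs are modelled as temps
 * that are never written. */
struct insn {
    enum op_kind kind;
    int dst;
    int src[3];
};

/* Count nodes (texture indirections) under the simplified model: a TEX
 * whose sources include a temp written earlier in the current node has
 * to wait for that value, so it opens a new node. */
static int count_indirections(const struct insn *prog, int n)
{
    bool written_in_node[MAX_TEMPS];
    int nodes = 1;

    memset(written_in_node, 0, sizeof(written_in_node));
    for (int i = 0; i < n; i++) {
        const struct insn *in = &prog[i];
        assert(in->dst >= 0 && in->dst < MAX_TEMPS);
        if (in->kind == OP_TEX) {
            bool dependent = false;
            for (int s = 0; s < 3; s++) {
                int r = in->src[s];
                if (r < 0)
                    continue;
                assert(r < MAX_TEMPS);
                if (written_in_node[r])
                    dependent = true;
            }
            if (dependent) {
                /* Dependent read: start a new node with a clean slate. */
                nodes++;
                memset(written_in_node, 0, sizeof(written_in_node));
            }
        }
        written_in_node[in->dst] = true;
    }
    return nodes;
}

static bool fits_indirection_limit(const struct insn *prog, int n)
{
    return count_indirections(prog, n) <= HYPOTHETICAL_MAX_INDIRECTIONS;
}
```

Under this model a chain of five dependent texture reads needs five nodes and would be rejected; a smarter compiler would try reordering instructions so that independent fetches share a node, then recount.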

I am not a hardware engineer for this kind of chip, but as far as I know
texture fetches are done asynchronously from shader instructions. For
instance, if you do a texture fetch in node 0, the GPU will execute all
instructions of node 0 while the texture fetch is performed; when there are
no more instructions in node 0, the GPU waits for the texture fetch to
finish before executing instructions from the next node. In my XDC talk I
give one example of this (I will upload the slides to fdo). This leaves a
lot of room for optimization: basically, you want to put as many
instructions as possible in a node before going to the next node. r5xx is
even better in this respect, with texture semaphores that let you set
precisely when you want to wait for the texture fetcher (here again, you
want to put as many instructions as you can before a semaphore).
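To make the latency argument concrete, here is a toy C cost model of the node behaviour described above. It is a sketch under stated assumptions, not a description of the real chip: each ALU instruction is assumed to take one cycle, the fetches issued by a node are assumed to complete after a fixed `tex_latency`, and the `struct node`, `node_cycles` and `program_cycles` names are hypothetical.

```c
#include <assert.h>

/* Hypothetical description of one node under the simplified model:
 * fetches issued by the node run in the background while the node's
 * ALU instructions execute. */
struct node {
    int alu_insns;   /* ALU instructions in this node, 1 cycle each (assumed) */
    int tex_fetches; /* texture fetches issued by this node */
};

/* Cycles for one node: ALU work overlaps the fetch latency, and the GPU
 * only stalls for whatever latency the ALU work did not cover. */
static int node_cycles(const struct node *n, int tex_latency)
{
    int wait = n->tex_fetches > 0 ? tex_latency : 0;
    return n->alu_insns > wait ? n->alu_insns : wait;
}

/* Total cycles for a program as the sum of its nodes. */
static int program_cycles(const struct node *nodes, int count, int tex_latency)
{
    int total = 0;

    assert(tex_latency >= 0);
    for (int i = 0; i < count; i++)
        total += node_cycles(&nodes[i], tex_latency);
    return total;
}
```

With an assumed 20-cycle latency, a program whose first node has 2 ALU instructions (plus a fetch) and whose second node has 20 costs 40 cycles; hoisting 18 of those ALU instructions into the first node, which is only legal if they do not depend on that node's pending fetch, brings it down to 22 cycles.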

So yes, we want to optimize texture indirections, but your patch is wrong:
it works for your case, but it won't work properly for other pixel shader
programs (likely causing rendering bugs or GPU lockups). Right now I can't
think of an easy way to do this kind of optimization in the current
framework. I have started to play with LLVM to see if we could do it by
adding a new special pass to LLVM. I am not yet convinced this can be done,
and at least it does not look easy, but with the limited time I have, that's
as much as I can say after 5 minutes of LLVM experience :)

Cheers,
Jerome Glisse

_______________________________________________
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev
