On Tue, 06 May 2008 15:10:37 +0200 Roland Scheidegger <[EMAIL PROTECTED]> wrote:
> Markus Amsler wrote:
> > With the attached patch I get in WoW with enabled pixel shaders with my
> > rv370 10-15% better frame rates. It's mostly because the global "terrain"
> > shader uses 4 indirections with this patch, without it gets optimized to
> > 1. I'm not sure what exactly goes on. My best guess is the more
> > indirections the more silicon gets used (better parallelizable).
> > If that's the cause we should try to use as many indirections as fit into
> > the hardware (perhaps even adding indirections). The easiest
> > way would be to translate the shader multiple times with increasing
> > optimization until it fits into the hardware.
>
> I don't think this makes sense in general. If you have more
> indirections, you have more phases, and everything I know about this
> hardware (which is admittedly not that much) would indicate additional
> phases will have a performance cost. Though maybe due to how scheduling
> works on this chip, if you have a lot of texture instructions in one
> phase, the chip might not be able to hide texture fetch latencies with
> doing some ALU work. Some more investigation would be needed.
>
> Roland

Texture indirection optimization is useful not only for speed but also
because right now we are dumb about indirections. For instance, we fail
to compile the mplayer shader program, whereas if we were a bit more
clever we could fit it inside the indirection limit.

I am not a hardware engineer for this kind of chip, but as far as I know
texture fetches are done asynchronously from shader instructions. For
instance, if you do a texture fetch in node 0, the GPU will execute all
the instructions of node 0 while the texture fetch is performed; when
there are no more instructions in node 0, the GPU waits for the texture
fetch to finish before it starts executing instructions from the next
node. In my XDC talk I give one example of this (I will upload the
slides to fdo).
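To make the node behaviour described above concrete, here is a toy cost
model in Python. It is only a sketch of the description in this mail, not
of the real hardware: the latency figure, the Node class and the function
names are all made up for illustration, and it assumes the ALU
instructions in a node do not depend on the fetches still in flight.

```python
from dataclasses import dataclass

# Hypothetical figure, chosen only for illustration; not a measured r300 value.
TEX_LATENCY = 20  # cycles until the fetches issued by a node have completed


@dataclass
class Node:
    tex_fetches: int  # texture fetches issued when the node starts
    alu_instrs: int   # ALU instructions assumed to cost one cycle each


def node_cycles(node, tex_latency=TEX_LATENCY):
    """Cycles spent in one node: ALU work overlaps the fetch, then the GPU waits."""
    fetch_time = tex_latency if node.tex_fetches > 0 else 0
    # The node ends only when both the ALU work and the fetch are done.
    return max(node.alu_instrs, fetch_time)


def program_cycles(nodes, tex_latency=TEX_LATENCY):
    """Nodes run one after the other, so the total is the sum over nodes."""
    return sum(node_cycles(n, tex_latency) for n in nodes)


# Two schedules with the same work: 2 texture fetches and 24 ALU instructions.
starved = [Node(1, 2), Node(1, 2), Node(0, 20)]  # ALU work left for a late node
packed = [Node(1, 12), Node(1, 12)]              # ALU work overlaps the fetches

print("starved:", program_cycles(starved))
print("packed: ", program_cycles(packed))
```

In this model the starved schedule spends most of its first two nodes
waiting on the fetcher, while the packed one hides part of the fetch
latency behind ALU work. Real scheduling is certainly more subtle than
this.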
This gives a lot of room for optimization: basically, you want to put as
many instructions as possible in a node before going to the next node.
r5xx is even better in this respect, with texture semaphores that let
you finely control when you want to wait for the texture fetcher (here
again you want to put as many instructions as you can before a
semaphore).

So yes, we want to optimize texture indirections, but your patch is
wrong: it works for your case, but it won't work properly (likely
rendering bugs or a GPU lockup) for other pixel shader programs. Right
now I can't see or think of an easy way to do this kind of optimization
in the current framework. I have started to play with LLVM to see if we
could do such an optimization by adding a new special pass to LLVM. I am
not yet convinced this can be done, or at least it does not look easy,
but with the limited time I have I can't say for sure after 5 minutes of
LLVM experience :)

Cheers,
Jerome Glisse

_______________________________________________
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev