On Wed, May 18, 2011 at 3:16 PM, Eric Anholt <[email protected]> wrote:
> On Wed, 18 May 2011 11:05:39 -0400, Jerome Glisse <[email protected]> wrote:
>> On Tue, May 17, 2011 at 11:22 PM, Eric Anholt <[email protected]> wrote:
>> > One of the pain points of working on compiler optimizations has been
>> > justifying them -- sometimes I come up with something I think is
>> > useful and spend a day or two on it, but the value doesn't show up as
>> > fps in the application that suggested the optimization to me. Then I
>> > wonder if this transformation of the code is paying off in general,
>> > and thus if I should push it. If I don't push it, I end up bringing
>> > that patch out on every application I look at that it could affect, to
>> > see if now I finally have justification to get it out of a private
>> > branch.
>> >
>> > At a conference this week, we heard about how another team is are
>> > using a database of (assembly) shaders, which they run through their
>> > compiler and count resulting instructions for testing purposes. This
>> > sounded like a fun idea, so I threw one together. Patch #1 is good in
>> > general (hey, link errors, finally!), but also means that a quick hack
>> > to glslparsertest makes it link a passing compile shader and therefore
>> > generate assembly that gets dumped under INTEL_DEBUG=wm. Patch #2 I
>> > used for automatic scraping of shaders in every application I could
>> > find on my system at the time. The open-source ones I pushed to:
>> >
>> > http://cgit.freedesktop.org/~anholt/shader-db
>> >
>> > And finally, patch #3 is something I built before but couldn't really
>> > justify until now. However, given that it reduced fragment shader
>> > instructions 0.3% across 831 shaders (affecting 52 of them including
>> > yofrankie, warsow, norsetto, and gstreamer) and didn't increase
>> > instructions anywhere, I'm a lot happier now.
>> >
>> > Hopefully we hook up EXT_timer_query to apitrace soon so I can do more
>> > targeted optimizations and need this less :) In the meantime, I hope
>> > this can prove useful to others -- if you want to contribute
>> > appropriately-licensed shaders to the database so we track those, or
>> > if you want to make the analysis work on your hardware backend, feel
>> > free.
>> >
>>
>> I have been thinking at doing somethings slightly different. Sadly
>> instruction count is not necesarily the best metric to evaluate
>> optimization performed by shader compiler. Hidding texture fetch
>> latency of a shader can improve performance a lot more than saving 2
>> instructions. So my idea was to do a gl app that render into
>> framebuffer thousand time the same shader. The use of fbo is to avoid
>> to have things like swapbuffer or a like to play a role while we are
>> solely interested in shader performance. Also use an fbo as big as
>> possible so fragment shader has a lot of pixel to go through and i
>> believe disabling things like blending, zbuffer ... so no other part
>> of the pipeline impact in anyway the shader.
>
> You might take a look at mesa-demos/src/perf for that. I haven't had
> success using them for performance work due to the noisiness of the
> results.
>
> More generally, imo, the problem with that plan is you have to build the
> shaders yourself and justify to yourself why that shader you wrote is
> representative, and you spend all your time on building the tests when
> you just wanted to know if an instruction-reduction optimization did
> anything. shader-db took me one evening to build and collect for all
> applications I had (I've got a personal branch for all the closed-source
> stuff :/ )

A shader takes a bunch of inputs, so for each shader collected the issue
is to provide proper input. Textures could use a dummy texture unless the
shader has some dependency on the texture data (for example, if the
fetched texture data determines the number of iterations or is used to
kill a fragment). It's all about going through known shaders and building
a reasonable set of inputs for each of them. It's time consuming, but I
believe it brings a lot more from a testing point of view.

> For actual performance testing of apps without idsoftware-style
> timedemos, I'm way more excited by the potential of using apitrace with
> EXT_timer_query to decide which shaders I should be analyzing, and then
> I'd know afterward whether I impacted a real application by replaying
> the trace. That is, assuming I didn't increase CPU costs in the
> process, which is where an apitrace replay would not be representative.
>
> Our perspective is: if we are driving the hardware anywhere below what
> is possible, that is a bug that we should fix. Analyzing the costs of
> instructions, scheduling impacts, CPU overhead impacts, etc. may be out
> of scope for shader-db, but does make some types of analysis quick and
> easy (test all shaders you have ever seen of in a couple minutes).

I agree that shader-db provides a useful tool. I am just convinced that
the number of instructions in a complex shader is a bad metric,
especially when considering things like r6xx and newer classes of hw,
where texture fetches and instructions can run concurrently, and even
different bunches of instructions of the same shader can run
concurrently, which helps alleviate the parallelism lost due to register
pressure in a shader.

From my gathering, the performance of a shader on r6xx and newer is about
finding a sweet spot between the number of instructions, the number of
registers, and instruction scheduling. This sweet spot is not always the
minimum number of instructions/registers.
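Since such a sweet spot has to be found by measuring rather than by counting instructions, the measurement loop proposed earlier in the thread (draw with the same shader many times, then average over several runs) could look roughly like the sketch below. This is not code from the thread or from Mesa: `time_shader`, `render_once` and `fake_render` are hypothetical names, and `render_once` stands in for "draw into the FBO with the shader under test and wait for the GPU to finish".

```python
import statistics
import time


def time_shader(render_once, iterations=1000, runs=5):
    """Return (mean, stdev) of the wall-clock seconds taken per run.

    render_once is a hypothetical callable (an assumption of this sketch)
    that issues one draw with the shader under test and returns only once
    the work is complete.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        for _ in range(iterations):
            render_once()
        samples.append(time.perf_counter() - start)
    mean = statistics.mean(samples)
    # stdev needs at least two samples; report 0.0 for a single run.
    stdev = statistics.stdev(samples) if runs > 1 else 0.0
    return mean, stdev


def fake_render():
    # Stand-in CPU workload so the sketch runs without a GL context.
    # A real harness would draw into the FBO here and block until the
    # GPU is done before returning.
    sum(range(100))


if __name__ == "__main__":
    mean, stdev = time_shader(fake_render, iterations=1000, runs=5)
    print(f"{mean:.6f} s per run (stdev {stdev:.6f})")
```

Reporting the standard deviation next to the mean gives a rough idea of how noisy the runs are, which matters given the noisiness mentioned above for mesa-demos/src/perf.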

With the idea I described, my aim is to minimize the impact of the CPU
and of scheduling: rendering thousands of triangles from a vbo, with the
destination being the biggest fbo possible and all interfering pipeline
stages disabled (zbuffer, blending, ...), should produce results that,
once averaged over runs, depend primarily on the shader execution time.

Cheers,
Jerome
_______________________________________________
mesa-dev mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
