On Sun, May 23, 2010 at 12:44:54PM +0200, Mathias Fröhlich wrote:
>
> Hi,
>
> On Monday 17 May 2010 20:51:09 Corbin Simpson wrote:
> > I'm going to be proactive here, and pull in both this patch and a docs
> > update.
> Ok.
> Now that the infrastructure is there.
>
> My initial aim was to have something to profile the r300g driver.
> It already runs nicer then the classic one for plenty of stuff I tried. But
> that is still up to factors slower than the binary only driver from amd/ati.
> So having a clue where to look for improvements is a good thing to do.
> Also OpenSceneGraph uses this timers for its helpful graphical scene graph
> profiling aids.
>
> To do that I have been looking into the docs to find a cycle counter or
> something equivalent in the gpu. But so far without luck.
> If we have such a counter, we could dump that counter into the query object
> similar to the occlusion query implementation.
>
> Sure we can alternatively trigger a soft interrupt and read the kernel timer
> in the interrupt handler. That would already give nanoseconds timers. But I
> hope that this kind of functionality could be implemented less intrusive as
> this requires changes to the kernel part of the driver I think.
>
> Thoughts?
> Knowledge about some undocumented registers fitting that purpose?
>
> ... looking at amds windows profiling tools make me believe that there are
> such registers.
>
> Greetings
>
> Mathias
IMHO, a profiler such as sysprof can already give us a clue as to where we are slower. I think we are still CPU-limited (1) rather than purely GPU-limited, so profiling the GPU won't yield any significant improvement. Also, adding support for hyper-z + fast clear is likely to give a significant improvement (around 20-50%, iirc the ATI figures). Lastly, page flipping will also improve the fps for anything fullscreen.

(1) A few months ago I did a microbenchmark: submitting the same GPU rendering command stream 10000 times was around 4 times faster than rendering the same scene through GL 10000 times (note that what reaches the GPU is the same in both cases). Of course this is a microbenchmark, so the result must be taken with care.

Also, it's possible that the memory manager is making bad decisions and wasting memory bandwidth. I haven't yet thought of a way to benchmark the memory manager (I guess the only way is to test different memory manager schemes).

Anyway, my point is that GPU profiling is likely of limited interest given the missing features.

Cheers,
Jerome
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev