On Sat, 23 Oct 2010 12:42:05 +0100, Peter Clifton <[email protected]> wrote:
> Your patch works a treat.. I knew mine was really only a band-aid which
> forced a flush on the pending indiscriminately, and was glad to see the
> proper fix.
>
> Really difficult to get your head round all this flush / invalidate
> stuff. I get the idea, but in practice it is very confusing due to the
> fact it is all deferred / scheduled work, and both subtly different
> concepts (flush / invalidate) concepts are handled by the same action on
> the GPU, and very similar code! Very easy to muddle current / pending
> ring in my head, for example.
>
> You replied to Alexey that the patch is only a stop gap, and inter-ring
> synchronisation is the real challenge. I guess that is something you'll
> be forced to look at with the new Sandybridge chipset having a separate
> ring for BLT operations?

Exactly. We already have the issue on i965 with the Bitstream Decoder
ring, which handles video separately from the render ring. Fortunately
no one has fallen over the lack of synchronisation there, since the API
design makes interoperating GL/RENDER/Video so difficult. Even worse, it
is only with Sandybridge that we have the ability to insert semaphores
onto the ring to handle inter-ring synchronisation on the GPU; otherwise
we will simply have to wait on retirement when transferring ownership
from one ring to another. Is it worth the additional complexity to have
buffers reside on multiple rings at the same time? Possibly, if we do
start mixing video + GL. Anyway, with the BLT split, handling
synchronisation will become an issue.

> I'm just looking for fps with my circuit board rendering GL code at the
> moment.. that's why I'm following git HEAD stuff, to see if the drivers
> can unlock some performance in the code I'm writing. I'm struggling to
> profile just what the bottleneck is!

Aye, profiling GPU code at the moment is a hard problem. If you do find
some CPU bottlenecks, they're usually the easiest to fix. What may help
is to sync after every operation and look at the relative times +
relative frequencies to work out the rate-limiting step, then see if you
can break that step down further and repeat. (Even if we had a GPU
callgrind, given the disconnect between what is executed on the GPU and
GL, it may not be obvious how to improve the code.) uprof may help here,
given the annotations Robert Bragg has made for mesa profiling.

We're always eager to improve our code to get the most out of our
admittedly lacklustre GPUs. Even suggestions on what tools would be
useful, or what improvements we could make to profiling/development, are
most welcome.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
