On Fri, Jun 15, 2018 at 4:44 PM, Eric Anholt <e...@anholt.net> wrote:
> Michel Dänzer <mic...@daenzer.net> writes:
>
> > On 2018-06-15 05:25 PM, Jason Ekstrand wrote:
> >> On June 15, 2018 01:14:24 Michel Dänzer <mic...@daenzer.net> wrote:
> >>> On 2018-06-15 07:31 AM, Jason Ekstrand wrote:
> >>>>
> >>>> I did some testing and x11perf -copywinwin500 is... exactly the same
> >>>> with or without my patches. If anything they might improve it by just a
> >>>> hair.
> >>>
> >>> Possible explanations I can think of:
> >>>
> >>> 1. Your glamor still has its own FBO cache. Which version of xserver
> >>> are you testing with?
> >>>
> >> 1.19 I think
> >
> > Okay, that doesn't have the glamor FBO cache anymore.
> >
> >>> 2. The i965 driver cache isn't hit even before these changes.
> >>
> >> It's definitely getting hit in both cases, it just may require a
> >> slightly larger cache of we aren't recycling BOs until they're idle.
> >
> > It might be more than just slightly, -copywinwin500 can queue many
> > overlapping copies between flushes. Can you compare the maximum total
> > cache size with and without this series?
>
> I suspect it'll be only about a factor of
> how-many-batchbuffers-before-throttling difference -- while the
> batchbuffer still references the BO, the bufmgr wouldn't see the buffer
> to reuse it anyway. I suspect we hit the aperture limit and flush in
> the copywinwin500 case.

At Ken's suggestion, I ran some statistics for hits/misses. I did three runs each with master and with my branch:

Master:
hits = 455868, misses = 388, max_bucket_size = 160
hits = 404358, misses = 113, max_bucket_size = 34
hits = 497731, misses = 363, max_bucket_size = 148

With patches:
hits = 493634, misses = 253, max_bucket_size = 85
hits = 495667, misses = 237, max_bucket_size = 83
hits = 454738, misses = 358, max_bucket_size = 132

Some of the numbers, as you can see, are rather noisy, but the end result is about the same: we get at least 1000x as many cache hits as misses when running that test.
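For anyone who wants to double-check the "at least 1000x" claim, here is a small C sketch that computes the worst-case hit/miss ratio and the largest bucket depth from the six runs listed above. The struct and helper names are made up for illustration; only the numbers come from the runs.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Counters from one x11perf -copywinwin500 run, copied from the results above. */
struct bo_cache_run {
   unsigned long hits;
   unsigned long misses;
   unsigned max_bucket_size;
};

static const struct bo_cache_run master_runs[] = {
   { 455868, 388, 160 },
   { 404358, 113, 34 },
   { 497731, 363, 148 },
};

static const struct bo_cache_run patched_runs[] = {
   { 493634, 253, 85 },
   { 495667, 237, 83 },
   { 454738, 358, 132 },
};

/* Smallest hits-per-miss ratio across a set of runs (the worst case). */
static double min_hit_miss_ratio(const struct bo_cache_run *runs, size_t n)
{
   double min = 0.0;
   for (size_t i = 0; i < n; i++) {
      assert(runs[i].misses != 0);
      double r = (double)runs[i].hits / (double)runs[i].misses;
      if (i == 0 || r < min)
         min = r;
   }
   return min;
}

/* Largest bucket depth reached across a set of runs. */
static unsigned largest_bucket(const struct bo_cache_run *runs, size_t n)
{
   unsigned max = 0;
   for (size_t i = 0; i < n; i++) {
      if (runs[i].max_bucket_size > max)
         max = runs[i].max_bucket_size;
   }
   return max;
}
```

The worst cases come out at roughly 1175:1 on master and 1270:1 with the patches, so the 1000x lower bound holds for every run.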
I don't think the choice to recycle busy BOs is really gaining us anything whatsoever.

It is worth noting that I did both sets of runs in debug builds, because I had to use gdb to get the data back out of the driver (prints inside the GL driver used by glamor don't work too well). That probably affected things a bit, but I doubt the end result would have been much different.

That raises the question: why does Michel see such a big difference on radeon? Is something else causing the slowdown? Is recomputing surface layouts expensive? Is there more VMA shuffling that's causing problems?
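To make the policy difference concrete, here is a simplified single-bucket C sketch. All names are hypothetical; this is not the i965 brw_bufmgr code, and "busy" is a plain flag standing in for a kernel query. The only knob is whether a BO the GPU may still be using can be handed out again. The sketch scans the bucket for the first reusable BO; a real bufmgr may only look at one end of the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical BO: a real driver asks the kernel whether the GPU is done with it. */
struct sim_bo {
   size_t size;
   bool busy;              /* still referenced by a queued or executing batch */
   struct sim_bo *next;    /* link within the cache bucket */
};

/* One size bucket; a real bufmgr keeps an array of these. */
struct sim_bucket {
   size_t size;
   struct sim_bo *head;    /* most recently freed BO first */
   unsigned count;
};

struct sim_cache {
   struct sim_bucket bucket;
   bool reuse_busy;        /* the policy being debated in this thread */
   unsigned long hits, misses;
   unsigned max_bucket_size;
};

/* Take a BO from the bucket if the policy allows it, else allocate a new one. */
static struct sim_bo *sim_cache_alloc(struct sim_cache *c)
{
   struct sim_bucket *b = &c->bucket;
   struct sim_bo *prev = NULL;

   for (struct sim_bo *bo = b->head; bo; prev = bo, bo = bo->next) {
      if (bo->busy && !c->reuse_busy)
         continue;             /* idle-only policy: skip BOs the GPU still uses */
      if (prev)
         prev->next = bo->next;
      else
         b->head = bo->next;
      b->count--;
      bo->next = NULL;
      c->hits++;
      return bo;
   }

   c->misses++;                /* nothing reusable: fresh allocation */
   struct sim_bo *bo = calloc(1, sizeof(*bo));
   if (bo)
      bo->size = b->size;
   return bo;
}

/* Return a BO to the cache, tracking the deepest the bucket ever gets. */
static void sim_cache_free(struct sim_cache *c, struct sim_bo *bo)
{
   struct sim_bucket *b = &c->bucket;
   assert(bo->size == b->size);
   bo->next = b->head;
   b->head = bo;
   if (++b->count > c->max_bucket_size)
      c->max_bucket_size = b->count;
}

/* Release every cached BO. */
static void sim_cache_fini(struct sim_cache *c)
{
   struct sim_bo *bo = c->bucket.head;
   while (bo) {
      struct sim_bo *next = bo->next;
      free(bo);
      bo = next;
   }
   c->bucket.head = NULL;
   c->bucket.count = 0;
}
```

Allocating, freeing a still-busy BO, allocating again and freeing that second BO leaves two BOs in the bucket under the idle-only policy, but only one when reuse_busy is set. That is the extra cache footprint Michel asked about; the hit/miss counters from the runs suggest that, at least for -copywinwin500 on i965, it does not turn into meaningfully more misses.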
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev