Quoting Kenneth Graunke (2017-08-05 02:10:43)
> On Friday, August 4, 2017 12:22:19 PM PDT Chris Wilson wrote:
> > Quoting Kenneth Graunke (2017-08-04 19:47:14)
> > > On Friday, July 21, 2017 8:36:42 AM PDT Chris Wilson wrote:
> > > > Patch reordering from last time so that the cosmetic tweaks are done first
> > > > and out of the way. Kenneth has reviewed the core NO_RELOC patches, so
> > > > hopefully it doesn't look too bad and we can land at least as far as
> > > > there (patch 8/10).
> > > > 
> > > > Thanks,
> > > > -Chris
> > > 
> > > I split up some patches and pushed a modified version of this series.
> > > 
> > > To ssh://git.freedesktop.org/git/mesa/mesa
> > >    5c007203b73..6c530ad1160  master -> master
> > > 
> > > Thanks a ton for getting us to NO_RELOC.  I really like the new reloc
> > > flags system as well.  It's so much nicer!
> > 
> > I've still got to win you over to using LUT indices (kernel side, there
> > shouldn't be any case where it is worse, but the differences are easily
> > dwarfed in typical cases where it is only about 10% faster, but any
> > reduction inside the struct_mutex is a must), I see, and the per-context
> > bo along with removing the auxiliary render_cache set...
> > 
> > But now for something completely different...
> > -Chris
> I landed I915_EXEC_HANDLE_LUT too, actually.
> I definitely want per-context BOs, but Jason and I had come up with some
> patches for that as well, and I haven't had a chance to compare your
> approach with ours to see which is better.  I hope to do that soon.

No worries, just happy to keep the ball rolling! Once the dust settles,
we need to compare notes on what we think the next direction should be.
brw_emit_reloc remains one of the chief costs in batch building, but it
is pretty much down to cache misses and sheer frequency of use.
Soft-pinning is only easily applicable with full-ppgtt, and ideally with
48b addressing to reduce the likelihood of address exhaustion. But at
least that will allow us to assign global addresses and skip relocations,
with a trivial fallback to the existing brw_emit_reloc.
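
To make the fallback concrete, here is a rough sketch of the shape I have
in mind. It is not Mesa code: every name in it (sketch_bo, sketch_batch,
sketch_emit_address, the fixed array sizes) is made up for illustration,
and it assumes a 64-bit address written as two dwords. A soft-pinned
target has its address written straight into the batch; anything else
also records a relocation entry, as the existing path would.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_RELOCS 64
#define SKETCH_BATCH_DWORDS 256

/* Hypothetical buffer object. 'pinned' means userspace assigned the address. */
struct sketch_bo {
    uint32_t handle;
    uint64_t offset;   /* soft-pinned address, or the last presumed address */
    bool pinned;
};

/* Hypothetical relocation record, loosely modelled on what a reloc needs. */
struct sketch_reloc {
    uint32_t target_handle;
    uint32_t batch_offset;     /* byte offset in the batch to patch */
    uint32_t delta;
    uint64_t presumed_offset;
};

struct sketch_batch {
    uint32_t map[SKETCH_BATCH_DWORDS];
    size_t used;               /* dwords written so far */
    struct sketch_reloc relocs[SKETCH_MAX_RELOCS];
    size_t reloc_count;
};

/* Write the address of target+delta into the batch and return it.
 * Pinned targets skip the relocation entirely; unpinned targets fall
 * back to recording one so the address can be patched later. */
static uint64_t
sketch_emit_address(struct sketch_batch *batch,
                    const struct sketch_bo *target, uint32_t delta)
{
    assert(batch->used + 2 <= SKETCH_BATCH_DWORDS);

    if (!target->pinned) {
        assert(batch->reloc_count < SKETCH_MAX_RELOCS);
        struct sketch_reloc *r = &batch->relocs[batch->reloc_count++];
        r->target_handle = target->handle;
        r->batch_offset = (uint32_t)(batch->used * sizeof(uint32_t));
        r->delta = delta;
        r->presumed_offset = target->offset;
    }

    uint64_t address = target->offset + delta;
    batch->map[batch->used++] = (uint32_t)address;          /* low dword */
    batch->map[batch->used++] = (uint32_t)(address >> 32);  /* high dword */
    return address;
}
```

The point of the split is that the pinned path touches nothing but the
batch itself, so the cache misses on the relocation array disappear for
every buffer we manage to pin.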

On the other side of the coin, execbuf is pretty much down to the cost
of processing buffers (doing the lookup from handle to vma, locking and
fence tracking). Yet it still takes a disproportionate amount of time. We
can change the interface to reduce the amount of data being passed
through the uABI each time (using predefined sets of handles, and using
command rings rather than copying), but I feel that only addresses a
small portion of the cost: implicit GEM tracking of everything is expensive.
Yet that is not easy to turn off...

Ideas most welcome!
mesa-dev mailing list