Re: Batched ww-mutexes, wound-wait vs wait-die etc.

Thomas Hellstrom Mon, 16 Apr 2018 01:24:07 -0700

On 04/14/2018 10:33 AM, Daniel Vetter wrote:

Hi Thomas,


On Fri, Apr 13, 2018 at 10:23 PM, Thomas Hellstrom
<thellst...@vmware.com> wrote:

On 04/13/2018 07:13 PM, Daniel Vetter wrote:

On Wed, Apr 11, 2018 at 10:27:06AM +0200, Thomas Hellstrom wrote:

2) Should we add a *real* wound-wait choice to our wound-wait mutexes.
Otherwise perhaps rename them or document that they're actually doing
wait-die.

I think a doc patch would be good at least. Including all the data you
assembled here.


Actually, a further investigation appears to indicate that manipulating the
lock state under a local spinlock is about fast as using atomic operations
even for the completely uncontended cases.

This means that we could have a solution where you decide on a per-mutex or
per-reservation object basis whether you want to manipulate lock-state under
a "batch group" spinlock, meaning certain performance characteristics or
traditional local locking, meaning other performance characteristics.

Like, vmwgfx could choose batching locks, radeon traditional locks, but the
same API would work for both and locks could be shared between drivers..

Don't we need to make this decision at least on a per-class level?

No, I was thinking more in the line of the ww_mutex having a pointer tothe spinlock. It could either be the local mutex "wait_lock", or aper-batch-group lock. The mutex code wouldn't care. We do need a specialAPI for batched locking, though, but not for ordinary locking.

Both APIs should be able to handle local or grouped spinlocks.

Note that this would of course require that there would be noperformance loss for users that don't use batch groups.

I guess the most efficient use for GPU command submission would be touse per-process batch-groups. Then when the batch encounters a ww_mutexwith a different batch group (for example the display server sharedsurface, it'll just switch batch lock), and this way the contention for

the batch spinlock will be mostly eliminated.

  Or
how will the spinlock/batch-lock approach interact with the normal
ww_mutex_lock path (which does require the atomics/ordered stores
we're trying to avoid)?


We can use the same code with some extra
if (use_ww_ctx) in the common locking and unlocking path.

Note that the "use_ww_ctx" parameter is defined at compile-time so theordinary mutex path (binary) shouldn't change at all after optimizationbut I need to verify that, of course.

What you can't do with such a change is to lock / unlock a ww_mutexusing the standard mutex API, like mutex_lock(&ww_mutex->base), but Iguess that would be OK?


If we can't mix them I'm kinda leaning towards a
ww_batch_mutex/ww_batch_acquire_ctx, but exactly matching api
otherwise. We probably do need the new batch_start/end api, since
ww_acquire_done isn't quite the right place ...

I'll see if I get time to put together an RFC.

Yeah I think there's definitely some use for batched ww locks, where
parallelism is generally low, or at least the ratio between "time
spent acquiring locks" and "time spent doing stuff while holding
locks" small enough to not make the reduced parallelism while
acquiring an issue.

Yes. At least it's worth bringing up for discussion. The reducedparallelism shouldn't be an issue if per-process batch-groups are used,or, like for vmwgfx the command submission itself is serialized, due toa single FIFO.


/Thomas

-Daniel



_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: Batched ww-mutexes, wound-wait vs wait-die etc.

Reply via email to