On 01/22/2016 01:09 AM, Davidlohr Bueso wrote:
On Thu, 21 Jan 2016, Waiman Long wrote:
On 01/21/2016 04:29 AM, Ding Tianhong wrote:
I got the vmcore and found that ifconfig had been sitting in the wait list of the
rtnl_lock for 120 seconds, while my own process could take and release the rtnl_lock
normally several times per second. In other words, my process kept jumping the queue
and ifconfig could never get the rtnl_lock. Looking at the mutex slow path, I found
that a mutex may spin on the owner regardless of whether the wait list is empty, so
tasks on the wait list can be cut in line indefinitely. Add a test for the wait list
in mutex_can_spin_on_owner() to avoid this problem.
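
Roughly, the idea is a check like the following sketch, written against the
mutex_can_spin_on_owner() in kernel/locking/mutex.c of that era; this is only an
illustration of the proposal, not the exact patch that was posted:

static inline int mutex_can_spin_on_owner(struct mutex *lock)
{
	struct task_struct *owner;
	int retval = 1;

	if (need_resched())
		return 0;

	/*
	 * Sketch of the proposed change: refuse to spin if other tasks
	 * are already queued on the wait list, so a stream of newly
	 * arriving spinners cannot starve the sleepers.  This is an
	 * unlocked, opportunistic check.
	 */
	if (!list_empty(&lock->wait_list))
		return 0;

	rcu_read_lock();
	owner = READ_ONCE(lock->owner);
	if (owner)
		retval = owner->on_cpu;
	rcu_read_unlock();

	/*
	 * If lock->owner is not set, the owner may have just acquired
	 * the mutex and not set the field yet, or the mutex may have
	 * been released.
	 */
	return retval;
}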
So this has always been somewhat known, at least in theory, until now. It's the cost
of spinning without going through the wait queue, unlike other locks.
[...]
From: Waiman Long <waiman.l...@hpe.com>
Date: Thu, 21 Jan 2016 17:53:14 -0500
Subject: [PATCH] locking/mutex: Enable optimistic spinning of woken task in wait list
Ding Tianhong reported a live-lock situation where a constant stream of incoming
optimistic spinners blocked a task in the wait list from getting the mutex.

This patch attempts to fix this live-lock condition by enabling a woken task in the
wait list to enter the optimistic spinning loop itself, with precedence over the
spinners in the OSQ. This should prevent the live-lock condition from happening.
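
Conceptually, the woken waiter would do something like the sketch below. This is an
illustration of the idea only, not the actual diff: try_to_acquire() is a stand-in
for the slowpath trylock, and the real slowpath would do the sleeping part under
lock->wait_lock with proper task-state handling.

	/*
	 * Runs after the waiter at the head of the wait list has been
	 * woken: instead of going straight back to sleep, it spins on
	 * the current owner (as the OSQ spinners do), so later-arriving
	 * spinners cannot starve it.
	 */
	struct task_struct *owner;

	for (;;) {
		if (try_to_acquire(lock))	/* stand-in for the slowpath trylock */
			break;			/* got the mutex */

		owner = READ_ONCE(lock->owner);
		if ((owner && !mutex_spin_on_owner(lock, owner)) || need_resched()) {
			/*
			 * The owner is sleeping/preempted, or we need to
			 * reschedule: stop spinning, block like a normal
			 * waiter, and spin again after the next wakeup.
			 */
			set_current_state(TASK_UNINTERRUPTIBLE);
			schedule_preempt_disabled();
		}
	}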
And one of the reasons why we never bothered 'fixing' things was the additional
branching out in the slowpath (and the lack of a real issue, although this one is so
damn pathological). I fear that your approach is one of those scenarios where the
code ends up being bloated, albeit most of it is actually duplicated and can be
refactored *sigh*. So now we'd spin, then sleep, then try spinning, then sleep
again... phew. Not to mention the performance implications, i.e. losing the benefits
of osq over waiter spinning in scenarios that would otherwise have more osq spinners
as opposed to waiter spinners, or in setups where it is actually best to block
instead of spinning.
The patch that I sent out is just a proof of concept to make sure that it can fix
that particular case. I do plan to refactor it if I decide to go ahead with an
official one. Unlike the OSQ, there can be no more than one waiter spinner, as the
wakeup is directed to only the first task in the wait list and the spinning won't
happen until that task has been woken up. In the worst case, there are only 2
spinners spinning on the lock and the owner field, one from the OSQ and one from the
wait list. That shouldn't put too much cacheline contention traffic on the system.
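
For reference, the mutex unlock slowpath wakes only the task at the head of the wait
list, roughly like this (paraphrased from kernel/locking/mutex.c; the real code runs
under lock->wait_lock):

	if (!list_empty(&lock->wait_list)) {
		/* wake only the first entry on the wait list */
		struct mutex_waiter *waiter =
			list_entry(lock->wait_list.next, struct mutex_waiter, list);

		wake_up_process(waiter->task);
	}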
Cheers,
Longman