On Sat, 4 Mar 2017, Peter Zijlstra wrote:
> The problem with returning -EAGAIN when the waiter state mismatches is
> that it becomes very hard to proof a bounded execution time on the
> operation. And seeing that this is a RT operation, this is somewhat
> important.
>
> While in practise it will be very unlikely to ever really take more
> than one or two rounds, proving so becomes rather hard.
Oh no. Assume the following:

T1 and T2 are both pinned to CPU0. prio(T2) > prio(T1)

CPU0

T1
  lock_pi()
  queue_me()		<- Waiter is visible

preemption

T2
  unlock_pi()
    loops with -EAGAIN forever

> Now that modifying wait_list is done while holding both hb->lock and
> wait_lock, we can avoid the scenario entirely if we acquire wait_lock
> while still holding hb-lock. Doing a hand-over, without leaving a
> hole.
>
> Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
> ---
>  kernel/futex.c |   26 ++++++++++++--------------
>  1 file changed, 12 insertions(+), 14 deletions(-)
>
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -1391,16 +1391,11 @@ static int wake_futex_pi(u32 __user *uad
>  	DEFINE_WAKE_Q(wake_q);
>  	int ret = 0;
>
> -	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
>  	new_owner = rt_mutex_next_owner(&pi_state->pi_mutex);
> -	if (!new_owner) {
> +	if (WARN_ON_ONCE(!new_owner)) {
>  		/*
> -		 * Since we held neither hb->lock nor wait_lock when coming
> -		 * into this function, we could have raced with futex_lock_pi()
> -		 * such that it will have removed the waiter that brought us
> -		 * here.
> -		 *
> -		 * In this case, retry the entire operation.
> +		 * Should be impossible now... but if weirdness happens,

  'now...' is not very useful 6 months from NOW :)

> +		 * returning -EAGAIN is safe and correct.
>  		 */
>  		ret = -EAGAIN;
>  		goto out_unlock;
> @@ -2770,15 +2765,18 @@ static int futex_unlock_pi(u32 __user *u
>  	if (pi_state->owner != current)
>  		goto out_unlock;
>
> +	get_pi_state(pi_state);
>  	/*
> -	 * Grab a reference on the pi_state and drop hb->lock.
> +	 * Since modifying the wait_list is done while holding both
> +	 * hb->lock and wait_lock, holding either is sufficient to
> +	 * observe it.
>  	 *
> -	 * The reference ensures pi_state lives, dropping the hb->lock
> -	 * is tricky.. wake_futex_pi() will take rt_mutex::wait_lock to
> -	 * close the races against futex_lock_pi(), but in case of
> -	 * _any_ fail we'll abort and retry the whole deal.
> +	 * By taking wait_lock while still holding hb->lock, we ensure
> +	 * there is no point where we hold neither; and therefore
> +	 * wake_futex_pi() must observe a state consistent with what we
> +	 * observed.
>  	 */
> -	get_pi_state(pi_state);
> +	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
>  	spin_unlock(&hb->lock);

Other than that, this is pretty good.

Thanks,

	tglx