On Sat, 4 Mar 2017, Peter Zijlstra wrote:

> The problem with returning -EAGAIN when the waiter state mismatches is
> that it becomes very hard to prove a bounded execution time for the
> operation. And seeing that this is an RT operation, this is somewhat
> important.
> 
> While in practice it will be very unlikely to ever really take more
> than one or two rounds, proving so becomes rather hard.

Oh no. Assume the following:

T1 and T2 are both pinned to CPU0. prio(T2) > prio(T1)

CPU0

T1 
  lock_pi()
  queue_me()  <- Waiter is visible

preemption

T2
  unlock_pi()
    loops with -EAGAIN forever

T1 never runs again to complete the waiter state it made visible, because
T2 has the higher priority and both are pinned to CPU0, so unlock_pi()
spins indefinitely.
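To spell out why there is no bound, a minimal sketch of the retry side
(try_unlock_pi() is a made-up stand-in for the wake_futex_pi() path that
returns -EAGAIN, not the actual kernel code):

#include <linux/errno.h>

/* Hypothetical helper: fails while the waiter state is still incomplete. */
extern int try_unlock_pi(void);

static int unlock_pi_loop(void)
{
        int ret;

        do {
                ret = try_unlock_pi();
                /*
                 * T1 would have to run to complete the waiter state, but
                 * it is lower priority and pinned to the same CPU as us,
                 * so nothing ever changes and we get -EAGAIN every time.
                 */
        } while (ret == -EAGAIN);

        return ret;
}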

> Now that modifying wait_list is done while holding both hb->lock and
> wait_lock, we can avoid the scenario entirely if we acquire wait_lock
> while still holding hb->lock, doing a hand-over without leaving a
> hole.

> Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
> ---
>  kernel/futex.c |   26 ++++++++++++--------------
>  1 file changed, 12 insertions(+), 14 deletions(-)
> 
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -1391,16 +1391,11 @@ static int wake_futex_pi(u32 __user *uad
>       DEFINE_WAKE_Q(wake_q);
>       int ret = 0;
>  
> -     raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
>       new_owner = rt_mutex_next_owner(&pi_state->pi_mutex);
> -     if (!new_owner) {
> +     if (WARN_ON_ONCE(!new_owner)) {
>               /*
> -              * Since we held neither hb->lock nor wait_lock when coming
> -              * into this function, we could have raced with futex_lock_pi()
> -              * such that it will have removed the waiter that brought us
> -              * here.
> -              *
> -              * In this case, retry the entire operation.
> +              * Should be impossible now... but if weirdness happens,

'now...' is not very useful 6 months from NOW :)

> +              * returning -EAGAIN is safe and correct.
>                */
>               ret = -EAGAIN;
>               goto out_unlock;
> @@ -2770,15 +2765,18 @@ static int futex_unlock_pi(u32 __user *u
>               if (pi_state->owner != current)
>                       goto out_unlock;
>  
> +             get_pi_state(pi_state);
>               /*
> -              * Grab a reference on the pi_state and drop hb->lock.
> +              * Since modifying the wait_list is done while holding both
> +              * hb->lock and wait_lock, holding either is sufficient to
> +              * observe it.
>                *
> -              * The reference ensures pi_state lives, dropping the hb->lock
> -              * is tricky.. wake_futex_pi() will take rt_mutex::wait_lock to
> -              * close the races against futex_lock_pi(), but in case of
> -              * _any_ fail we'll abort and retry the whole deal.
> +              * By taking wait_lock while still holding hb->lock, we ensure
> +              * there is no point where we hold neither; and therefore
> +              * wake_futex_pi() must observe a state consistent with what we
> +              * observed.
>                */
> -             get_pi_state(pi_state);
> +             raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
>               spin_unlock(&hb->lock);
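FWIW, the hand-over idiom itself, boiled down to a self-contained sketch
with made-up lock names (purely illustrative, not the futex code): because
updaters need both locks, an observer that takes the inner lock before
dropping the outer one can never see the protected state change under it.

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(outer_lock);             /* plays hb->lock */
static DEFINE_RAW_SPINLOCK(inner_lock);         /* plays rt_mutex::wait_lock */
static int protected_state;

static void updater(void)
{
        spin_lock(&outer_lock);
        raw_spin_lock_irq(&inner_lock);
        protected_state++;                      /* changes need both locks */
        raw_spin_unlock_irq(&inner_lock);
        spin_unlock(&outer_lock);
}

static int observer(void)
{
        int seen;

        spin_lock(&outer_lock);
        seen = protected_state;                 /* observed under outer_lock */
        raw_spin_lock_irq(&inner_lock);         /* hand-over: take inner ... */
        spin_unlock(&outer_lock);               /* ... before dropping outer */
        /* seen is still valid: an update would need both locks. */
        raw_spin_unlock_irq(&inner_lock);
        return seen;
}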

Other than that, this is pretty good.

Thanks,

        tglx
