On Wed, 2014-06-11 at 20:45 +0000, Thomas Gleixner wrote: > The kernel tries to atomically unlock the futex without checking > whether there is kernel state associated to the futex. > > So if user space manipulated the user space value, this will leave > kernel internal state around associated to the owner task. > > For robustness sake, lookup first whether there are waiters on the > futex. If there are waiters, wake the top priority waiter with all the > proper sanity checks applied. > > If there are no waiters, do the atomic release. We do not have to > preserve the waiters bit in this case, because a potentially incoming > waiter is blocked on the hb->lock and will acquire the futex > atomically. We neither have to preserve the owner died bit. The caller > is the owner and it was supposed to cleanup the mess. > > Signed-off-by: Thomas Gleixner <t...@linutronix.de> > --- > kernel/futex.c | 78 > +++++++++++++++++++-------------------------------------- > 1 file changed, 26 insertions(+), 52 deletions(-) > > Index: linux/kernel/futex.c > =================================================================== > --- linux.orig/kernel/futex.c > +++ linux/kernel/futex.c > @@ -1186,22 +1186,6 @@ static int wake_futex_pi(u32 __user *uad > return 0; > } > > -static int unlock_futex_pi(u32 __user *uaddr, u32 uval) > -{ > - u32 uninitialized_var(oldval); > - > - /* > - * There is no waiter, so we unlock the futex. The owner died > - * bit has not to be preserved here. We are the owner: > - */ > - if (cmpxchg_futex_value_locked(&oldval, uaddr, uval, 0)) > - return -EFAULT; > - if (oldval != uval) > - return -EAGAIN; > - > - return 0; > -}
I thought we used that in multiple places, but apparently not. Worth consolidating. > - > /* > * Express the locking dependencies for lockdep: > */ > @@ -2401,10 +2385,10 @@ uaddr_faulted: > */ > static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags) > { > - struct futex_hash_bucket *hb; > - struct futex_q *this, *next; > + u32 uninitialized_var(curval), uval, vpid = > task_pid_vnr(current); > union futex_key key = FUTEX_KEY_INIT; > - u32 uval, vpid = task_pid_vnr(current); > + struct futex_hash_bucket *hb; > + struct futex_q *match; > int ret; > > retry: > @@ -2417,57 +2401,47 @@ retry: > return -EPERM; > > ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key, > VERIFY_WRITE); > - if (unlikely(ret != 0)) > - goto out; > + if (ret) > + return ret; Looks like you're also trying to move away from a single exit point to multiple exit points. I prefer the single exit (which you've probably noticed :-), but it's a subjective thing, and so long as we are not duplicating cleanup logic, I guess it's fine either way. This change was not mentioned in the commit message though. > > hb = hash_futex(&key); > spin_lock(&hb->lock); > > /* > - * To avoid races, try to do the TID -> 0 atomic transition > - * again. If it succeeds then we can return without waking > - * anyone else up. We only try this if neither the waiters nor > - * the owner died bit are set. > - */ > - if (!(uval & ~FUTEX_TID_MASK) && > - cmpxchg_futex_value_locked(&uval, uaddr, vpid, 0)) > - goto pi_faulted; > - /* > - * Rare case: we managed to release the lock atomically, > - * no need to wake anyone else up: > - */ > - if (unlikely(uval == vpid)) > - goto out_unlock; > - > - /* > - * Ok, other tasks may need to be woken up - check waiters > - * and do the wakeup if necessary: > - */ > - plist_for_each_entry_safe(this, next, &hb->chain, list) { > - if (!match_futex (&this->key, &key)) > - continue; > - ret = wake_futex_pi(uaddr, uval, this); > + * Check waiters first. We do not trust user space values at > + * all and we at least want to know if user space fiddled > + * with the futex value instead of blindly unlocking. > + */ > + match = futex_top_waiter(hb, &key); > + if (match) { > + ret = wake_futex_pi(uaddr, uval, match); > /* > - * The atomic access to the futex value > - * generated a pagefault, so retry the > - * user-access and the wakeup: > + * The atomic access to the futex value generated a > + * pagefault, so retry the user-access and the wakeup: > */ > if (ret == -EFAULT) > goto pi_faulted; > goto out_unlock; > } > + > /* > - * No waiters - kernel unlocks the futex: > + * We have no kernel internal state, i.e. no waiters in the > + * kernel. Waiters which are about to queue themself are stuck themselves > + * on hb->lock. So we can safely ignore them. We do neither > + * preserve the WAITERS bit not the OWNER_DIED one. We are the We preserve neither the WAITERS bit nor the OWNER_DIED bit. (the above use of "do" and "not" is incorrect and could easily be misinterpreted). > + * owner. In wake_futex_pi we verify ownership by matching pi_state->owner == current, but here the only test is the TID value, which is set by userspace - which we don't trust... I'm trying to determine if it matters in this case... if there are no waiters, is the pi_state still around? If so, it does indeed matter, and we should be verifying. > */ > - ret = unlock_futex_pi(uaddr, uval); > - if (ret == -EFAULT) > + if (cmpxchg_futex_value_locked(&curval, uaddr, uval, 0)) > goto pi_faulted; > This refactoring seems like it would be best done as a prequel patch so as not to confuse cleanup with functional change. At least that is what you and others have beaten into me over the years ;-) > + /* > + * If uval has changed, let user space handle it. > + */ > + ret = (curval == uval) ? 0 : -EAGAIN; > + > out_unlock: > spin_unlock(&hb->lock); > put_futex_key(&key); > - > -out: > return ret; > By dropping this you won't return ret, but rather fall through into pi_faulted... which certainly isn't what you wanted. The need for better test coverage is very evident now :-) -- Darren > pi_faulted: > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/