Commit:     fa490cfd15d7ce0900097cc4e60cfd7a76381138
Parent:     a0f98a1cb7d27c656de450ba56efd31bdc59065e
Author:     Linus Torvalds <[EMAIL PROTECTED]>
AuthorDate: Mon Jun 18 09:34:40 2007 -0700
Committer:  Linus Torvalds <[EMAIL PROTECTED]>
CommitDate: Mon Jun 18 11:52:55 2007 -0700

    Fix possible runqueue lock starvation in wait_task_inactive()

    Miklos Szeredi reported very long pauses (several seconds, sometimes
    more) on his T60 (with a Core2Duo) which he managed to track down to
    wait_task_inactive()'s open-coded busy-loop.

    He observed that an interrupt on one core tries to acquire the
    runqueue-lock but does not succeed in doing so for a very long time -
    while wait_task_inactive() on the other core loops waiting for the first
    core to deschedule a task (which it won't do while spinning in an
    interrupt handler).

    This rewrites wait_task_inactive() to do all its waiting optimistically
    without any locks taken at all, and then just double-check the end
    result with the proper runqueue lock held over just a very short
    section.  If there were races in the optimistic wait, or a preemption
    event scheduled the process away, we simply re-synchronize, and start
    over.

    So the code now looks like this:

        repeat:
                /* Unlocked, optimistic looping! */
                rq = task_rq(p);
                while (task_running(rq, p))
                        cpu_relax();

                /* Get the *real* values */
                rq = task_rq_lock(p, &flags);
                running = task_running(rq, p);
                array = p->array;
                task_rq_unlock(rq, &flags);

                /* Check them.. */
                if (unlikely(running)) {
                        goto repeat;
                }

                /* Preempted away? Yield if so.. */
                if (unlikely(array)) {
                        yield();
                        goto repeat;
                }

    Basically, that first "while()" loop is done entirely without any
    locking at all (and doesn't check for the case where the target process
    might have been preempted away), and so it's possibly "incorrect", but
    we don't really care.  Both the runqueue used, and the "task_running()"
    check might be the wrong tests, but they won't oops - they just mean
    that we could possibly get the wrong results due to lack of locking and
    exit the loop early in the case of a race condition.

    So once we've exited the loop, we then get the proper (and careful) rq
    lock, and check the running/runnable state _safely_.  And if it turns
    out that our quick-and-dirty and unsafe loop was wrong after all, we
    just go back and try it all again.

    (The patch also adds a lot of comments, which is the actual bulk of it
    all, to make it more obvious why we can do these things without holding
    the locks).

    Thanks to Miklos for all the testing and tracking it down.

    Tested-by: Miklos Szeredi <[EMAIL PROTECTED]>
    Acked-by: Ingo Molnar <[EMAIL PROTECTED]>
    Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>
 kernel/sched.c |   69 ++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 60 insertions(+), 9 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 49be34e..a747591 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1159,21 +1159,72 @@ void wait_task_inactive(struct task_struct *p)
        unsigned long flags;
        struct rq *rq;
-       int preempted;
+       struct prio_array *array;
+       int running;

repeat:
+       /*
+        * We do the initial early heuristics without holding
+        * any task-queue locks at all. We'll only try to get
+        * the runqueue lock when things look like they will
+        * work out!
+        */
+       rq = task_rq(p);
+       /*
+        * If the task is actively running on another CPU
+        * still, just relax and busy-wait without holding
+        * any locks.
+        *
+        * NOTE! Since we don't hold any locks, it's not
+        * even sure that "rq" stays as the right runqueue!
+        * But we don't care, since "task_running()" will
+        * return false if the runqueue has changed and p
+        * is actually now running somewhere else!
+        */
+       while (task_running(rq, p))
+               cpu_relax();
+       /*
+        * Ok, time to look more closely! We need the rq
+        * lock now, to be *sure*. If we're wrong, we'll
+        * just go back and repeat.
+        */
        rq = task_rq_lock(p, &flags);
-       /* Must be off runqueue entirely, not preempted. */
-       if (unlikely(p->array || task_running(rq, p))) {
-               /* If it's preempted, we yield.  It could be a while. */
-               preempted = !task_running(rq, p);
-               task_rq_unlock(rq, &flags);
+       running = task_running(rq, p);
+       array = p->array;
+       task_rq_unlock(rq, &flags);
+       /*
+        * Was it really running after all now that we
+        * checked with the proper locks actually held?
+        *
+        * Oops. Go back and try again..
+        */
+       if (unlikely(running)) {
-               if (preempted)
-                       yield();
                goto repeat;
        }
-       task_rq_unlock(rq, &flags);
+       /*
+        * It's not enough that it's not actively running,
+        * it must be off the runqueue _entirely_, and not
+        * preempted!
+        *
+        * So if it was still runnable (but just not actively
+        * running right now), it's preempted, and we should
+        * yield - it could be a while.
+        */
+       if (unlikely(array)) {
+               yield();
+               goto repeat;
+       }
+       /*
+        * Ahh, all good. It wasn't running, and it wasn't
+        * runnable, which means that it will never become
+        * running in the future either. We're all done!
+        */