On Mon, 15 Oct 2018 11:20:32 +0200
Peter Zijlstra <[email protected]> wrote:

> > index 2e2955a..be0fc43 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -1754,7 +1754,7 @@ static struct rq *find_lock_lowest_rq(struct 
> > task_struct *task, struct rq *rq)
> >                                  !task_on_rq_queued(task))) {
> >  
> >                             double_unlock_balance(rq, lowest_rq);
> > -                           lowest_rq = NULL;
> > +                           lowest_rq = RETRY_TASK;
> >                             break;
> >                     }
> >             }  
> 
> I'm confused.. should not:
> 
>               /* try again */
>               double_unlock_balance(rq, lowest_rq);
>               lowest_rq = NULL;
> 
> also return RETRY_TASK? That also is in the double_lock_balance() path
> and will this have had rq->lock() released.

I thought the same thing at first, but this is in the loop path, where
it does everything again. But now looking closer, I think there's a bug
in the original code.

We only do the check if the immediate double_lock_balance() released
the current task rq lock, but we don't take into account if it was
released earlier, which means it could have migrated and we never
noticed!

I believe the code should look like this:

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 2e2955a8cf8f..2c9128ce61e2 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1718,6 +1718,7 @@ static int find_lowest_rq(struct task_struct *task)
 static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
 {
        struct rq *lowest_rq = NULL;
+       bool released = false;
        int tries;
        int cpu;
 
@@ -1740,7 +1741,7 @@ static struct rq *find_lock_lowest_rq(struct task_struct 
*task, struct rq *rq)
                }
 
                /* if the prio of this runqueue changed, try again */
-               if (double_lock_balance(rq, lowest_rq)) {
+               if (double_lock_balance(rq, lowest_rq) || released) {
                        /*
                         * We had to unlock the run queue. In
                         * the mean time, task could have
@@ -1754,7 +1755,7 @@ static struct rq *find_lock_lowest_rq(struct task_struct 
*task, struct rq *rq)
                                     !task_on_rq_queued(task))) {
 
                                double_unlock_balance(rq, lowest_rq);
-                               lowest_rq = NULL;
+                               lowest_rq = RETRY_TASK;
                                break;
                        }
                }
@@ -1764,10 +1765,15 @@ static struct rq *find_lock_lowest_rq(struct 
task_struct *task, struct rq *rq)
                        break;
 
                /* try again */
-               double_unlock_balance(rq, lowest_rq);
+               if (double_unlock_balance(rq, lowest_rq))
+                       released = true;
+
                lowest_rq = NULL;
        }
 
+       if (!lowest_rq && released)
+               lowest_rq = RETRY_TASK;
+
        return lowest_rq;
 }
 
-- Steve

Reply via email to