On Tue, 26 Sep 2017 09:23:20 +0800 zhouchengming <zhouchengmi...@huawei.com> wrote:
> On 2017/9/26 3:40, Steven Rostedt wrote:
> > On Mon, 11 Sep 2017 14:51:49 +0800
> > Zhou Chengming<zhouchengmi...@huawei.com> wrote:
> >
> >> push_rt_task() pick the first pushable task and find an eligible
> >> lowest_rq, then double_lock_balance(rq, lowest_rq). So if
> >> double_lock_balance() unlock the rq (when double_lock_balance() return 1),
> >> we have to check if this task is still on the rq.
> >>
> >> The problem is that the check conditions are not sufficient:
> >>
> >> if (unlikely(task_rq(task) != rq ||
> >>              !cpumask_test_cpu(lowest_rq->cpu,&task->cpus_allowed) ||
> >>              task_running(rq, task) ||
> >>              !rt_task(task) ||
> >>              !task_on_rq_queued(task))) {
> >>
> >> cpu2                            cpu1                      cpu0
> >> push_rt_task(rq1)
> >>   pick task_A on rq1
> >>   find rq0
> >>     double_lock_balance(rq1, rq0)
> >>       unlock(rq1)
> >>                                 rq1 __schedule
> >>                                   pick task_A run
> >>                                   task_A sleep (dequeued)
> >>       lock(rq0)
> >>       lock(rq1)
> >>   do_above_check(task_A)
> >>     task_rq(task_A) == rq1
> >>     cpus_allowed unchanged
> >>     task_running == false
> >>     rt_task(task_A) == true
> >>                                                           try_to_wake_up(task_A)
> >>                                                             select_cpu = cpu3
> >>                                                             enqueue(rq3, task_A)
> >
> > How can this happen? The try_to_wake_up(task_A) needs to grab the rq
> > that task A is on, and we have that rq lock.
> >
> > /me confused.
> >
> > -- Steve
>
> Thanks for the reply!
> After the task_A sleep on cpu1, the try_to_wake_up(task_A) on cpu0 select a
> different cpu3,
> so it will grab the rq3 lock, not the rq1 lock.

Ah crap. This is caused by 7608dec2ce20 ("sched: Drop the rq argument to
sched_class::select_task_rq()"). Because this code depends on
try_to_wake_up() grabbing the task's rq lock. But it no longer does
that, and it causes this race.

OK, I need to look at this deeper when I'm not so jetlagged and typing
this because I can't sleep at 5am.

Thanks for pointing this out! It may be fixed by simply grabbing the
run queue lock on migration, as that would sync things up.

Peter?

-- Steve