On Mon, 2007-05-07 at 19:10 +0900, Satoru Takeuchi wrote:
> Hi,
>
> I found a bug in the 2.6.21 cpu-hotplug code.
>
> When process A on CPU0 tries to offline CPU1, on which process B, a
> realtime process (its task->policy == SCHED_FIFO or SCHED_RR), is running
> without sleeping or yielding, both CPU0 and CPU1 hang. It's because of
> the following code in __stop_machine_run().
>
> struct task_struct *__stop_machine_run(int (*fn)(void *), void *data,
>                                        unsigned int cpu)
> {
>         ...
>         p = kthread_create(do_stop, &smdata, "kstopmachine");
>         if (!IS_ERR(p)) {
>                 kthread_bind(p, cpu);
>                 wake_up_process(p);
>                 wait_for_completion(&smdata.done);
>         }
>         ...
> }
>
> kstopmachine is created, bound to CPU1, and woken up here, but
> this process can't start to run because no reschedule occurs on
> CPU1. Hence CPU0 can't run either, because it's waiting for completion
> of CPU1's offline work.

Yes, we should probably move the sched_setscheduler() call in stop_machine
(where the thread up-prioritizes itself) to before wake_up_process(p), so
that this can't happen.

Others have suggested we use the freezer; I've always distrusted that code.
It's much trickier than stop_machine().

I look forward to your patch!

Rusty.
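
To make the ordering argument concrete, here is a rough userspace toy
model of the situation. It is not kernel code and not the patch: every
name in it (toy_task, pick_next, stopper_gets_cpu) is hypothetical, the
priority values 50 and 99 are picked only for illustration, and the
scheduler is reduced to a single strict-priority pick on CPU1's run queue.
It only shows why a stopper thread that is woken while still at normal
priority never gets the CPU from a FIFO hog, while one raised to a higher
FIFO priority before the wake-up does.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only, NOT kernel code. All names here are hypothetical. */
enum toy_policy { TOY_NORMAL, TOY_FIFO };

struct toy_task {
	const char *name;
	enum toy_policy policy;
	int rt_prio;	/* only meaningful for TOY_FIFO; higher wins */
	int runnable;
};

/*
 * Strict-priority pick: a runnable FIFO task beats every NORMAL task,
 * and among FIFO tasks the highest rt_prio wins. On a tie the earlier
 * entry is kept, standing in for "the running task keeps the CPU".
 */
const struct toy_task *pick_next(const struct toy_task *rq, size_t n)
{
	const struct toy_task *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct toy_task *t = &rq[i];

		if (!t->runnable)
			continue;
		if (!best) {
			best = t;
			continue;
		}
		if (t->policy == TOY_FIFO &&
		    (best->policy != TOY_FIFO || t->rt_prio > best->rt_prio))
			best = t;
	}
	return best;
}

/* Returns 1 if the stopper would get CPU1, 0 if the hog keeps it. */
int stopper_gets_cpu(int raise_before_wake)
{
	struct toy_task cpu1_rq[2] = {
		{ "rt_hog",       TOY_FIFO,   50, 1 }, /* process B, never sleeps */
		{ "kstopmachine", TOY_NORMAL,  0, 0 }, /* created, not yet woken */
	};

	if (raise_before_wake) {
		/* Creator raises the priority before the wake-up. */
		cpu1_rq[1].policy = TOY_FIFO;
		cpu1_rq[1].rt_prio = 99;	/* assumed top value in this model */
	}
	cpu1_rq[1].runnable = 1;	/* stands in for wake_up_process(p) */

	return pick_next(cpu1_rq, 2) == &cpu1_rq[1];
}

/* Usage: both orderings side by side. */
void toy_self_check(void)
{
	assert(stopper_gets_cpu(0) == 0);	/* raise-after-run: never runs */
	assert(stopper_gets_cpu(1) == 1);	/* raise-before-wake: preempts */
}
```

One caveat the model also shows: if the hog already sits at the same FIFO
priority as the stopper, the tie goes to the hog, so raising the priority
before the wake-up would not help in that case.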