On 5/4/2018 1:22 PM, Rohit Jain wrote:
> Hi Peter,
>
> On 05/04/2018 02:47 AM, Peter Zijlstra wrote:
>> On Wed, May 02, 2018 at 01:52:10PM -0700, Rohit Jain wrote:
>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>> index 5e10aae..75d1ecf 100644
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -4033,6 +4033,9 @@ int idle_cpu(int cpu)
>>>          return 0;
>>>  #endif
>>> +    if (vcpu_is_preempted(cpu))
>>> +        return 0;
>>> +
>>>      return 1;
>>>  }
>> Basically OK with this, but did you consider idle_cpu() usage outside of
>> select_idle_sibling()?
>>
>> For instance, I think got_nohz_idle_kick() isn't quite right with this
>> on. Similarly for scheduler_tick(), that wants the actual idle state.
>
> As far as intent is concerned, yes I agree you might be right. I left
> the VM running for a couple of days, didn't see anything weird however.
>
> We could add a check at each of those places or something to that effect
> if this is an issue. Please let me know how you want to proceed.

The point is that some idle_cpu() call sites should take the vCPU
preemption state into account and some should not, so each call site
must be considered on a case-by-case basis. You could define a new
accessor to abstract the difference, and call it from
select_idle_sibling() and anywhere else it makes sense:

    int available_idle_cpu(int cpu)
    {
        return idle_cpu(cpu) && !vcpu_is_preempted(cpu);
    }

- Steve
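
To make the split between the two kinds of call site concrete, here is a
minimal userspace sketch of the accessor idea. It is not the kernel patch:
idle_cpu() and vcpu_is_preempted() are stubbed with plain arrays, and
NR_CPUS, cpu_is_idle[], vcpu_preempted[] and pick_idle_cpu() are
hypothetical names made up for illustration only.

```c
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical CPU count for this sketch */

/* Stand-in state; the real kernel derives this from the runqueue
 * and from the hypervisor, not from arrays like these. */
static bool cpu_is_idle[NR_CPUS];
static bool vcpu_preempted[NR_CPUS];

/* Stub for idle_cpu(): reports the actual idle state only, which is
 * what callers such as scheduler_tick() want. */
static int idle_cpu(int cpu)
{
	return cpu_is_idle[cpu];
}

/* Stub for vcpu_is_preempted(): true if the host has descheduled
 * the vCPU backing this CPU. */
static bool vcpu_is_preempted(int cpu)
{
	return vcpu_preempted[cpu];
}

/* The proposed accessor: idle and actually able to run right now. */
static int available_idle_cpu(int cpu)
{
	return idle_cpu(cpu) && !vcpu_is_preempted(cpu);
}

/* Hypothetical placement helper standing in for a call site like
 * select_idle_sibling(): returns the first usable CPU, or -1. */
static int pick_idle_cpu(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (available_idle_cpu(cpu))
			return cpu;
	return -1;
}
```

With this shape, task placement code would call available_idle_cpu(),
while code that needs the true idle state keeps calling idle_cpu()
unchanged, so idle_cpu() itself does not have to be modified.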