Bad hotplug/scheduler interaction?

Rick Lindsley Thu, 13 Sep 2007 00:53:07 -0700

I'm concerned that we don't have adequate protection for the scheduler
during cpu hotplug events, but I'm willing to believe I simply don't
understand the mechanism well enough.  We had a crash in (comparatively
ancient) 2.6.16.* but I think the relevant code is basically unchanged
since then.


First we introduced some cpu-intensive workloads.  Then we added two cpus.
System quickly crashed.  The crash was in find_busiest_group(), when
the kernel tried to access "this", which was NULL.  If we don't find a
localgroup, we won't set this, and when we try to calculate *imbalance,
we'll dereference a NULL "this" and crash.

As I looked over the code, though, I couldn't tell if the fault was with
find_busiest_group() for not covering this case, or if the problem was
that the method the hotplug code is using to reconstruct the sched_domains
really doesn't protect find_busiest_group (and find_idlest_group) at all.
Can anybody explain how synchronize_sched() is really syncing?  It looks
like a half-implemented RCU setup.  I fear we really don't have any way
to protect the two functions above from hotplug's desire to twiddle
with the sched_domains.

Do we?

Rick
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Bad hotplug/scheduler interaction?

Reply via email to