I'm concerned that we don't have adequate protection for the scheduler during cpu hotplug events, but I'm willing to believe I simply don't understand the mechanism well enough. We had a crash in (comparatively ancient) 2.6.16.* but I think the relevant code is basically unchanged since then.
First we introduced some cpu-intensive workloads. Then we added two cpus. System quickly crashed. The crash was in find_busiest_group(), when the kernel tried to access "this", which was NULL. If we don't find a localgroup, we won't set this, and when we try to calculate *imbalance, we'll dereference a NULL "this" and crash. As I looked over the code, though, I couldn't tell if the fault was with find_busiest_group() for not covering this case, or if the problem was that the method the hotplug code is using to reconstruct the sched_domains really doesn't protect find_busiest_group (and find_idlest_group) at all. Can anybody explain how synchronize_sched() is really syncing? It looks like a half-implemented RCU setup. I fear we really don't have any way to protect the two functions above from hotplug's desire to twiddle with the sched_domains. Do we? Rick - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/

