On Tue, 2017-03-14 at 16:06 +0100, Sebastian Andrzej Siewior wrote:
> The setup/remove_state/instance() functions in the hotplug core code are
> serialized against concurrent CPU hotplug, but unfortunately not serialized
> against themself.
>
> As a consequence a concurrent invocation of these function results in
> corruption of the callback machinery because two instances try to invoke
> callbacks on remote cpus at the same time. This results in missing callback
> invocations and initiator threads waiting forever on the completion.
>
> The obvious solution to replace get_cpu_online() with cpu_hotplug_begin()
> is not possible because at least one callsite calls into these functions
> from a get_online_cpu() locked region.
>
> Extend the protection scope of the cpuhp_state_mutex from solely protecting
> the state arrays to cover the callback invocation machinery as well.
>
> Reported-by: Bart Van Assche <[email protected]>
> Fixes: 5b7aa87e0482 ("cpu/hotplug: Implement setup/removal interface")
> Signed-off-by: Sebastian Andrzej Siewior <[email protected]>

Tested-by: Bart Van Assche <[email protected]>

So this regression was introduced in kernel v4.6? Anyway, thanks for the patch!

Bart.

