On Tue, 2017-03-14 at 16:06 +0100, Sebastian Andrzej Siewior wrote:
> The setup/remove_state/instance() functions in the hotplug core code are
> serialized against concurrent CPU hotplug, but unfortunately not serialized
> against themselves.
> 
> As a consequence, a concurrent invocation of these functions results in
> corruption of the callback machinery because two instances try to invoke
> callbacks on remote CPUs at the same time. This results in missing callback
> invocations and initiator threads waiting forever on the completion.
> 
> The obvious solution, replacing get_online_cpus() with cpu_hotplug_begin(),
> is not possible because at least one callsite calls into these functions
> from a get_online_cpus() locked region.
> 
> Extend the protection scope of the cpuhp_state_mutex from solely protecting
> the state arrays to cover the callback invocation machinery as well.
> 
> Reported-by: Bart Van Assche <[email protected]>
> Fixes: 5b7aa87e0482 ("cpu/hotplug: Implement setup/removal interface")
> Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
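
For anyone following along, the locking change described above can be sketched
in userspace roughly as follows. This is only an illustration under assumptions,
not the kernel patch: state_mutex stands in for cpuhp_state_mutex, and
setup_state(), state_table, in_flight and run_demo() are hypothetical names
made up for this example. The point is that one mutex covers both the state
table update and the callback invocation, so two callers can never be inside
the callback machinery at the same time.

```c
#include <assert.h>
#include <pthread.h>

#define NR_STATES 4

/* Hypothetical stand-in for cpuhp_state_mutex. */
static pthread_mutex_t state_mutex = PTHREAD_MUTEX_INITIALIZER;

static int state_table[NR_STATES];	/* models the state arrays */
static int in_flight;			/* callers currently invoking callbacks */
static int max_in_flight;		/* highest concurrency ever observed */

/* Models a setup_state()-style call: install the state, then run callbacks. */
static void *setup_state(void *arg)
{
	int state = *(int *)arg;

	pthread_mutex_lock(&state_mutex);

	/* Previously only this table update was covered by the mutex. */
	state_table[state] = 1;

	/* The extended scope also covers the callback invocation. */
	in_flight++;
	if (in_flight > max_in_flight)
		max_in_flight = in_flight;
	/* ... callbacks on remote CPUs would be invoked and awaited here ... */
	in_flight--;

	pthread_mutex_unlock(&state_mutex);
	return NULL;
}

/* Starts NR_STATES concurrent callers and returns the peak concurrency. */
int run_demo(void)
{
	pthread_t threads[NR_STATES];
	int ids[NR_STATES];
	int i, rc;

	for (i = 0; i < NR_STATES; i++) {
		ids[i] = i;
		rc = pthread_create(&threads[i], NULL, setup_state, &ids[i]);
		assert(rc == 0);
	}
	for (i = 0; i < NR_STATES; i++)
		pthread_join(threads[i], NULL);

	return max_in_flight;
}
```

Calling run_demo() from a small main() and checking that it never reports more
than one caller in flight shows the serialization. Moving the in_flight section
outside the lock would model the unserialized behaviour from the report.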

Tested-by: Bart Van Assche <[email protected]>

So this regression was introduced in kernel v4.6? Anyway, thanks for the patch!

Bart.
