On Mon, Jul 07, 2025 at 10:19:52AM -0400, Joel Fernandes wrote:

> From: Joel Fernandes <joelagn...@nvidia.com>
> Subject: [PATCH] smp: Document preemption and stop_machine() mutual exclusion
> 
> Recently while revising RCU's cpu online checks, there was some discussion
> around how IPIs synchronize with hotplug.
> 
> Add comments explaining how preemption disable creates mutual exclusion with
> CPU hotplug's stop_machine mechanism. The key insight is that stop_machine()
> atomically updates CPU masks and flushes IPIs with interrupts disabled, and
> cannot proceed while any CPU (including the IPI sender) has preemption
> disabled.
> 
> Cc: Andrea Righi <ari...@nvidia.com>
> Cc: Paul E. McKenney <paul...@kernel.org>
> Cc: Frederic Weisbecker <frede...@kernel.org>
> Cc: r...@vger.kernel.org
> Acked-by: Paul E. McKenney <paul...@kernel.org>
> Co-developed-by: Frederic Weisbecker <frede...@kernel.org>
> Signed-off-by: Joel Fernandes <joelagn...@nvidia.com>
> ---
> I am leaving in Paul's Ack but Paul please let me know if there is a concern!
> 
>  kernel/smp.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/smp.c b/kernel/smp.c
> index 974f3a3962e8..957959031063 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -93,6 +93,9 @@ int smpcfd_dying_cpu(unsigned int cpu)
>        * explicitly (without waiting for the IPIs to arrive), to
>        * ensure that the outgoing CPU doesn't go offline with work
>        * still pending.
> +      *
> +      * This runs with interrupts disabled inside the stopper task invoked
> +      * by stop_machine(), ensuring CPU offlining and IPI flushing are atomic.

So below you use 'mutual exclusion', which I prefer over 'atomic' as
used here.

>        */
>       __flush_smp_call_function_queue(false);
>       irq_work_run();
> @@ -418,6 +421,10 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
>   */
>  static int generic_exec_single(int cpu, call_single_data_t *csd)
>  {
> +     /*
> +      * Preemption already disabled here so stopper cannot run on this CPU,
> +      * ensuring mutual exclusion with CPU offlining and last IPI flush.
> +      */
>       if (cpu == smp_processor_id()) {
>               smp_call_func_t func = csd->func;
>               void *info = csd->info;
> @@ -638,8 +645,10 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
>       int err;
>  
>       /*
> -      * prevent preemption and reschedule on another processor,
> -      * as well as CPU removal
> +      * Prevent preemption and reschedule on another processor, as well as
> +      * CPU removal. Also preempt_disable() prevents stopper from running on
> +      * this CPU, thus providing atomicity between the cpu_online() check
> +      * and IPI sending ensuring IPI is not missed by CPU going offline.

That first sentence already covers this, no? 'prevents preemption' ->
stopper task cannot run, 'CPU removal' -> no CPU_DYING (because no
stopper). Also, the same 'atomicity' vs 'mutual exclusion' thing applies here.
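For what it's worth, the ordering being documented can be sketched roughly as
below (simplified kernel-style pseudocode, not the actual kernel/smp.c code;
do_send_ipi() is a hypothetical stand-in for the real queue-and-kick path):

```c
/*
 * Sketch of the mutual exclusion under discussion (simplified, not the
 * actual kernel/smp.c code). preempt_disable() on the sender keeps the
 * stopper task off this CPU, so stop_machine() -- and hence CPU_DYING
 * plus the final IPI flush -- cannot run concurrently with the
 * cpu_online() check and the IPI send below.
 */
int sketch_call_single(int cpu, smp_call_func_t func, void *info)
{
	int err = -ENXIO;

	preempt_disable();	/* stopper cannot run on this CPU now */
	if (cpu_online(cpu)) {
		/*
		 * The target cannot reach CPU_DYING until every CPU,
		 * including this one, lets the stopper run; the queued
		 * IPI is therefore either delivered normally or flushed
		 * by smpcfd_dying_cpu() with interrupts disabled.
		 */
		err = do_send_ipi(cpu, func, info);	/* hypothetical helper */
	}
	preempt_enable();	/* stopper may run again; offlining can proceed */
	return err;
}
```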


>        */
>       this_cpu = get_cpu();
>  
> -- 
> 2.34.1
> 
