On 7/8/2025 3:21 AM, Peter Zijlstra wrote:
> On Mon, Jul 07, 2025 at 10:19:52AM -0400, Joel Fernandes wrote:
> 
>> From: Joel Fernandes <joelagn...@nvidia.com>
>> Subject: [PATCH] smp: Document preemption and stop_machine() mutual exclusion
>>
>> Recently while revising RCU's cpu online checks, there was some discussion
>> around how IPIs synchronize with hotplug.
>>
>> Add comments explaining how preemption disable creates mutual exclusion with
>> CPU hotplug's stop_machine mechanism. The key insight is that stop_machine()
>> atomically updates CPU masks and flushes IPIs with interrupts disabled, and
>> cannot proceed while any CPU (including the IPI sender) has preemption
>> disabled.
>>
>> Cc: Andrea Righi <ari...@nvidia.com>
>> Cc: Paul E. McKenney <paul...@kernel.org>
>> Cc: Frederic Weisbecker <frede...@kernel.org>
>> Cc: r...@vger.kernel.org
>> Acked-by: Paul E. McKenney <paul...@kernel.org>
>> Co-developed-by: Frederic Weisbecker <frede...@kernel.org>
>> Signed-off-by: Joel Fernandes <joelagn...@nvidia.com>
>> ---
>> I am leaving in Paul's Ack but Paul please let me know if there is a concern!
>>
>>  kernel/smp.c | 13 +++++++++++--
>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/smp.c b/kernel/smp.c
>> index 974f3a3962e8..957959031063 100644
>> --- a/kernel/smp.c
>> +++ b/kernel/smp.c
>> @@ -93,6 +93,9 @@ int smpcfd_dying_cpu(unsigned int cpu)
>>       * explicitly (without waiting for the IPIs to arrive), to
>>       * ensure that the outgoing CPU doesn't go offline with work
>>       * still pending.
>> +     *
>> +     * This runs with interrupts disabled inside the stopper task invoked
>> +     * by stop_machine(), ensuring CPU offlining and IPI flushing are atomic.
> 
> So below you use 'mutual exclusion', which I prefer over 'atomic' as
> used here.

Sure, will fix.

> 
>>       */
>>      __flush_smp_call_function_queue(false);
>>      irq_work_run();
>> @@ -418,6 +421,10 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
>>   */
>>  static int generic_exec_single(int cpu, call_single_data_t *csd)
>>  {
>> +    /*
>> +     * Preemption already disabled here so stopper cannot run on this CPU,
>> +     * ensuring mutual exclusion with CPU offlining and last IPI flush.
>> +     */
>>      if (cpu == smp_processor_id()) {
>>              smp_call_func_t func = csd->func;
>>              void *info = csd->info;
>> @@ -638,8 +645,10 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
>>      int err;
>>  
>>      /*
>> -     * prevent preemption and reschedule on another processor,
>> -     * as well as CPU removal
>> +     * Prevent preemption and reschedule on another processor, as well as
>> +     * CPU removal. Also preempt_disable() prevents stopper from running on
>> +     * this CPU, thus providing atomicity between the cpu_online() check
>> +     * and IPI sending, ensuring the IPI is not missed by a CPU going offline.
> 
> That first sentence already covers this, no? 'prevents preemption' ->
> stopper task cannot run, 'CPU removal' -> no CPU_DYING (because no
> stopper).

Yeah I understand that's "implied" but I'd like to specifically call that out if
that's Ok :)

> Also that 'atomicity' vs 'mutual exclusion' thing.

Sure, will fix :)

Thanks!

 - Joel
