A recent revision of RCU's CPU-online checks prompted some discussion around
how IPIs synchronize with CPU hotplug.

Add comments explaining how preemption disable creates mutual exclusion with
CPU hotplug's stop_machine mechanism. The key insight is that stop_machine()
atomically updates CPU masks and flushes IPIs with interrupts disabled, and
cannot proceed while any CPU (including the IPI sender) has preemption
disabled.
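
For illustration, here is a minimal, hypothetical sketch of the caller-side
pattern these comments describe. It is not part of this patch: the helper
name is made up, and only preempt_disable()/preempt_enable(), cpu_online()
and smp_call_function_single() are existing kernel APIs:

  #include <linux/cpumask.h>
  #include <linux/errno.h>
  #include <linux/preempt.h>
  #include <linux/smp.h>

  /*
   * Hypothetical example, not from this patch: run @func on @cpu via IPI.
   * While preemption is disabled here, stop_machine() cannot proceed, so
   * @cpu cannot finish going offline (and smpcfd_dying_cpu() cannot do its
   * final IPI flush) between the cpu_online() check and the IPI being sent.
   */
  static int example_call_on_cpu(int cpu, smp_call_func_t func, void *info)
  {
          int ret = -ENXIO;

          preempt_disable();
          if (cpu_online(cpu))
                  ret = smp_call_function_single(cpu, func, info, 1);
          preempt_enable();

          return ret;
  }

Note that smp_call_function_single() already does get_cpu()/put_cpu()
internally; the explicit preempt_disable() above only shows how a caller can
extend that same mutual exclusion to cover its own cpu_online() check.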

Cc: Andrea Righi <ari...@nvidia.com>
Cc: Paul E. McKenney <paul...@kernel.org>
Cc: Frederic Weisbecker <frede...@kernel.org>
Cc: r...@vger.kernel.org
Co-developed-by: Frederic Weisbecker <frede...@kernel.org>
Signed-off-by: Joel Fernandes <joelagn...@nvidia.com>
---
v1->v2: Reworded a bit more (minor nit).

 kernel/cpu.c |  4 ++++
 kernel/smp.c | 12 ++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index a59e009e0be4..a8ce1395dd2c 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1310,6 +1310,10 @@ static int takedown_cpu(unsigned int cpu)
 
        /*
         * So now all preempt/rcu users must observe !cpu_active().
+        *
+        * stop_machine() waits for all CPUs to enable preemption. This lets
+        * take_cpu_down() atomically update the CPU masks and flush the last
+        * IPI before any new IPI can be sent.
         */
        err = stop_machine_cpuslocked(take_cpu_down, NULL, cpumask_of(cpu));
        if (err) {
diff --git a/kernel/smp.c b/kernel/smp.c
index 974f3a3962e8..842691467f9e 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -93,6 +93,9 @@ int smpcfd_dying_cpu(unsigned int cpu)
         * explicitly (without waiting for the IPIs to arrive), to
         * ensure that the outgoing CPU doesn't go offline with work
         * still pending.
+        *
+        * This runs in stop_machine()'s atomic context with interrupts disabled,
+        * so CPU offlining and IPI flushing happen together atomically.
         */
        __flush_smp_call_function_queue(false);
        irq_work_run();
@@ -418,6 +421,10 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
  */
 static int generic_exec_single(int cpu, call_single_data_t *csd)
 {
+       /*
+        * The caller must have preemption disabled to mutually exclude with
+        * stop_machine() atomically updating CPU masks and flushing IPIs.
+        */
        if (cpu == smp_processor_id()) {
                smp_call_func_t func = csd->func;
                void *info = csd->info;
@@ -640,6 +647,11 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
        /*
         * prevent preemption and reschedule on another processor,
         * as well as CPU removal
+        *
+        * get_cpu() disables preemption, ensuring mutual exclusion with
+        * stop_machine() where CPU offlining and last IPI flushing happen
+        * atomically versus this code. This guarantees that the cpu_online()
+        * check and IPI sending are safe without losing IPIs to offlining.
         */
        this_cpu = get_cpu();
 
-- 
2.43.0

