On Wed, May 27, 2020 at 07:12:36PM +0200, Peter Zijlstra wrote:
> Subject: rcu: Allow for smp_call_function() running callbacks from idle
> 
> Current RCU hard relies on smp_call_function() callbacks running from
> interrupt context. A pending optimization is going to break that, it
> will allow idle CPUs to run the callbacks from the idle loop. This
> avoids raising the IPI on the requesting CPU and avoids handling an
> exception on the receiving CPU.
> 
> Change rcu_is_cpu_rrupt_from_idle() to also accept task context,
> provided it is the idle task.
> 
> Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
> ---
>  kernel/rcu/tree.c   | 25 +++++++++++++++++++------
>  kernel/sched/idle.c |  4 ++++
>  2 files changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index d8e9dbbefcfa..c716eadc7617 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -418,16 +418,23 @@ void rcu_momentary_dyntick_idle(void)
>  EXPORT_SYMBOL_GPL(rcu_momentary_dyntick_idle);
>  
>  /**
> - * rcu_is_cpu_rrupt_from_idle - see if interrupted from idle
> + * rcu_is_cpu_rrupt_from_idle - see if 'interrupted' from idle
>   *
>   * If the current CPU is idle and running at a first-level (not nested)
> - * interrupt from idle, return true.  The caller must have at least
> - * disabled preemption.
> + * interrupt, or directly, from idle, return true.
> + *
> + * The caller must have at least disabled IRQs.
>   */
>  static int rcu_is_cpu_rrupt_from_idle(void)
>  {
> -     /* Called only from within the scheduling-clock interrupt */
> -     lockdep_assert_in_irq();
> +     long nesting;
> +
> +     /*
> +      * Usually called from the tick; but also used from smp_function_call()
> +      * for expedited grace periods. This latter can result in running from
> +      * the idle task, instead of an actual IPI.
> +      */
> +     lockdep_assert_irqs_disabled();
>  
>       /* Check for counter underflows */
>       RCU_LOCKDEP_WARN(__this_cpu_read(rcu_data.dynticks_nesting) < 0,
> @@ -436,9 +443,15 @@ static int rcu_is_cpu_rrupt_from_idle(void)
>                        "RCU dynticks_nmi_nesting counter underflow/zero!");
>  
>       /* Are we at first interrupt nesting level? */
> -     if (__this_cpu_read(rcu_data.dynticks_nmi_nesting) != 1)
> +     nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);
> +     if (nesting > 1)
>               return false;
>  
> +     /*
> +      * If we're not in an interrupt, we must be in the idle task!
> +      */
> +     WARN_ON_ONCE(!nesting && !is_idle_task(current));
> +
>       /* Does CPU appear to be idle from an RCU standpoint? */
>       return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
>  }

Let me revive this thread after yesterday's IRC conversation.

As said, it might be _extremely_ unlikely, but it is possible for us to
send the IPI concurrently with hot-unplug, before observing
rcutree_offline_cpu() or thereabouts.

The IPI could then be 'delayed' long enough that it does not happen
until smpcfd_dying(), and the callback gets run there.

This would then run the function from the stopper thread instead of the
idle thread and trigger the warning, even though we're not holding
rcu_read_lock() (which, IIRC, was the only constraint).
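
To spell out the difference between the current and the proposed check,
here is a minimal user-space sketch (not kernel code). The struct and
function names are made up for illustration, and it takes the scenario
above at face value: it assumes dynticks_nmi_nesting reads 0 when the
callback runs from the stopper thread.

```c
#include <stdbool.h>

/*
 * User-space model only: the kernel reads these values from per-CPU
 * data and from the current task. All names here are hypothetical.
 */
struct ctx_model {
	long nesting;       /* models rcu_data.dynticks_nmi_nesting */
	bool is_idle_task;  /* models is_idle_task(current) */
	bool in_task;       /* models in_task() */
};

/* Current check: WARN_ON_ONCE(!nesting && !is_idle_task(current)) */
static bool current_check_warns(const struct ctx_model *c)
{
	return !c->nesting && !c->is_idle_task;
}

/* Proposed check: warn only when in neither interrupt nor task context */
static bool proposed_check_warns(const struct ctx_model *c)
{
	return !c->nesting && !c->in_task;
}

/* The three contexts discussed in this thread, under the assumptions above */
static const struct ctx_model first_level_irq = {
	.nesting = 1, .is_idle_task = true, .in_task = false,
};
static const struct ctx_model idle_loop = {
	.nesting = 0, .is_idle_task = true, .in_task = true,
};
static const struct ctx_model stopper_thread = {
	.nesting = 0, .is_idle_task = false, .in_task = true,
};
```

In this model only the stopper_thread case differs: the current check
warns there, the proposed one does not.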

So would something like the below be acceptable?

---
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 368749008ae8..2c8d4c3e341e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -445,7 +445,7 @@ static int rcu_is_cpu_rrupt_from_idle(void)
        /*
         * Usually called from the tick; but also used from smp_function_call()
         * for expedited grace periods. This latter can result in running from
-        * the idle task, instead of an actual IPI.
+        * a (usually the idle) task, instead of an actual IPI.
         */
        lockdep_assert_irqs_disabled();
 
@@ -461,9 +461,14 @@ static int rcu_is_cpu_rrupt_from_idle(void)
                return false;
 
        /*
-        * If we're not in an interrupt, we must be in the idle task!
+        * If we're not in an interrupt, we must be in task context.
+        *
+        * This will typically be the idle task through:
+        *   flush_smp_call_function_from_idle(),
+        *
+        * but can also be in CPU HotPlug through smpcfd_dying().
         */
-       WARN_ON_ONCE(!nesting && !is_idle_task(current));
+       WARN_ON_ONCE(!nesting && !in_task());
 
        /* Does CPU appear to be idle from an RCU standpoint? */
        return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
