The deferred-QS irq-work handler previously cleared defer_qs_pending
only when the handler ran inside an active rcu_read_lock() critical
section (rcu_preempt_depth() > 0).  Paul McKenney pointed out a common
multi-segment compound pattern where the handler fires between
segments and segment N+1's arming attempt is silently suppressed by
the rcu_read_unlock_special() pending-gate:

    rcu_read_lock();           // segment 1 starts
    // may be preempted/boosted here
    local_irq_disable();
    rcu_read_unlock();          // segment 1 ends; arms defer_qs_pending
    preempt_disable();
    local_irq_enable();         // handler MAY fire here: depth==0, but
                                // preempt is disabled, so it can't
                                // nudge the scheduler.

    rcu_read_lock();            // segment 2 starts
    preempt_enable();
    local_irq_disable();
    rcu_read_unlock();     // arming attempt suppressed incorrectly -- (1)
    local_irq_enable();

Waiting for the next __note_gp_changes() to clear the flag is too slow
for the compound case; we need the deferred QS report sooner.

Therefore, make the irq_work handler clear defer_qs_pending whenever
rcu_in_compounded_section() is true, so that the rcu_read_unlock() at
(1) can arm the irq_work again.

Signed-off-by: Joel Fernandes <[email protected]>
---
 kernel/rcu/tree_plugin.h | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 8637f405cb47..2da009dbe64c 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -621,6 +621,17 @@ notrace void rcu_preempt_deferred_qs(struct task_struct *t)
        rcu_preempt_deferred_qs_irqrestore(t, flags);
 }
 
+/*
+ * True if the current context is inside a compounded RCU read-side
+ * section, i.e. either in an active rcu_read_lock() (depth>0) or in an
+ * outer preempt-disabled / BH-disabled scope.
+ */
+static inline bool rcu_in_compounded_section(void)
+{
+       return rcu_preempt_depth() > 0 ||
+              (preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)) != 0;
+}
+
 /*
  * Minimal handler to give the scheduler a chance to re-evaluate.
  */
@@ -632,19 +643,10 @@ static void rcu_preempt_deferred_qs_handler(struct irq_work *iwp)
        rdp = container_of(iwp, struct rcu_data, defer_qs_iw);
 
        /*
-        * If the IRQ work handler happens to run in the middle of RCU read-side
-        * critical section, it could be ineffective in getting the scheduler's
-        * attention to report a deferred quiescent state (the whole point of the
-        * IRQ work). For this reason, requeue the IRQ work.
-        *
-        * Basically, we want to avoid following situation:
-        * 1. rcu_read_unlock() queues IRQ work (state -> DEFER_QS_PENDING)
-        * 2. CPU enters new rcu_read_lock()
-        * 3. IRQ work runs but cannot report QS due to rcu_preempt_depth() > 0
-        * 4. rcu_read_unlock() does not re-queue work (state still PENDING)
-        * 5. Deferred QS reporting does not happen.
+        * Clear defer_qs_pending when the handler fires inside a compounded
+        * section as we may need to rearm the irq_work.
         */
-       if (rcu_preempt_depth() > 0)
+       if (rcu_in_compounded_section())
                rcu_defer_qs_clear(rdp);
 }
 
-- 
2.34.1

