Consider the following sequence of events in a PREEMPT=y kernel:

1.      All CPUs corresponding to a given leaf rcu_node structure go
        offline.  A new grace period starts just after the CPU-hotplug
        code path does its synchronize_rcu() for the last CPU, so at
        least this CPU's bit is still set in that structure's ->qsmask.

2.      Before the grace period ends, a CPU comes back online, and not
        just any CPU, but the one corresponding to a non-zero bit in
        the leaf rcu_node structure's ->qsmask.

3.      A task running on the newly onlined CPU is preempted while in
        an RCU read-side critical section.  Because this CPU's ->qsmask
        bit is set, not only does this task queue itself on the leaf
        rcu_node structure's ->blkd_tasks list, it also sets that
        structure's ->gp_tasks pointer to reference it (see the sketch
        following this list).

4.      The grace period started in #1 above comes to an end.  This
        results in rcu_gp_cleanup() being invoked, which, among other
        things, checks to make sure that there are no tasks blocking the
        just-ended grace period, that is, that all ->gp_tasks pointers
        are NULL.  The ->gp_tasks pointer corresponding to the task
        preempted in #3 above is non-NULL, which results in a splat.
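
For concreteness, here is a rough sketch of the queuing decision in #3
and the cleanup-time check in #4.  The structure layout, field set, and
function names below are simplified illustrations of the logic (which
lives in rcu_preempt_ctxt_queue() in kernel/rcu/tree_plugin.h and in
rcu_gp_cleanup() in kernel/rcu/tree.c), not the kernel's exact code:

        /*
         * Sketch only: types and helpers here are illustrative
         * assumptions, loosely modeled on the PREEMPT=y RCU code.
         */
        struct sketch_rcu_node {
                unsigned long qsmask;           /* CPUs still owing a QS. */
                struct list_head blkd_tasks;    /* Preempted RCU readers. */
                struct list_head *gp_tasks;     /* First task blocking GP. */
        };

        /* Step #3: task preempted within an RCU read-side critical section. */
        static void sketch_queue_blocked_task(struct sketch_rcu_node *rnp,
                                              struct task_struct *t,
                                              unsigned long grpmask)
        {
                list_add(&t->rcu_node_entry, &rnp->blkd_tasks);
                if ((rnp->qsmask & grpmask) && !rnp->gp_tasks)
                        rnp->gp_tasks = &t->rcu_node_entry; /* Blocks GP. */
        }

        /* Step #4: end-of-GP check whose failure produces the splat. */
        static void sketch_gp_cleanup_check(struct sketch_rcu_node *rnp)
        {
                WARN_ON_ONCE(rnp->gp_tasks); /* Fires in the scenario above. */
        }

In the scenario above, the CPU's ->qsmask bit is still set at step #3,
so ->gp_tasks gets set even though the task cannot possibly be blocking
the just-started grace period, and the check in step #4 then splats.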

This splat is a false positive.  The task's RCU read-side critical
section cannot have begun before the just-ended grace period because
this would mean either: (1) The CPU came online before the grace period
started, which cannot have happened because the grace period started
before that CPU was all the way offline, or (2) The task started its
RCU read-side critical section on some other CPU, but then it would
have had to be preempted before migrating to this CPU, in which case
it would have queued itself on that other CPU's rcu_node structure.

This commit eliminates this false positive by adding code to the end
of rcu_cleanup_dying_idle_cpu() that reports a quiescent state to RCU,
which has the side-effect of clearing that CPU's ->qsmask bit, preventing
the above scenario.  This approach has the added benefit of more promptly
reporting quiescent states corresponding to offline CPUs.

Note well that the call to rcu_report_qs_rnp() reporting the quiescent
state must come -before- the clearing of this CPU's bit in the leaf
rcu_node structure's ->qsmaskinitnext field.  Otherwise, lockdep-RCU
will complain bitterly about quiescent states coming from an offline CPU.

Signed-off-by: Paul E. McKenney <[email protected]>
---
 kernel/rcu/tree.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 5171c0361fbd..58ca69b5656b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3729,6 +3729,11 @@ static void rcu_cleanup_dying_idle_cpu(int cpu, struct rcu_state *rsp)
        /* Remove outgoing CPU from mask in the leaf rcu_node structure. */
        mask = rdp->grpmask;
        raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */
+       if (rnp->qsmask & mask) { /* RCU waiting on outgoing CPU? */
+               /* Report quiescent state -before- changing ->qsmaskinitnext! */
+               rcu_report_qs_rnp(mask, rsp, rnp, rnp->gp_seq, flags);
+               raw_spin_lock_irqsave_rcu_node(rnp, flags);
+       }
        rnp->qsmaskinitnext &= ~mask;
        raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 }
-- 
2.17.1
