Prior to this commit, defer_qs_pending was an unbalanced flag:
rcu_read_unlock_special() set it to PENDING whenever a deferred-QS
mechanism was scheduled, but the clear paths did not cover every
up-tree quiescent-state reporting site. In those cases the flag stays
PENDING after the QS is reported, and rcu_read_unlock_special()'s
pending-gate then silently rejects all future arming attempts.

A test patch confirms that TREE03 can get into the problematic stuck
state very quickly (< 5 minutes).

Clear the flag in __note_gp_changes(), right after the nothing-to-do
early return.  This is the natural per-CPU "GP transitioned, sync local
state" hook, called from the GP-kthread's rcu_gp_init()/rcu_gp_cleanup()
paths, and other GP advancement paths.
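
The stuck state and the effect of the clear can be illustrated with a
small user-space model.  This is only a sketch: every name in it
(struct cpu_model, try_arm_deferred_qs(), report_qs_without_clear(),
note_gp_change()) is hypothetical and stands in for the kernel logic
described above; it is not the code in kernel/rcu/.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model; all names here are hypothetical, not kernel symbols. */
enum defer_qs_state { DEFER_QS_IDLE, DEFER_QS_PENDING };

struct cpu_model {
	enum defer_qs_state defer_qs_pending;	/* models the per-CPU flag */
	unsigned long gp_seq;			/* this CPU's view of the GP number */
};

/* Models the pending-gate: arm only if nothing is already pending. */
static bool try_arm_deferred_qs(struct cpu_model *cpu)
{
	if (cpu->defer_qs_pending == DEFER_QS_PENDING)
		return false;	/* arming attempt silently rejected */
	cpu->defer_qs_pending = DEFER_QS_PENDING;
	return true;
}

/* Models a QS reporting site that does not clear the flag. */
static void report_qs_without_clear(struct cpu_model *cpu)
{
	(void)cpu;	/* QS goes up the tree; flag is left untouched */
}

/* Models the fix: clear the flag when a GP transition is noticed. */
static bool note_gp_change(struct cpu_model *cpu, unsigned long node_gp_seq)
{
	if (cpu->gp_seq == node_gp_seq)
		return false;	/* Nothing to do. */
	cpu->defer_qs_pending = DEFER_QS_IDLE;
	cpu->gp_seq = node_gp_seq;
	return true;
}

/* Walks through arming, the stuck state, and recovery after a GP change. */
static void defer_qs_model_demo(void)
{
	struct cpu_model cpu = { DEFER_QS_IDLE, 0 };

	assert(try_arm_deferred_qs(&cpu));	/* first arm succeeds */
	report_qs_without_clear(&cpu);		/* QS reported, flag stays PENDING */
	assert(!try_arm_deferred_qs(&cpu));	/* stuck: gate rejects re-arming */
	assert(note_gp_change(&cpu, 1));	/* GP transition clears the flag */
	assert(try_arm_deferred_qs(&cpu));	/* arming works again */
}
```

In the model, as in the patch, the clear sits after the nothing-to-do
early return, so it runs only when the CPU's view of the grace period
has actually changed.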

Dynticks-idle CPUs do not call __note_gp_changes(), but they also do
not arm new PENDING work (no readers are running), and on wake-up
note_gp_changes() is called before any new reader runs.

Signed-off-by: Joel Fernandes <[email protected]>
---
 kernel/rcu/tree.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 55df6d37145e..d0816468ffee 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1281,6 +1281,8 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
        if (rdp->gp_seq == rnp->gp_seq)
                return false; /* Nothing to do. */
 
+       rcu_defer_qs_clear(rdp);
+
        /* Handle the ends of any preceding grace periods first. */
        if (rcu_seq_completed_gp(rdp->gp_seq, rnp->gp_seq) ||
            unlikely(rdp->gpwrap)) {
-- 
2.34.1

