Prior to this commit, defer_qs_pending was an unbalanced flag: rcu_read_unlock_special() set it to PENDING whenever a deferred-QS mechanism was scheduled, but the clear paths did not cover every up-tree quiescent-state reporting site. In those cases the flag stays PENDING after the QS is reported, and rcu_read_unlock_special()'s pending-gate then silently rejects all future arming attempts.
A test patch confirms that TREE03 can get into the problematic stuck
state very quickly (< 5 minutes).

Clear the flag in __note_gp_changes(), right after the nothing-to-do
early return. This is the natural per-CPU "GP transitioned, sync local
state" hook, called from the GP kthread's rcu_gp_init()/rcu_gp_cleanup()
paths and from other GP advancement paths.

Dynticks-idle CPUs do not call __note_gp_changes(), but they also do not
arm new PENDING work (no readers running), and on wake-up,
note_gp_changes() is called before any new reader runs.

Signed-off-by: Joel Fernandes <[email protected]>
---
 kernel/rcu/tree.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 55df6d37145e..d0816468ffee 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1281,6 +1281,8 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
 	if (rdp->gp_seq == rnp->gp_seq)
 		return false; /* Nothing to do. */
 
+	rcu_defer_qs_clear(rdp);
+
 	/* Handle the ends of any preceding grace periods first. */
 	if (rcu_seq_completed_gp(rdp->gp_seq, rnp->gp_seq) ||
 	    unlikely(rdp->gpwrap)) {
-- 
2.34.1
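
For readers who want to see the stuck-flag behaviour in isolation, here is a minimal user-space sketch of the failure mode and of the fix described above. It is not kernel code. Every model_* name, the two-state enum, and the with_fix parameter are hypothetical, and the sketch assumes that rcu_defer_qs_clear() simply resets the flag to its idle state, which the patch itself does not show.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of this is the kernel's code. */
enum model_defer_qs_state { MODEL_DEFER_QS_IDLE, MODEL_DEFER_QS_PENDING };

struct model_rdp {
	enum model_defer_qs_state defer_qs_pending;
	unsigned long gp_seq;	/* last grace-period number this CPU has seen */
};

/* Models the pending-gate: arm deferred work only if none is pending. */
static bool model_arm_deferred_qs(struct model_rdp *rdp)
{
	if (rdp->defer_qs_pending == MODEL_DEFER_QS_PENDING)
		return false;	/* gate rejects the arming attempt */
	rdp->defer_qs_pending = MODEL_DEFER_QS_PENDING;
	return true;
}

/* Models an up-tree QS report on a path that never clears the flag. */
static void model_report_qs_without_clear(struct model_rdp *rdp)
{
	(void)rdp;	/* QS is reported, flag is left untouched */
}

/*
 * Models the per-CPU "GP transitioned" hook. With with_fix set, the flag
 * is reset right after the nothing-to-do early return (assumed to be what
 * rcu_defer_qs_clear() does).
 */
static bool model_note_gp_changes(struct model_rdp *rdp,
				  unsigned long rnp_gp_seq, bool with_fix)
{
	if (rdp->gp_seq == rnp_gp_seq)
		return false;	/* nothing to do */
	if (with_fix)
		rdp->defer_qs_pending = MODEL_DEFER_QS_IDLE;
	rdp->gp_seq = rnp_gp_seq;
	return true;
}

/* Arm once, report the QS on a leaky path, see a new GP, try to re-arm. */
static bool model_can_rearm(bool with_fix)
{
	struct model_rdp rdp = { MODEL_DEFER_QS_IDLE, 0 };

	model_arm_deferred_qs(&rdp);
	model_report_qs_without_clear(&rdp);
	model_note_gp_changes(&rdp, 1, with_fix);
	return model_arm_deferred_qs(&rdp);
}
```

In this model, model_can_rearm(false) returns false because the flag is still PENDING when the second arming attempt reaches the gate, while model_can_rearm(true) returns true because the grace-period transition resets the flag first.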

