On Thu, Jan 22, 2026 at 06:43:31PM -0500, Joel Fernandes wrote:
> On Thu, Jan 22, 2026 at 01:55:11PM -0800, Paul E. McKenney wrote:
> > On Mon, Jan 19, 2026 at 06:12:22PM -0500, Joel Fernandes wrote:
> > > - } else if (len > rdp->qlen_last_fqs_check + qhimark) {
> > > -         /* ... or if many callbacks queued. */
> > > -         rdp->qlen_last_fqs_check = len;
> > > -         j = jiffies;
> > > -         if (j != rdp->nocb_gp_adv_time &&
> > > -             rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
> >
> > This places in cur_gp_seq not the grace period for the current callback
> > (which would be unlikely to have finished), but rather the grace period
> > for the oldest callback that has not yet been marked as done.  And that
> > callback started some time ago, and thus might well have finished.
> >
> > So while this code might not have been executed in your tests, it is
> > definitely not a logical contradiction.
> >
> > Or am I missing something subtle here?
> 
> You're right that it's not a logical contradiction - I was imprecise.
> rcu_segcblist_nextgp() returns the GP for the oldest pending callback,
> which could indeed have completed.
> 
> However, the question becomes: under what scenario do we need to advance
> here? If that GP completed, rcuog should have already advanced those
> callbacks. The only way this code path can execute is if rcuog is starved
> and not running to advance them, right?

That is one way.  The other way is if the RCU grace-period kthread gets
delayed (perhaps by vCPU preemption) between the time that it updates the
leaf rcu_node structure's ->gp_seq field and the time that it invokes
rcu_nocb_gp_cleanup().
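
To make that window concrete, here is a toy userspace model.  It is only
a sketch and not kernel code: every toy_* name is hypothetical, the
segment layout is a simplification of rcu_segcblist, and the sequence
numbers are made up.  The point is just that once the leaf's sequence
number has moved, a "has this grace period completed?" check on the
oldest waiting callbacks can succeed even though nothing has advanced
those callbacks yet.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model. All toy_* names are hypothetical, not kernel APIs. */
enum { TOY_DONE, TOY_WAIT, TOY_NEXT_READY, TOY_NEXT, TOY_NSEGS };

struct toy_cblist {
	int len[TOY_NSEGS];              /* callbacks queued in each segment */
	unsigned long gp_seq[TOY_NSEGS]; /* GP number each waiting segment needs */
};

/* Wrap-safe "has grace period s completed, given current sequence cur?" */
static bool toy_seq_done(unsigned long cur, unsigned long s)
{
	return cur - s <= ULONG_MAX / 2;
}

/* Report the GP needed by the oldest callbacks not yet marked done. */
static bool toy_nextgp(const struct toy_cblist *l, unsigned long *seq)
{
	assert(l != NULL && seq != NULL);
	for (int i = TOY_WAIT; i <= TOY_NEXT_READY; i++) {
		if (l->len[i] > 0) {
			*seq = l->gp_seq[i];
			return true;
		}
	}
	return false; /* nothing is waiting on an assigned grace period */
}

/* Move callbacks whose GP has completed into the done segment. */
static void toy_advance(struct toy_cblist *l, unsigned long cur)
{
	for (int i = TOY_WAIT; i <= TOY_NEXT_READY; i++) {
		if (l->len[i] > 0 && toy_seq_done(cur, l->gp_seq[i])) {
			l->len[TOY_DONE] += l->len[i];
			l->len[i] = 0;
		}
	}
}
```

With the WAIT segment needing sequence 8, toy_seq_done(4, 8) is false
but toy_seq_done(8, 8) is true.  So a check made after the leaf update
but before anyone calls toy_advance() sees a completed grace period
with callbacks still sitting in the WAIT segment.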

> But as Frederic pointed out, even if rcuog is starved, advancing here
> doesn't help - rcuog must still run anyway to wake the callback thread.
> We're just duplicating work it will do when it finally gets to run.

So maybe we don't want that first patch after all?  ;-)

> The extensive testing (300K callback floods, hours of rcutorture) showing
> zero hits confirms this window is practically unreachable. I can update the
> commit message to remove the "logical contradiction" claim and focus on the
> redundancy argument instead.

That would definitely be good!

> Would that address your concern?

Your point about the rcuoc kthread needing to be awakened is a good one.
I am still concerned about flooding on busy systems, especially if the
busy component is an underlying hypervisor, but we might need a more
principled approach for that situation.

                                                        Thanx, Paul
