On Wed, Sep 05, 2012 at 11:48:40PM +0200, Peter Zijlstra wrote:
> On Wed, 2012-09-05 at 14:39 -0700, Paul E. McKenney wrote:
> > RCU callback execution can add significant OS jitter and also can degrade
> > scheduling latency.  This commit therefore adds the ability for selected
> > CPUs ("rcu_nocbs=" boot parameter) to have their callbacks offloaded to
> > kthreads.  If the "rcu_nocb_poll" boot parameter is also specified, these
> > kthreads will do polling, removing the need for the offloaded CPUs to do
> > wakeups.  At least one CPU must be doing normal callback processing:
> > currently CPU 0 cannot be selected as a no-CBs CPU.  In addition, attempts
> > to offline the last normal-CBs CPU will fail.
> > 
> > This is an experimental patch, so just FYI for the moment.  Known
> > shortcomings include:
> > 
> > o       The counters should be atomic_long_t rather than atomic_t.
> > 
> > o       No-CBs CPUs can be configured only at boot time.
> > 
> > o       Only a modest number of CPUs can be configured as no-CBs CPUs.
> >         Definitely a few tens, perhaps a few hundred, but no way thousands.
> > 
> > o       At least one CPU must remain a normal-CBs CPU.
> > 
> > o       Not much in the way of energy-efficiency features, though there
> >         are some natural energy savings inherent in the implementation.
> > 
> > o       The per-no-CBs-CPU kthreads are not subject to RCU priority
> >         boosting.
> > 
> > o       Care is required when setting the kthreads to RT priority.
> > 
> > Later versions will address some of them, but others are likely to remain. 
> 
> My LPC feedback in writing...
> 
> So I see RCU as consisting of two parts:
>   A) Grace period tracking,
>   B) Running the callbacks.
> 
> This series seems to conflate the two: it talks of doing the callbacks
> elsewhere (kthread), but it also moves the grace period detection into
> the same kthread.
> 
> The latter part is what complicates the thing. I'd suggest doing the
> very simple callbacks only implementation first and leaving the grace
> period machinery in the tick.
> 
> It's typically the callbacks that consume most CPU time, whereas the
> grace period computations, while tricky and subtle, are relatively
> cheap.
> 
> In particular, it removes the need to wait for grace periods from the
> kthread (and to bounce that non-nocb CPU to make progress), and it makes
> the atomic list operations a lot easier.

I was excited by this possibility when you first mentioned it, but
the low-OS-jitter fans are going to need the grace-period computation
to be offloaded as well.  So if I use your (admittedly much simpler)
approach, I get to rewrite it when Frederic's adaptive-ticks work goes
in.  Given that this is probably happening relatively soon, it would be
better if I just did the implementation that will be needed long-term,
rather than rewriting.

Though I am sure that people will be sad about fewer RCU patches.  ;-)

                                                        Thanx, Paul
