On Tue, Oct 16, 2018 at 01:41:22PM -0700, Joel Fernandes wrote:
> On Tue, Oct 16, 2018 at 04:26:11AM -0700, Paul E. McKenney wrote:
> > On Mon, Oct 15, 2018 at 02:08:56PM -0700, Paul E. McKenney wrote:
> > > On Mon, Oct 15, 2018 at 01:15:56PM -0700, Joel Fernandes wrote:
> > > > On Mon, Oct 15, 2018 at 12:54:26PM -0700, Paul E. McKenney wrote:
> > > > [...]
> > > > > > > In any case, please don't spin for milliseconds with preemption
> > > > > > > disabled.  The real-time guys are unlikely to be happy with you
> > > > > > > if you do this!
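
For anyone following along, a sketch of a more scheduler-friendly way for
a test to spin, keeping each preempt-disabled window short.  The
do_a_little_work() helper is made up for illustration:

	unsigned long end = jiffies + msecs_to_jiffies(10);

	while (time_before(jiffies, end)) {
		preempt_disable();
		do_a_little_work();	/* Hypothetical unit of work. */
		preempt_enable();	/* Keep each disabled window short. */
		cond_resched();		/* And give the scheduler a chance. */
	}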
> > > > > > 
> > > > > > Well, just to clarify, I was just running Oleg's test, which did
> > > > > > this.  This test was mentioned in the original documentation that
> > > > > > I deleted.  Of course I would not dare do such a thing in
> > > > > > production code :-D.  I guess in Oleg's defense, he did it to
> > > > > > verify that synchronize_rcu() was not blocked on preempt-disable
> > > > > > sections, which was a different test.
> > > > > 
> > > > > Understood!  Just pointing out that RCU's tolerating a given action
> > > > > does not necessarily mean that it is a good idea to take that
> > > > > action.  ;-)
> > > > 
> > > > Makes sense :-) thanks.
> > > 
> > > Don't worry, that won't happen again.  ;-)
> > > 
> > > > > > > > > +             pr_crit("SPIN done!\n");
> > > > > > > > > +             preempt_enable();
> > > > > > > > > +             break;
> > > > > > > > > +     case 777:
> > > > > > > > > +             pr_crit("SYNC start\n");
> > > > > > > > > +             synchronize_rcu();
> > > > > > > > > +             pr_crit("SYNC done!\n");
> > > > > > > > 
> > > > > > > > But you are using the console printing infrastructure, which
> > > > > > > > is rather heavyweight.  Try replacing the pr_* calls with
> > > > > > > > trace_printk() so that you write to the lock-free ring buffer;
> > > > > > > > this will reduce the noise from the heavy console printing
> > > > > > > > infrastructure.
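
To make that suggestion concrete, the quoted case would become something
like this (just a sketch of the substitution):

	case 777:
		trace_printk("SYNC start\n");
		synchronize_rcu();
		trace_printk("SYNC done!\n");
		break;

trace_printk() takes the same printf-style arguments as pr_crit(), but
logs to the per-CPU ftrace ring buffer instead of going through the
console drivers.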
> > > > > > > 
> > > > > > > And this might be a problem as well.
> > > > > > 
> > > > > > This was not the issue (or at least not fully the issue), since I
> > > > > > saw the same thing with trace_printk().  It was exactly what you
> > > > > > said: the excessively long preempt-disabled times.
> > > > > 
> > > > > One approach would be to apply this patch against (say) v4.18, which
> > > > > does not have consolidated grace periods.  You might then be able to
> > > > > tell if the pr_crit() calls make any difference.
> > > > 
> > > > I could do that, yeah.  But since the original problem went away once
> > > > preemption was disabled for only a short while, I will move on and
> > > > continue to focus on updating other parts of the documentation.  Just
> > > > to mention, I brought this up because I thought it was better to do
> > > > that than not to, just in case there is any lurking issue with the
> > > > consolidation.  Sorry if that ended up with me being noisy.
> > > 
> > > Not a problem, no need to apologize!
> > 
> > Besides, digging through the code did point out a reasonable optimization.
> > In the common case, this would buy 100s of microseconds rather than
> > milliseconds, but it seems simple enough to be worthwhile.  Thoughts?
> 
> Cool, thanks. One comment below:
> 
> > ------------------------------------------------------------------------
> > 
> > commit 07921e8720907f58f82b142f2027fc56d5abdbfd
> > Author: Paul E. McKenney <paul...@linux.ibm.com>
> > Date:   Tue Oct 16 04:12:58 2018 -0700
> > 
> >     rcu: Speed up expedited GPs when interrupting RCU reader
> >     
> >     In PREEMPT kernels, an expedited grace period might send an IPI to a
> >     CPU that is executing an RCU read-side critical section.  In that case,
> >     it would be nice if the rcu_read_unlock() directly interacted with the
> >     RCU core code to immediately report the quiescent state.  And this does
> >     happen in the case where the reader has been preempted.  But it would
> >     also be a nice performance optimization if immediate reporting also
> >     happened in the preemption-free case.
> >     
> >     This commit therefore adds an ->exp_hint field to the task_struct
> >     structure's ->rcu_read_unlock_special field.  The IPI handler sets
> >     this hint when it has interrupted an RCU read-side critical section,
> >     and this causes the outermost rcu_read_unlock() call to invoke
> >     rcu_read_unlock_special(), which, if preemption is enabled, reports
> >     the quiescent state immediately.  If preemption is disabled, then the
> >     report is required to be deferred until preemption (or bottom halves
> >     or interrupts or whatever) is re-enabled.
> >     
> >     Because this is a hint, it does nothing for more complicated cases.
> >     For example, if the IPI interrupts an RCU reader, but interrupts are
> >     disabled across the rcu_read_unlock(), but another rcu_read_lock()
> >     is executed before interrupts are re-enabled, the hint will already
> >     have been cleared.  If you do crazy things like this, reporting will
> >     be deferred until some later RCU_SOFTIRQ handler, context switch,
> >     cond_resched(), or similar.
> >     
> >     Reported-by: Joel Fernandes <j...@joelfernandes.org>
> >     Signed-off-by: Paul E. McKenney <paul...@linux.ibm.com>
> > 
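To illustrate the "crazy things" paragraph above, here is a sketch (not
code from the patch) of a sequence that loses the hint:

	rcu_read_lock();
	/* Expedited-GP IPI arrives here and sets ->exp_hint. */
	local_irq_disable();
	rcu_read_unlock();	/* Irqs disabled: hint cleared, QS deferred. */
	rcu_read_lock();	/* New reader before irqs are re-enabled. */
	local_irq_enable();
	rcu_read_unlock();	/* Hint already gone, so the report waits */
				/* for RCU_SOFTIRQ, cond_resched(), etc. */
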
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 004ca21f7e80..64ce751b5fe9 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -571,8 +571,10 @@ union rcu_special {
> >     struct {
> >             u8                      blocked;
> >             u8                      need_qs;
> > +           u8                      exp_hint; /* Hint for performance. */
> > +           u8                      pad; /* No garbage from compiler! */
> >     } b; /* Bits. */
> > -   u16 s; /* Set of bits. */
> > +   u32 s; /* Set of bits. */
> >  };
> >  
> >  enum perf_event_task_context {
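
For context, the point of keeping .s as a single scalar overlaying the
byte flags is that the unlock fast path can test all of them with one
load, roughly like this (simplified, not the verbatim kernel code):

	if (unlikely(READ_ONCE(t->rcu_read_unlock_special.s)))
		rcu_read_unlock_special(t);	/* Some flag is set: */
						/* .b.blocked, .b.need_qs, */
						/* or now .b.exp_hint. */

The explicit pad byte keeps all 32 bits of .s defined, so that whole-word
loads and stores never see uninitialized padding.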
> > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > index e669ccf3751b..928fe5893a57 100644
> > --- a/kernel/rcu/tree_exp.h
> > +++ b/kernel/rcu/tree_exp.h
> > @@ -692,8 +692,10 @@ static void sync_rcu_exp_handler(void *unused)
> >      */
> >     if (t->rcu_read_lock_nesting > 0) {
> >             raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > -           if (rnp->expmask & rdp->grpmask)
> > +           if (rnp->expmask & rdp->grpmask) {
> >                     rdp->deferred_qs = true;
> > +                   WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, true);
> > +           }
> >             raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> >     }
> >  
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 8b48bb7c224c..d6286eb6e77e 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -643,8 +643,9 @@ static void rcu_read_unlock_special(struct task_struct *t)
> >     local_irq_save(flags);
> >     irqs_were_disabled = irqs_disabled_flags(flags);
> >     if ((preempt_bh_were_disabled || irqs_were_disabled) &&
> > -       t->rcu_read_unlock_special.b.blocked) {
> > +       t->rcu_read_unlock_special.s) {
> >             /* Need to defer quiescent state until everything is enabled. */
> > +           WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
> >             raise_softirq_irqoff(RCU_SOFTIRQ);
> 
> Still going through this patch, but it seems to me that the fact that
> rcu_read_unlock_special() is called means that someone has requested a
> grace period.  In that case, does it not make sense to raise the softirq
> for processing anyway?

Not necessarily.  Another reason that rcu_read_unlock_special() might
be called is if the RCU read-side critical section had been preempted,
in which case there might not even be a grace period in progress.
In addition, if interrupts, bottom halves, and preemption are all enabled,
the code in rcu_preempt_deferred_qs_irqrestore() doesn't need to bother
raising softirq, as it can instead just immediately report the quiescent
state.
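
Roughly, the resulting decision at the outermost rcu_read_unlock() has
this shape (paraphrasing the patched rcu_read_unlock_special(), not the
verbatim code):

	if ((preempt_bh_were_disabled || irqs_were_disabled) &&
	    t->rcu_read_unlock_special.s) {
		/* Something is disabled: clear the hint, punt to softirq. */
		WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
		raise_softirq_irqoff(RCU_SOFTIRQ);
		local_irq_restore(flags);
		return;
	}
	/* Interrupts, BH, and preemption all enabled: report the */
	/* quiescent state immediately. */
	rcu_preempt_deferred_qs_irqrestore(t, flags);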

                                                        Thanx, Paul
