On Wed, Jul 16, 2025 at 01:35:48PM -0700, Paul E. McKenney wrote:
> On Wed, Jul 16, 2025 at 11:09:22AM -0400, Steven Rostedt wrote:
> > On Fri, 11 Jul 2025 10:05:26 -0700
> > "Paul E. McKenney" <paul...@kernel.org> wrote:
> > 
> > > This trace point will invoke rcu_read_unlock{,_notrace}(), which will
> > > note that preemption is disabled.  If rcutree.use_softirq is set and
> > > this task is blocking an expedited RCU grace period, it will directly
> > > invoke the non-notrace function raise_softirq_irqoff().  Otherwise,
> > > it will directly invoke the non-notrace function irq_work_queue_on().
> > 
> > Just to clarify some things: a function annotated with "notrace" simply
> > will not have the ftrace hook attached to it, but that function may
> > very well have tracing triggered inside of it.
> > 
> > Functions with "_notrace" in their names (like preempt_disable_notrace())
> > should not have any tracing instrumentation (as Mathieu stated)
> > inside of them, so that they can be used in the tracing infrastructure.
> > 
> > raise_softirq_irqoff() has a tracepoint inside of it. If we have the
> > tracing infrastructure call that, and we happen to enable that
> > tracepoint, we will have:
> > 
> >   raise_softirq_irqoff()
> >      trace_softirq_raise()
> >        [..]
> >          raise_softirq_irqoff()
> >             trace_softirq_raise()
> >                [..]
> >                  Ad infinitum!
> > 
> > I'm not sure if that's what is being proposed or not, but I just wanted
> > to make sure everyone is aware of the above.
> 
> OK, I *think* I might actually understand the problem.  Maybe.
> 
> I am sure that the usual suspects will not be shy about correcting any
> misapprehensions in the following.  ;-)
> 
> My guess is that some users of real-time Linux would like to use BPF
> programs while still getting decent latencies out of their systems.
> (Not something I would have predicted, but then again, I was surprised
> some years back to see people with a 4096-CPU system complaining about
> 200-microsecond latency blows from RCU.)  And the BPF guys (now CCed)
> made some changes some years back to support this, perhaps most notably
> replacing some uses of preempt_disable() with migrate_disable().
> 
> Except that the current __DECLARE_TRACE() macro defeats this work
> for tracepoints by disabling preemption across the tracepoint call,
> which might well be a BPF program.  So we need to do something to
> __DECLARE_TRACE() to get the right sort of protection while still leaving
> preemption enabled.
> 
> One way of attacking this problem is to use preemptible RCU.  The problem
> with this is that although one could construct a trace-safe version
> of rcu_read_unlock(), doing so would negate some optimizations that Lai
> Jiangshan worked so hard to put in place.  Plus those optimizations
> also simplified the code quite a bit.  Which is why I was pushing back
> so hard, especially given that I did not realize that real-time systems
> would be running BPF programs concurrently with real-time applications.
> This meant that I was looking for a functional problem with the current
> disabling of preemption, and not finding it.
> 
> So another way of dealing with this is to use SRCU-fast, which is
> like SRCU, but dispenses with the smp_mb() calls and the redundant
> read-side array indexing.  Plus it is easy to make _notrace variants
> srcu_read_lock_fast_notrace() and srcu_read_unlock_fast_notrace(),
> along with the requisite guards.
> 
> Re-introducing SRCU requires reverting most of e53244e2c893 ("tracepoint:
> Remove SRCU protection"), and I have hacked together this and the
> prerequisites mentioned in the previous paragraph.
> 
> These are passing ridiculously light testing, but probably have at
> least their share of bugs.
> 
> But first, do I actually finally understand the problem?

OK, they pass somewhat less ridiculously moderate testing, though I have
not yet hit them over the head with the ftrace selftests.

So might as well post them.

Thoughts?

                                                        Thanx, Paul
