On Wed, Jul 16, 2025 at 11:09:22AM -0400, Steven Rostedt wrote:
> On Fri, 11 Jul 2025 10:05:26 -0700
> "Paul E. McKenney" <paul...@kernel.org> wrote:
> 
> > This trace point will invoke rcu_read_unlock{,_notrace}(), which will
> > note that preemption is disabled.  If rcutree.use_softirq is set and
> > this task is blocking an expedited RCU grace period, it will directly
> > invoke the non-notrace function raise_softirq_irqoff().  Otherwise,
> > it will directly invoke the non-notrace function irq_work_queue_on().
> 
> Just to clarify some things; A function annotated by "notrace" simply
> will not have the ftrace hook to that function, but that function may
> very well have tracing triggered inside of it.
> 
> Functions with "_notrace" in their names (like preempt_disable_notrace())
> should not have any tracing instrumentation inside of them (as Mathieu
> stated), so that they can be used in the tracing infrastructure.
> 
> raise_softirq_irqoff() has a tracepoint inside of it. If we have the
> tracing infrastructure call that, and we happen to enable that
> tracepoint, we will have:
> 
>   raise_softirq_irqoff()
>      trace_softirq_raise()
>        [..]
>          raise_softirq_irqoff()
>             trace_softirq_raise()
>                [..]
>                  Ad infinitum!
> 
> I'm not sure if that's what is being proposed or not, but I just wanted
> to make sure everyone is aware of the above.

OK, I *think* I might actually understand the problem.  Maybe.

I am sure that the usual suspects will not be shy about correcting any
misapprehensions in the following.  ;-)

My guess is that some users of real-time Linux would like to use BPF
programs while still getting decent latencies out of their systems.
(Not something I would have predicted, but then again, I was surprised
some years back to see people with a 4096-CPU system complaining about
200-microsecond latency spikes from RCU.)  And the BPF guys (now CCed)
made some changes some years back to support this, perhaps most notably
replacing some uses of preempt_disable() with migrate_disable().

Except that the current __DECLARE_TRACE() macro defeats this work for
tracepoints by disabling preemption across the call into the attached
probe, which might well be a BPF program.  So we need to do something to
__DECLARE_TRACE() to get the right sort of protection while still leaving
preemption enabled.

One way of attacking this problem is to use preemptible RCU.  The problem
with this is that although one could construct a trace-safe version
of rcu_read_unlock(), such a version would negate some optimizations
that Lai Jiangshan worked so hard to put in place.  Plus those optimizations
also simplified the code quite a bit.  Which is why I was pushing back
so hard, especially given that I did not realize that real-time systems
would be running BPF programs concurrently with real-time applications.
This meant that I was looking for a functional problem with the current
disabling of preemption, and not finding it.

So another way of dealing with this is to use SRCU-fast, which is
like SRCU, but dispenses with the smp_mb() calls and the redundant
read-side array indexing.  Plus it is easy to make _notrace variants
srcu_read_lock_fast_notrace() and srcu_read_unlock_fast_notrace(),
along with the requisite guards.

Re-introducing SRCU requires reverting most of e53244e2c893 ("tracepoint:
Remove SRCU protection"), and I have hacked together this and the
prerequisites mentioned in the previous paragraph.

These are passing ridiculously light testing, but probably have at
least their share of bugs.

But first, do I actually finally understand the problem?

                                                        Thanx, Paul
