On Wed, Jul 16, 2025 at 01:35:48PM -0700, Paul E. McKenney wrote:
> On Wed, Jul 16, 2025 at 11:09:22AM -0400, Steven Rostedt wrote:
> > On Fri, 11 Jul 2025 10:05:26 -0700
> > "Paul E. McKenney" <paul...@kernel.org> wrote:
> > 
> > > This trace point will invoke rcu_read_unlock{,_notrace}(), which will
> > > note that preemption is disabled.  If rcutree.use_softirq is set and
> > > this task is blocking an expedited RCU grace period, it will directly
> > > invoke the non-notrace function raise_softirq_irqoff().  Otherwise,
> > > it will directly invoke the non-notrace function irq_work_queue_on().
> > 
> > Just to clarify some things: a function annotated with "notrace" simply
> > will not have the ftrace hook to that function, but that function may
> > very well have tracing triggered inside of it.
> > 
> > Functions with "_notrace" in their names (like preempt_disable_notrace())
> > should not have any tracing instrumentation (as Mathieu stated) inside
> > of them, so that they can be used in the tracing infrastructure.
> > 
> > raise_softirq_irqoff() has a tracepoint inside of it.  If we have the
> > tracing infrastructure call that, and we happen to enable that
> > tracepoint, we will have:
> > 
> >   raise_softirq_irqoff()
> >     trace_softirq_raise()
> >       [..]
> >         raise_softirq_irqoff()
> >           trace_softirq_raise()
> >             [..]
> > 
> > Ad infinitum!
> > 
> > I'm not sure if that's what is being proposed or not, but I just wanted
> > to make sure everyone is aware of the above.
> 
> OK, I *think* I might actually understand the problem.  Maybe.
> 
> I am sure that the usual suspects will not be shy about correcting any
> misapprehensions in the following.  ;-)
> 
> My guess is that some users of real-time Linux would like to use BPF
> programs while still getting decent latencies out of their systems.
> (Not something I would have predicted, but then again, I was surprised
> some years back to see people with a 4096-CPU system complaining about
> 200-microsecond latency blows from RCU.)  And the BPF guys (now CCed)
> made some changes some years back to support this, perhaps most notably
> replacing some uses of preempt_disable() with migrate_disable().
> 
> Except that the current __DECLARE_TRACE() macro defeats this work
> for tracepoints by disabling preemption across the tracepoint call,
> which might well be a BPF program.  So we need to do something to
> __DECLARE_TRACE() to get the right sort of protection while still
> leaving preemption enabled.
> 
> One way of attacking this problem is to use preemptible RCU.  The
> problem with this is that although one could construct a trace-safe
> version of rcu_read_unlock(), doing so would negate some optimizations
> that Lai Jiangshan worked so hard to put in place.  Plus those
> optimizations also simplified the code quite a bit.  Which is why I was
> pushing back so hard, especially given that I did not realize that
> real-time systems would be running BPF programs concurrently with
> real-time applications.  This meant that I was looking for a functional
> problem with the current disabling of preemption, and not finding one.
> 
> So another way of dealing with this is to use SRCU-fast, which is
> like SRCU, but dispenses with the smp_mb() calls and the redundant
> read-side array indexing.  Plus it is easy to make _notrace variants
> srcu_read_lock_fast_notrace() and srcu_read_unlock_fast_notrace(),
> along with the requisite guards.
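To make that last sentence concrete, here is roughly the sort of thing I
mean.  This is a sketch for illustration only, not the posted patches: the
_notrace readers would essentially be notrace-annotated wrappers around the
existing SRCU-fast read-side primitives (the real variants may also need to
shed lockdep/KCSAN instrumentation), and the guard would be patterned after
the existing srcu guard built with DEFINE_LOCK_GUARD_1():

/*
 * Sketch only: _notrace variants of the SRCU-fast readers.  The real
 * ones may differ; the point is that nothing in here may itself
 * contain tracing instrumentation.
 */
static inline notrace struct srcu_ctr __percpu *
srcu_read_lock_fast_notrace(struct srcu_struct *ssp)
{
	return srcu_read_lock_fast(ssp);
}

static inline notrace void
srcu_read_unlock_fast_notrace(struct srcu_struct *ssp,
			      struct srcu_ctr __percpu *scp)
{
	srcu_read_unlock_fast(ssp, scp);
}

/* And a guard, mirroring the existing srcu guard: */
DEFINE_LOCK_GUARD_1(srcu_fast_notrace, struct srcu_struct,
		    _T->scp = srcu_read_lock_fast_notrace(_T->lock),
		    srcu_read_unlock_fast_notrace(_T->lock, _T->scp),
		    struct srcu_ctr __percpu *scp)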
> Re-introducing SRCU requires reverting most of e53244e2c893 ("tracepoint:
> Remove SRCU protection"), and I have hacked together this and the
> prerequisites mentioned in the previous paragraph.
> 
> These are passing ridiculously light testing, but probably have at least
> their share of bugs.
> 
> But first, do I actually finally understand the problem?
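On the tracepoint side, the shape I have in mind is roughly the following,
again a sketch rather than the actual patches.  It assumes a
tracepoint-private srcu_struct (tracepoint_srcu here, much like the one
that e53244e2c893 removed) plus the _notrace readers and guard sketched
above:

/*
 * Sketch only: protect the probe calls (which might well be a BPF
 * program) with SRCU-fast instead of preempt_disable_notrace(),
 * leaving preemption enabled.
 */
#define __DO_TRACE_SRCU_FAST(name, args)				\
	do {								\
		guard(srcu_fast_notrace)(&tracepoint_srcu);		\
		__DO_TRACE_CALL(name, TP_ARGS(args));			\
	} while (0)

The probe-unregistration path then presumably goes back to waiting for an
SRCU grace period on tracepoint_srcu before freeing old probe arrays,
which is the part that reverting e53244e2c893 brings back.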
OK, they now pass moderate (rather than ridiculously light) testing, though
I have not yet hit them over the head with the ftrace selftests.  So I
might as well post them.

Thoughts?

							Thanx, Paul