On Mon, Jan 12, 2026 at 5:53 AM Steven Rostedt <[email protected]> wrote:
>
> On Sun, 11 Jan 2026 15:38:38 -0800
> Alexei Starovoitov <[email protected]> wrote:
>
> > > Oh, so you are OK replacing the preempt_disable in the tracepoint
> > > callbacks with fast SRCU?
> >
> > yes, but..
> >
> > > Then I guess we can simply do that. Would it be fine to do that for
> > > both RT and non-RT? That will simplify the code quite a bit.
> >
> > Agree. perf needs preempt_disable in its callbacks (as this patch does)
> > and the bpf side needs to add migrate_disable in __bpf_trace_run for now.
> > Though syscall tracepoints are sleepable, we don't take advantage of
> > that on the bpf side. Eventually we will, and then the rcu_read_lock()
> > inside __bpf_trace_run will become srcu_fast_lock.
> >
> > The way to think about generic infrastructure like tracepoints is
> > to minimize their overhead no matter what out-of-tree and in-tree
> > users' assumptions are today, so why do we need preempt_disable
> > or srcu_fast there?
>
> Either preempt disable or srcu_fast is still needed.
>
> > I think today it's there because all callbacks (perf, ftrace, bpf)
> > expect preemption to be disabled, but can we just remove it from the tp
> > side and move preempt_disable to the callbacks that actually need it?
>
> Yes, if we are talking about switching from preempt_disable to srcu_fast.
> No, if you mean to remove both, as it still needs RCU protection. It's
> used for synchronizing changes in the tracepoint infrastructure itself.
> That __DO_TRACE_CALL() macro is the guts of the tracepoint callback
> code. It needs to handle races with additions and removals of callbacks
> there, as the callbacks also get data passed to them. If it gets out of
> sync, a callback could be called with another callback's data.
>
> That's why it has:
>
>                 it_func_ptr =                                           \
>                         rcu_dereference_raw((&__tracepoint_##name)->funcs);
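
For reference, the iterator behind that macro walks an array of func/data
pairs, roughly like the sketch below (paraphrased from the __traceiter_*()
loops in include/linux/tracepoint.h, not the literal macro expansion):

/*
 * Simplified sketch, not the literal macro expansion: each callback is
 * paired with its own data pointer, and the whole walk must stay inside
 * one RCU read-side critical section so that a concurrent add/remove
 * cannot hand callback A the data that belongs to callback B.
 */
static void traceiter_sketch(struct tracepoint *tp, void *args)
{
        struct tracepoint_func *it_func_ptr;
        void *it_func;
        void *data;

        it_func_ptr = rcu_dereference_raw(tp->funcs);   /* snapshot under RCU */
        if (it_func_ptr) {
                do {
                        it_func = READ_ONCE(it_func_ptr->func);
                        data = it_func_ptr->data;
                        ((void (*)(void *, void *))it_func)(data, args);
                } while ((++it_func_ptr)->func);
        }
}
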
>
> >
> > I'm looking at release_probes(). It's fine as-is, no?
>
> That's just freeing, and as you see, there's RCU synchronization
> required.
>
> Updates to tracepoints require RCU protection. It started out with
> preempt_disable() for all tracepoints (which was synchronized with
> synchronize_sched(), which later became just synchronize_rcu()).

I see.
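
So the update path is basically: publish the new funcs array, then defer
freeing the old one until a grace period has passed. Something like this
pattern, if I read it right (illustrative sketch only, not the exact
kernel/tracepoint.c code; the names here are made up):

/*
 * Illustrative sketch of the "swap, then defer the free" pattern;
 * struct and function names are made up, not the real ones in
 * kernel/tracepoint.c.
 */
struct tp_probes_sketch {
        struct rcu_head rcu;
        struct tracepoint_func probes[];
};

static void rcu_free_old_probes_sketch(struct rcu_head *head)
{
        kfree(container_of(head, struct tp_probes_sketch, rcu));
}

static void release_probes_sketch(struct tracepoint_func *old)
{
        if (!old)
                return;

        /* The old array may still be referenced by in-flight readers. */
        call_rcu(&container_of(old, struct tp_probes_sketch, probes[0])->rcu,
                 rcu_free_old_probes_sketch);
}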

> The issue that came about was that both ftrace and perf had an
> assumption that their tracepoint callbacks always have preemption
> disabled when being called. With a move to srcu_fast() that is no
> longer the case. And we need to do that for PREEMPT_RT if BPF is
> running very long callbacks attached to the tracepoints.
>
> Ftrace has been fixed to not require it, but it still needs to take into
> account whether tracepoints disable preemption or not so that it can
> correctly display the preempt_count() from when the tracepoint was called.
>
> Perf is trivial to fix as it can simply add a preempt_disable() in its
> handler.
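
Presumably something with this shape in the perf handler would do
(hand-waved sketch; the function name is illustrative, not the real perf
entry point):

/*
 * Hand-waved sketch of the perf-side fix: if the tracepoint core no
 * longer disables preemption, the handler does it itself.  The name
 * perf_tp_handler_sketch() is illustrative only.
 */
static void perf_tp_handler_sketch(void *data, u64 *args)
{
        /* perf's per-CPU event submission still assumes no preemption */
        preempt_disable_notrace();
        /* ... existing perf event submission path ... */
        preempt_enable_notrace();
}
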
>
> But we were originally told that BPF had an issue because it had the
> assumption that tracepoint callbacks wouldn't migrate. That's how
> this patch came about.
>
> Now if you are saying that BPF will handle migrate_disable() on its own
> and not require the tracepoint infrastructure to do it on its behalf, then
> this is perfect. And I can then simplify this code and just use
> srcu_fast for both RT and !RT.

Agree. Just add migrate_disable to __bpf_trace_run,
or, better yet, use rcu_read_lock_dont_migrate() in there.

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index fe28d86f7c35..abbf0177ad20 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2062,7 +2062,7 @@ void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
        struct bpf_run_ctx *old_run_ctx;
        struct bpf_trace_run_ctx run_ctx;

-       cant_sleep();
+       rcu_read_lock_dont_migrate();
        if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
                bpf_prog_inc_misses_counter(prog);
                goto out;
@@ -2071,13 +2071,12 @@ void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
        run_ctx.bpf_cookie = link->cookie;
        old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);

-       rcu_read_lock();
        (void) bpf_prog_run(prog, args);
-       rcu_read_unlock();

        bpf_reset_run_ctx(old_run_ctx);
 out:
        this_cpu_dec(*(prog->active));
+       rcu_read_unlock_migrate();
 }
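
And then the tracepoint side for both RT and !RT could presumably end up
looking roughly like this (very rough sketch; it assumes the
srcu_read_lock_fast()/srcu_read_unlock_fast() primitives, and the names
here are illustrative, not a concrete proposal for __DO_TRACE):

/*
 * Very rough sketch of a tracepoint-side guard once both RT and !RT
 * use SRCU-fast instead of preempt_disable().  'tracepoint_srcu' and
 * do_trace_sketch() are illustrative names only.
 */
DEFINE_STATIC_SRCU(tracepoint_srcu);

static void do_trace_sketch(struct tracepoint *tp, void *args)
{
        struct srcu_ctr __percpu *scp;

        scp = srcu_read_lock_fast(&tracepoint_srcu);
        /* walk tp->funcs and call each callback with its own data */
        srcu_read_unlock_fast(&tracepoint_srcu, scp);
}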
