On Thu, Jan 15, 2026 at 10:48:29AM -0800, Andrii Nakryiko wrote:
> On Mon, Jan 12, 2026 at 1:50 PM Jiri Olsa <[email protected]> wrote:
> >
> > Adding support to call bpf_get_stackid helper from trigger programs,
> > so far added for kprobe multi.
> >
> > Adding the --stacktrace/-g option to enable it.
> >
> > Signed-off-by: Jiri Olsa <[email protected]>
> > ---
> >  tools/testing/selftests/bpf/bench.c            |  4 ++++
> >  tools/testing/selftests/bpf/bench.h            |  1 +
> >  .../selftests/bpf/benchs/bench_trigger.c       |  1 +
> >  .../selftests/bpf/progs/trigger_bench.c        | 18 ++++++++++++++++++
> >  4 files changed, 24 insertions(+)
> >
> 
> This now actually becomes a stack trace benchmark :) But I don't mind,
> I think it would be good to be able to benchmark this. But I think we
> should then implement it for all different tracing programs (tp,
> raw_tp, fentry/fexit/fmod_ret) for consistency and so we can compare
> and contrast?...

fyi I updated the bench for all program types and got some stats

current fix WITHOUT stacktrace:

        usermode-count :  810.652 ± 1.036M/s
        kernel-count   :  336.645 ± 2.812M/s
        syscall-count  :   27.798 ± 0.063M/s
        fentry         :   67.677 ± 0.291M/s
        fexit          :   49.970 ± 0.214M/s
        fmodret        :   52.860 ± 0.237M/s
        rawtp          :   65.196 ± 0.224M/s
        tp             :   34.120 ± 0.042M/s
        kprobe         :   25.157 ± 0.019M/s
        kprobe-multi   :   33.223 ± 0.205M/s
        kprobe-multi-all:    4.739 ± 0.003M/s
        kretprobe      :   10.904 ± 0.020M/s
        kretprobe-multi:   15.996 ± 0.023M/s
        kretprobe-multi-all:    2.559 ± 0.092M/s

current fix WITH stacktrace:

        usermode-count :  782.529 ± 5.866M/s
        kernel-count   :  341.116 ± 2.247M/s
        syscall-count  :   27.481 ± 0.267M/s
        fentry         :    2.397 ± 0.026M/s
        fexit          :    2.472 ± 0.008M/s
        fmodret        :    2.475 ± 0.014M/s
        rawtp          :    2.593 ± 0.031M/s
        tp             :    2.641 ± 0.020M/s
        kprobe         :    3.848 ± 0.014M/s
        kprobe-multi   :    4.188 ± 0.025M/s
        kprobe-multi-all:    0.261 ± 0.026M/s
        kretprobe      :    3.782 ± 0.011M/s
        kretprobe-multi:    4.157 ± 0.023M/s
        kretprobe-multi-all:    0.177 ± 0.000M/s

with similar fix for fentry/fexit/raw_tp/tp WITH stacktrace:

        usermode-count :  792.613 ± 1.322M/s
        kernel-count   :  337.725 ± 2.422M/s
        syscall-count  :   27.363 ± 0.030M/s
        fentry         :   14.911 ± 0.083M/s
        fexit          :   13.749 ± 0.060M/s
        fmodret        :   13.987 ± 0.049M/s
        rawtp          :   13.760 ± 0.042M/s
        tp             :    7.060 ± 0.026M/s
        kprobe         :    3.920 ± 0.012M/s
        kprobe-multi   :    4.186 ± 0.030M/s
        kprobe-multi-all:    0.281 ± 0.006M/s
        kretprobe      :    3.782 ± 0.005M/s
        kretprobe-multi:    4.030 ± 0.014M/s
        kretprobe-multi-all:    0.178 ± 0.000M/s

so cutting the extra initial unwind gets some speedup, as expected
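
To put a rough number on that speedup, here is a minimal sketch that divides the "similar fix" throughput by the "current fix" throughput for the program types whose WITH-stacktrace numbers changed. The values are copied from the two WITH-stacktrace runs above; the names current_fix, similar_fix and speedup are only illustrative and are not part of the bench tool. The rawtp ratio is included even though its callstack still looks wrong.

```python
# Throughput in M/s, copied from the two WITH-stacktrace runs above.
# Only the program types affected by the additional fix are listed.
current_fix = {
    "fentry": 2.397,
    "fexit": 2.472,
    "fmodret": 2.475,
    "rawtp": 2.593,
    "tp": 2.641,
}
similar_fix = {
    "fentry": 14.911,
    "fexit": 13.749,
    "fmodret": 13.987,
    "rawtp": 13.760,
    "tp": 7.060,
}


def speedup(before, after):
    """Return after/before throughput ratio for each program type."""
    return {name: after[name] / before[name] for name in before}


ratios = speedup(current_fix, similar_fix)
for name, ratio in ratios.items():
    print(f"{name:8s} {ratio:.2f}x")
```

This only compares mean throughput and ignores the ± spread reported by the bench, so treat the ratios as a ballpark.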

I'm getting a wrong callstack for rawtp programs, so I need to find out why,
but the rest of the tracing programs (fentry/fexit/...) work ok

jirka
