On Tue, Dec 02, 2025 at 03:14:31PM -0800, Ian Rogers wrote: > On Thu, Nov 20, 2025 at 3:48 PM Namhyung Kim <[email protected]> wrote: > > > > Save samples with deferred callchains in a separate list and deliver > > them after merging the user callchains. If users don't want to merge > > they can set tool->merge_deferred_callchains to false to prevent the > > behavior. > > > > With previous result, now perf script will show the merged callchains. > > > > $ perf script > > ... > > pwd 2312 121.163435: 249113 cpu/cycles/P: > > ffffffff845b78d8 __build_id_parse.isra.0+0x218 ([kernel.kallsyms]) > > ffffffff83bb5bf6 perf_event_mmap+0x2e6 ([kernel.kallsyms]) > > ffffffff83c31959 mprotect_fixup+0x1e9 ([kernel.kallsyms]) > > ffffffff83c31dc5 do_mprotect_pkey+0x2b5 ([kernel.kallsyms]) > > ffffffff83c3206f __x64_sys_mprotect+0x1f ([kernel.kallsyms]) > > ffffffff845e6692 do_syscall_64+0x62 ([kernel.kallsyms]) > > ffffffff8360012f entry_SYSCALL_64_after_hwframe+0x76 > > ([kernel.kallsyms]) > > 7f18fe337fa7 mprotect+0x7 > > (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) > > 7f18fe330e0f _dl_sysdep_start+0x7f > > (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) > > 7f18fe331448 _dl_start_user+0x0 > > (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) > > ... > > > > The old output can be get using --no-merge-callchain option. > > Also perf report can get the user callchain entry at the end. > > > > $ perf report --no-children --stdio -q -S __build_id_parse.isra.0 > > # symbol: __build_id_parse.isra.0 > > 8.40% pwd [kernel.kallsyms] > > | > > ---__build_id_parse.isra.0 > > perf_event_mmap > > mprotect_fixup > > do_mprotect_pkey > > __x64_sys_mprotect > > do_syscall_64 > > entry_SYSCALL_64_after_hwframe > > mprotect > > _dl_sysdep_start > > _dl_start_user > > > > Signed-off-by: Namhyung Kim <[email protected]> > > Reviewed-by: Ian Rogers <[email protected]> > > > --- > > tools/perf/Documentation/perf-script.txt | 5 ++ > > tools/perf/builtin-inject.c | 1 + > > tools/perf/builtin-report.c | 1 + > > tools/perf/builtin-script.c | 4 ++ > > tools/perf/util/callchain.c | 29 +++++++++ > > tools/perf/util/callchain.h | 3 + > > tools/perf/util/evlist.c | 1 + > > tools/perf/util/evlist.h | 2 + > > tools/perf/util/session.c | 79 +++++++++++++++++++++++- > > tools/perf/util/tool.c | 2 + > > tools/perf/util/tool.h | 1 + > > 11 files changed, 127 insertions(+), 1 deletion(-) > > > > diff --git a/tools/perf/Documentation/perf-script.txt > > b/tools/perf/Documentation/perf-script.txt > > index 28bec7e78bc858ba..03d1129606328d6d 100644 > > --- a/tools/perf/Documentation/perf-script.txt > > +++ b/tools/perf/Documentation/perf-script.txt > > @@ -527,6 +527,11 @@ include::itrace.txt[] > > The known limitations include exception handing such as > > setjmp/longjmp will have calls/returns not match. > > > > +--merge-callchains:: > > + Enable merging deferred user callchains if available. This is the > > + default behavior. If you want to see separate CALLCHAIN_DEFERRED > > + records for some reason, use --no-merge-callchains explicitly. > > + > > :GMEXAMPLECMD: script > > :GMEXAMPLESUBCMD: > > include::guest-files.txt[] > > diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c > > index bd9245d2dd41aa48..51d2721b6db9dccb 100644 > > --- a/tools/perf/builtin-inject.c > > +++ b/tools/perf/builtin-inject.c > > @@ -2527,6 +2527,7 @@ int cmd_inject(int argc, const char **argv) > > inject.tool.auxtrace = perf_event__repipe_auxtrace; > > inject.tool.bpf_metadata = perf_event__repipe_op2_synth; > > inject.tool.dont_split_sample_group = true; > > + inject.tool.merge_deferred_callchains = false; > > inject.session = __perf_session__new(&data, &inject.tool, > > > > /*trace_event_repipe=*/inject.output.is_pipe, > > /*host_env=*/NULL); > > diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c > > index 2bc269f5fcef8023..add6b1c2aaf04270 100644 > > --- a/tools/perf/builtin-report.c > > +++ b/tools/perf/builtin-report.c > > @@ -1614,6 +1614,7 @@ int cmd_report(int argc, const char **argv) > > report.tool.event_update = perf_event__process_event_update; > > report.tool.feature = process_feature_event; > > report.tool.ordering_requires_timestamps = true; > > + report.tool.merge_deferred_callchains = !dump_trace; > > > > session = perf_session__new(&data, &report.tool); > > if (IS_ERR(session)) { > > diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c > > index 85b42205a71b3993..62e43d3c5ad731a0 100644 > > --- a/tools/perf/builtin-script.c > > +++ b/tools/perf/builtin-script.c > > @@ -4009,6 +4009,7 @@ int cmd_script(int argc, const char **argv) > > bool header_only = false; > > bool script_started = false; > > bool unsorted_dump = false; > > + bool merge_deferred_callchains = true; > > char *rec_script_path = NULL; > > char *rep_script_path = NULL; > > struct perf_session *session; > > @@ -4162,6 +4163,8 @@ int cmd_script(int argc, const char **argv) > > "Guest code can be found in hypervisor process"), > > OPT_BOOLEAN('\0', "stitch-lbr", &script.stitch_lbr, > > "Enable LBR callgraph stitching approach"), > > + OPT_BOOLEAN('\0', "merge-callchains", &merge_deferred_callchains, > > + "Enable merge deferred user callchains"), > > OPTS_EVSWITCH(&script.evswitch), > > OPT_END() > > }; > > @@ -4418,6 +4421,7 @@ int cmd_script(int argc, const char **argv) > > script.tool.throttle = process_throttle_event; > > script.tool.unthrottle = process_throttle_event; > > script.tool.ordering_requires_timestamps = true; > > + script.tool.merge_deferred_callchains = merge_deferred_callchains; > > session = perf_session__new(&data, &script.tool); > > if (IS_ERR(session)) > > return PTR_ERR(session); > > diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c > > index 2884187ccbbecfdc..71dc5a070065dd2a 100644 > > --- a/tools/perf/util/callchain.c > > +++ b/tools/perf/util/callchain.c > > @@ -1838,3 +1838,32 @@ int sample__for_each_callchain_node(struct thread > > *thread, struct evsel *evsel, > > } > > return 0; > > } > > + > > +int sample__merge_deferred_callchain(struct perf_sample *sample_orig, > > nit: We use the term deferred rather than original except in this > context. I think deferred is a little more intention revealing than > original. Perhaps add a comment capturing that the original sample is > the deferred kernel sample.
Sure, will add. Thanks, Namhyung
