This is based on top of the deferred unwind core patch series:

  https://lore.kernel.org/linux-trace-kernel/20250717004910.297898...@kernel.org/

  git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
    unwind/core

This series implements the perf interface to use deferred user space stack
tracing.

The first 5 patches are clean ups and simplifications. There's a standalone
series with these patches here:

  https://lore.kernel.org/linux-trace-kernel/20250717173125.434618...@kernel.org/

Patch 6 implements per-task deferred tracing, which works with events that
follow a specific task (per thread).

Patch 7 implements per CPU deferred tracing. It requires the application
(perf user space) to have a per CPU event buffer for every CPU that a task
may migrate to between the time a deferred request is made and the time the
stack trace occurs, as a task may migrate to a different CPU after the
request and before it goes back to user space.

The rest of the patches implement the tool side of perf.

KNOWN ISSUES:

 - The marker that adds the USER_DEFERRED when the request was made should
   also add the cookie. The cookie can be used to figure out if dropped
   events missed a stack trace, and to avoid attaching a stack trace to the
   wrong events.

 - The writing of the stack trace should probably be changed to act more
   like get_perf_callchain(), where it does fixups to uprobes.

The code for this series is located here:

  git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
    unwind/perf

  Head SHA1: 5753b61c16f61e50f35bf0f3dfbf8a00b8de2d51

Changes since v13: https://lore.kernel.org/linux-trace-kernel/20250708020003.565862...@kernel.org/

- Missed one location to replace the current->mm == NULL check that still
  only checked PF_KTHREAD. It must also check PF_USER_WORKER.
- Need to copy the trace.entries[] one at a time, as the perf entry in the
  ring buffer has 64 bit entries, but trace.entries[] are of size long.
- Added back the cookie field in perf_callchain_deferred_event structure
  (Note, it was a timestamp before) (Namhyung Kim)
- Add the cookie to the comment explaining perf_callchain_deferred_event.
- Fixed deferred_unwind_request() to return 1 if the request was already
  queued or was already executed to not incorrectly increment
  nr_no_switch_fast.
- Display the cookie in the -D output

Josh Poimboeuf (5):
      perf: Remove get_perf_callchain() init_nr argument
      perf: Have get_perf_callchain() return NULL if crosstask and user are set
      perf: Simplify get_perf_callchain() user logic
      perf: Skip user unwind if the task is a kernel thread
      perf: Support deferred user callchains

Namhyung Kim (4):
      perf tools: Minimal CALLCHAIN_DEFERRED support
      perf record: Enable defer_callchain for user callchains
      perf script: Display PERF_RECORD_CALLCHAIN_DEFERRED
      perf tools: Merge deferred user callchains

Steven Rostedt (2):
      perf: Use current->flags & PF_KTHREAD|PF_USER_WORKER instead of current->mm == NULL
      perf: Support deferred user callchains for per CPU events

----
 include/linux/perf_event.h                |  13 +-
 include/uapi/linux/perf_event.h           |  20 +-
 kernel/bpf/stackmap.c                     |   8 +-
 kernel/events/callchain.c                 |  49 ++--
 kernel/events/core.c                      | 424 +++++++++++++++++++++++++++++-
 tools/include/uapi/linux/perf_event.h     |  19 +-
 tools/lib/perf/include/perf/event.h       |   8 +
 tools/perf/Documentation/perf-script.txt  |   5 +
 tools/perf/builtin-script.c               |  92 +++++++
 tools/perf/util/callchain.c               |  24 ++
 tools/perf/util/callchain.h               |   3 +
 tools/perf/util/event.c                   |   1 +
 tools/perf/util/evlist.c                  |   1 +
 tools/perf/util/evlist.h                  |   1 +
 tools/perf/util/evsel.c                   |  39 +++
 tools/perf/util/evsel.h                   |   1 +
 tools/perf/util/machine.c                 |   1 +
 tools/perf/util/perf_event_attr_fprintf.c |   1 +
 tools/perf/util/sample.h                  |   3 +-
 tools/perf/util/session.c                 |  79 ++++++
 tools/perf/util/tool.c                    |   2 +
 tools/perf/util/tool.h                    |   4 +-
 22 files changed, 762 insertions(+), 36 deletions(-)
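
As an aside on the v13 change about copying trace.entries[] one at a time:
the user-space sketch below shows why an element-wise copy is needed when
the source entries are long sized and the destination slots are 64 bit.
The helper name copy_entries_widen() and the assumption that the entries
are unsigned long are for illustration only; this is not the code in the
series.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the kernel implementation.
 *
 * Widen each long-sized entry into its own 64-bit slot. A single
 * memcpy() of nr * sizeof(long) bytes would only be correct where
 * long is 64 bits; on a 32-bit build it would pack two entries into
 * each destination slot.
 */
static void copy_entries_widen(uint64_t *dst, const unsigned long *src,
			       size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		dst[i] = src[i];	/* zero-extends when long is 32 bits */
}
```

On a 64-bit build the loop is equivalent to a plain memcpy(); the
per-entry assignment only changes the result where long is narrower than
64 bits.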