On Wed, 2 Apr 2025 11:09:25 -0700 Andrii Nakryiko <[email protected]> wrote:
> It is useful to be able to access current->mm at task exit to, say,
> record a bunch of VMA information right before the task exits (e.g., for
> stack symbolization reasons when dealing with short-lived processes that
> exit in the middle of profiling session). Currently,
> trace_sched_process_exit() is triggered after exit_mm() which resets
> current->mm to NULL making this tracepoint unsuitable for inspecting
> and recording task's mm_struct-related data when tracing process
> lifetimes.
>
> There is a particularly suitable place, though, right after
> taskstats_exit() is called, but before we do exit_mm() and other
> exit_*() resource teardowns. taskstats performs a similar kind of
> accounting that some applications do with BPF, and so co-locating them
> seems like a good fit. So that's where trace_sched_process_exit() is
> moved with this patch.
>
> Also, existing trace_sched_process_exit() tracepoint is notoriously
> missing `group_dead` flag that is certainly useful in practice and some
> of our production applications have to work around this. So plumb
> `group_dead` through while at it, to have a richer and more complete
> tracepoint.
>
> Note that we can't use sched_process_template anymore, and so we use
> TRACE_EVENT()-based tracepoint definition. But all the field names and
> order, as well as assign and output logic remain intact. We just add one
> extra field at the end in backwards-compatible way.
>
> Signed-off-by: Andrii Nakryiko <[email protected]>
> ---
>  include/trace/events/sched.h | 28 +++++++++++++++++++++++++---
>  kernel/exit.c                |  2 +-
>  2 files changed, 26 insertions(+), 4 deletions(-)

Acked-by: Steven Rostedt (Google) <[email protected]>

-- Steve
