On Wed, 21 May 2025 08:26:05 +0900 Masami Hiramatsu (Google) <mhira...@kernel.org> wrote:
> > Maybe I asked this before but I don't remember if I got the answer. :)
> > How does it handle task exits as it won't go to userspace? I guess it'll
> > lose user callstacks for exit syscalls and other termination paths.

I just checked, and the good news is that task_work does indeed get called
when a task exits. The bad news is that it happens after do_exit() cleans
up the task's "mm" structure via exit_mm(). Which means that current->mm
is NULL :-p

There's a proposal to move trace_sched_process_exit() to before exit_mm().
If that happens, we could make that tracepoint a "faultable" tracepoint and
then the unwind infrastructure could attach to it and do the unwinding from
that tracepoint.

> >
> > Similarly, it will miss user callstacks in the samples at the end of
> > profiling if the target tasks remain in the kernel (or they sleep).
> > It looks like a fundamental limitation of the deferred callchains.

Yes that is a limitation.

>
> Can we use a hybrid approach for this case?
> It might be more balanced (from the performance point of view) to save
> the full stack in a classic way only in this case, rather than faulting
> on process exit or doing file access just to load the sframe.

Another approach is that the tool (like perf) could request to take the
user space stack trace every time a task enters the kernel via a system
call.

-- Steve