[ This version is simply a rebase of v9 on top of v6.17-rc3. It needs to be updated to work with the latest SFrame specification. Indu said she'll be able to make those changes, but I needed to forward port the latest code.
You can test this code with the x86 and perf changes applied at:

  git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git unwind/sframe-test ]

This is the implementation of parsing the SFrame section in an ELF file. It's a continuation of Josh's last work that can be found here:

  https://lore.kernel.org/all/cover.1737511963.git.jpoim...@kernel.org/

Currently the only way to get a user space stack trace from a stack walk (and not just copying large amounts of user stack into the kernel ring buffer) is to use frame pointers. This has a few issues. The biggest one is that compiling frame pointers into every application and library has been shown to cause performance overhead.

Another issue is that the format of the frames may not always be consistent between different compilers, and some architectures (s390) have no defined format to do a reliable stack walk. The only way to perform user space profiling on these architectures is to copy the user stack into the kernel buffer.

SFrames[1] is now supported in gcc binutils and soon will also be supported by LLVM. SFrames acts more like ORC, and lives in the ELF executable file as its own section. Like ORC it has two tables, where the first table is sorted by instruction pointer (IP). Using the current IP to find its entry in the first table takes you to the second table, which tells you where the return address of the current function is located. You can then look that return address up in the first table to find the return address of that function, and so on. This performs a user space stack walk.

Now because the SFrame section lives in the ELF file, it needs to be faulted into memory when it is used. This means that walking the user space stack requires being in a faultable context. As profilers like perf request a stack trace in interrupt or NMI context, the walk cannot be done when it is requested. Instead it must be deferred until it is safe to fault in user space.
One place this is known to be safe is when the task is about to return to user space. This series makes the deferred unwind code implement SFrames.

[1] https://sourceware.org/binutils/wiki/sframe

Changes since v9: https://lore.kernel.org/linux-trace-kernel/20250717012848.927473...@kernel.org/

- Rebased on v6.17-rc3

- Update the changes to unwind/user.c to handle passing a const
  unwind_user_frame pointer.

Josh Poimboeuf (11):
      unwind_user/sframe: Add support for reading .sframe headers
      unwind_user/sframe: Store sframe section data in per-mm maple tree
      x86/uaccess: Add unsafe_copy_from_user() implementation
      unwind_user/sframe: Add support for reading .sframe contents
      unwind_user/sframe: Detect .sframe sections in executables
      unwind_user/sframe: Wire up unwind_user to sframe
      unwind_user/sframe/x86: Enable sframe unwinding on x86
      unwind_user/sframe: Remove .sframe section on detected corruption
      unwind_user/sframe: Show file name in debug output
      unwind_user/sframe: Add .sframe validation option
      unwind_user/sframe: Add prctl() interface for registering .sframe sections

----
 MAINTAINERS                       |   1 +
 arch/Kconfig                      |  23 ++
 arch/x86/Kconfig                  |   1 +
 arch/x86/include/asm/mmu.h        |   2 +-
 arch/x86/include/asm/uaccess.h    |  39 ++-
 fs/binfmt_elf.c                   |  49 +++-
 include/linux/mm_types.h          |   3 +
 include/linux/sframe.h            |  60 ++++
 include/linux/unwind_user_types.h |   4 +-
 include/uapi/linux/elf.h          |   1 +
 include/uapi/linux/prctl.h        |   6 +-
 kernel/fork.c                     |  10 +
 kernel/sys.c                      |   9 +
 kernel/unwind/Makefile            |   3 +-
 kernel/unwind/sframe.c            | 593 ++++++++++++++++++++++++++++++++++++++
 kernel/unwind/sframe.h            |  71 +++++
 kernel/unwind/sframe_debug.h      |  68 +++++
 kernel/unwind/user.c              |  41 ++-
 mm/init-mm.c                      |   2 +
 19 files changed, 967 insertions(+), 19 deletions(-)
 create mode 100644 include/linux/sframe.h
 create mode 100644 kernel/unwind/sframe.c
 create mode 100644 kernel/unwind/sframe.h
 create mode 100644 kernel/unwind/sframe_debug.h
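
As a rough illustration of the two-table walk described in the cover text, here is a minimal user space sketch in C. It is not the code in kernel/unwind/sframe.c and it does not follow the SFrame on-disk format: the struct layouts, the one-row-per-function simplification, the fake_stack helper and all function names are assumptions made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* First table (hypothetical layout): one entry per function, sorted by start_ip. */
struct func_entry {
	uint64_t start_ip;
	uint64_t size;
	size_t   row;		/* index into the second table */
};

/* Second table (hypothetical layout): how to find the caller's frame. */
struct frame_row {
	int64_t cfa_off;	/* CFA = SP + cfa_off */
	int64_t ra_off;		/* return address is stored at CFA + ra_off */
};

/* Stand-in for user stack memory; a real unwinder would read user space. */
struct fake_stack {
	uint64_t base;		/* address of words[0] */
	const uint64_t *words;
	size_t nr;
};

/* Read one 8-byte word from the simulated stack; returns 0 on success. */
static int read_word(const struct fake_stack *st, uint64_t addr, uint64_t *val)
{
	size_t idx;

	if (addr < st->base || (addr - st->base) % 8)
		return -1;
	idx = (addr - st->base) / 8;
	if (idx >= st->nr)
		return -1;
	*val = st->words[idx];
	return 0;
}

/* Binary search the sorted first table for the function containing ip. */
static const struct func_entry *find_func(const struct func_entry *tbl,
					  size_t nr, uint64_t ip)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ip < tbl[mid].start_ip)
			hi = mid;
		else if (ip - tbl[mid].start_ip >= tbl[mid].size)
			lo = mid + 1;
		else
			return &tbl[mid];
	}
	return NULL;
}

/*
 * Walk the stack: look up ip in the first table, use the second table to
 * locate the return address, then repeat with that return address as ip.
 * Returns the number of IPs stored in trace[].
 */
static size_t walk_stack(const struct func_entry *funcs, size_t nr_funcs,
			 const struct frame_row *rows,
			 const struct fake_stack *st,
			 uint64_t ip, uint64_t sp,
			 uint64_t *trace, size_t max)
{
	size_t n = 0;

	while (n < max) {
		const struct func_entry *f = find_func(funcs, nr_funcs, ip);
		uint64_t cfa, ra;

		if (!f)
			break;		/* no entry for this IP: stop the walk */
		trace[n++] = ip;

		cfa = sp + (uint64_t)rows[f->row].cfa_off;
		if (read_word(st, cfa + (uint64_t)rows[f->row].ra_off, &ra))
			break;		/* could not read the return address */

		ip = ra;		/* caller's IP */
		sp = cfa;		/* caller's SP */
	}
	return n;
}
```

Given a function table, a row table and a simulated stack, walk_stack() fills trace[] with one IP per frame and stops at the first IP that has no entry in the first table or whose return address cannot be read.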