On 4/29/2026 5:43 PM, Steven Rostedt wrote:
> From: Steven Rostedt <[email protected]>
>
> [
> This is an RFC that adds a system call for dynamic linkers to use to
> tell the kernel where the sframe sections are when it loads dynamic
> libraries.
>
> It is built on top of Jens's sframe implementation for v3:
>
>
> https://lore.kernel.org/linux-trace-kernel/[email protected]/
>
> I have a repo with that code that this applies on top of here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
> sframe/core
>
>
> The name of the system call is "stacktrace_setup", but I'm not attached
> to this name. If anyone can think of a better name I'm happy to take
> suggestions.
>
> This patch is just to get the conversation going and the final result
> may be much different. I tested this with the attached program which is a
> major hack. I built glibc with sframe v3 support and I used readelf to
> find the sframe size and location of glibc.
>
> readelf -e /work/usr/lib/libc.so.6 | grep sframe
> [19] .sframe GNU_SFRAME 00000000001d3fc0 001d3fc0
>
> Then I wrote a program that takes the above location and size of the
> .sframe section in libc as parameters, scans /proc/self/maps to find
> where it loaded libc and then calls this new system call with a pointer
> to the location of the sframe along with its size, as well as where the
> libc text is located.
>
> It then spins for 2 seconds, calls the system call again to remove the
> sframe section it loaded, and spins for another 2 seconds.
>
> I ran perf record --call-graph fp,defer on the program and looked for
> the do_spin() function.
>
> With sframe loaded:
>
> sframe-test 1350 1396.333593: 202366 cpu/cycles/P:
> 7fdf0ec38a44 [unknown] ([vdso])
> 5621a6b97243 get_time+0x19 (/work/c/sframe-test)
> 5621a6b9727f do_spin+0x1f (/work/c/sframe-test)
> 5621a6b975cd main+0xd4 (/work/c/sframe-test)
> 7fdf0ea26bda __libc_start_call_main+0x6a (/work/usr/lib/libc.so.6)
> 7fdf0ea26d05 __libc_start_main@@GLIBC_2.34+0x85
> (/work/usr/lib/libc.so.6)
> 5621a6b97131 _start+0x21 (/work/c/sframe-test)
>
> After it unloads the sframe:
>
> sframe-test 1350 1400.332902: 657582 cpu/cycles/P:
> 7fdf0ec38a5e [unknown] ([vdso])
> 5621a6b97243 get_time+0x19 (/work/c/sframe-test)
> 5621a6b9727f do_spin+0x1f (/work/c/sframe-test)
> 5621a6b97602 main+0x109 (/work/c/sframe-test)
> 7fdf0ea26bda __libc_start_call_main+0x6a (/work/usr/lib/libc.so.6)
>
> As you can see, with the sframe loaded, it was able to walk further up
> the libc library.
>
> Again, this is just an RFC, but I would like to get agreement on the
> system call so that we can then update the dynamic linker to do this
> instead of using my hack ;-)
> ]
>
> Add a system call that can be used by dynamic linkers to tell the kernel
> where the sframe section is in memory for libraries it loads.
>
> The system call stacktrace_setup takes 5 parameters:
>
> op - the type of operation to perform
> addr_start - The virtual address of the sframe section
> addr_length - The length of the sframe section
> text_start - the text section the sframe represents
> test_length - the length of the section
>
> The current op values are:
>
> STACKTRACE_REGISTER_SFRAME - This registers the sframe
> STACKTRACE_UNREGISTER_SFRAME - This removes the sframe
>
> Signed-off-by: Steven Rostedt <[email protected]>
LGTM. Some comments/questions below.
> diff --git a/include/uapi/linux/stacktrace.h b/include/uapi/linux/stacktrace.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +#ifndef _UAPI_LINUX_STACKTRACE_H
> +#define _UAPI_LINUX_STACKTRACE_H
> +
> +enum stacktrace_setup_types {
> + STACKTRACE_REGISTER_SFRAME = 1,
> + STACKTRACE_UNREGISTER_SFRAME = 2,
> +};
> +
> +#endif /* _UAPI_LINUX_STACKTRACE_H */
> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
Having the syscall live in kernel/unwind/sframe.c means it is only
available if config option HAVE_UNWIND_USER_SFRAME is selected (which
triggers sframe.o to be built and linked into the kernel), which makes
sense as long as it only implements sframe-specific functionality.
I suppose it could be moved elsewhere if non-sframe use cases would
arise in the future?
Would Dylan need to guard it when introducing HAVE_UNWIND_KERNEL_SFRAME?
Provided the syscall fails with -ENOSYS if not implemented (e.g. when
HAVE_UNWIND_USER_SFRAME is not enabled) the dummy implementations of
sframe_add_section() and sframe_remove_section() in linux/sframe.h also
return -ENOSYS, so the user observable behavior would be the same and
it would not matter. Do you agree?
> @@ -12,8 +12,10 @@
> #include <linux/mm.h>
> #include <linux/string_helpers.h>
> #include <linux/sframe.h>
> +#include <linux/syscalls.h>
> #include <asm/unwind_user_sframe.h>
> #include <linux/unwind_user_types.h>
> +#include <uapi/linux/stacktrace.h>
>
> #include "sframe.h"
> #include "sframe_debug.h"
> @@ -838,3 +840,38 @@ void sframe_free_mm(struct mm_struct *mm)
>
> mtree_destroy(&mm->sframe_mt);
> }
> +
> +/**
> + * sys_stacktrace_setup - register an address for user space stacktrace
> walking.
> + * @op: Type of operation to perform
> + * @addr_start: The virtual address of the stacktrace information
> + * @addr_length: The length of the stacktrace information
> + * @text_start: The virtual address of the text that @addr_start represents
> + * @text_length: The length of teh text
> + *
> + * This system call is used by dynamic library utilities to inform the kernel
> + * of meta data that it loaded that can be used by the kernel to know how
> + * to stack walk the given text locations.
> + *
> + * Currently only sframes are supported, but in the future, this may be used
> + * to tell the kernel about JIT code which will most likely have a different
> + * format.
> + *
> + * The type command may be extended and parameters may be used for other
> + * purposes.
> + *
> + * Return: 0 if successful, otherwise a negative error.
> + */
> +SYSCALL_DEFINE5(stacktrace_setup, int, op, unsigned long, addr_start,
> + unsigned long, addr_length, unsigned long, text_start,
> + unsigned long, text_length)
Would it make sense to keep the parameters generic from start, similar
to how it is done in prctl()? Or can this be changed later, if the need
arises?
SYSCALL_DEFINE5(stacktrace_setup, int, op, unsigned long, arg2,
unsigned long, arg3, unsigned long, arg4, unsigned long, arg5)
> +{
> + switch (op) {
> + case STACKTRACE_REGISTER_SFRAME:
> + return sframe_add_section(addr_start, addr_start + addr_length,
> + text_start, text_start+text_length);
Nit:
text_start, text_start + text_length);
> + case STACKTRACE_UNREGISTER_SFRAME:
> + return sframe_remove_section(addr_start);
> + }
> + return -EINVAL;
> +}
Thanks and regards,
Jens
--
Jens Remus
Linux on Z Development (D3303)
[email protected] / [email protected]
IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats:
Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft:
Ehningen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/