On 5/22/2026 12:35 AM, Steven Rostedt wrote:
> From: Steven Rostedt <[email protected]>
>
> Add system calls to register and unregister sframes that can be used by
> dynamic linkers to tell the kernel where the sframe section is in memory
> for libraries it loads.
Why two separate system calls? Can't that be one single stacktracectl?
Could they at least be non-sframe specific, e.g. stracktrace_register
and stracktrace_unregister, so that if one would implement e.g. unwind
user dwarf/eh_frame in the future one could pass ehframe_start and
ehframe_end in addition to sframe_start and sframe_end?
>
> Both system calls take a pointer to a new structure:
>
> struct sframe_setup {
> unsigned long sframe_start;
> unsigned long sframe_size;
> unsigned long text_start;
> unsigned long text_size;
> };
>
> and a size of the passed in structure. If the system call needs to be
> extended, then the structure could be changed and the size of that
> structure will tell the kernel that it is the new version. If the kernel
> does not recognize the structure size, it will return -EINVAL.
>
> sframe_start - The virtual address of the sframe section
> sframe_size - The length of the sframe section
> text_start - the text section the sframe represents
> test_size - the length of the section
>
> If other stack tracing functionality is added, it will require a new
> system call.
>
> The unregister only needs the sframe_start and requires all the rest of
> the fields to be 0. In the future, if more can be done, then user space
> can update the other values and check the return code to see if the kernel
> supports it.
>
> Signed-off-by: Steven Rostedt <[email protected]>
> ---
>
> Based on top of Jens patches here:
>
>
> https://lore.kernel.org/linux-trace-kernel/[email protected]/
>
> [ Note, I tested this with the same program from the RFC patch ]
>
> Changes from RFC:
> https://patch.msgid.link/[email protected]
>
> - Remove the ioctl() like system call for a unique system call for each
> functionality. Right now there's two functionalities:
> 1. register sframe section
> 2. unregister sframe sections
>
> - Added taking a lock around the mtree logic in __sframe_remove_section()
> as Sashiko mentioned that there could be races from user space
> registering and unregistering sframe sections at the same time.
Doesn't sframe_add_section() then also need likewise?
>
> - Removed [RFC] from subject as I believe this is more likely the way
> this system call will be done.
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> @@ -999,6 +999,8 @@ asmlinkage long sys_lsm_get_self_attr(unsigned int attr,
> struct lsm_ctx __user *
> asmlinkage long sys_lsm_set_self_attr(unsigned int attr, struct lsm_ctx
> __user *ctx,
> u32 size, u32 flags);
> asmlinkage long sys_lsm_list_modules(u64 __user *ids, u32 __user *size, u32
> flags);
> +asmlinkage long sys_sframe_register(void *data, unsigned int size);
> +asmlinkage long sys_sframe_unregister(void *data, unsigned int size);
>
> /*
> * Architecture-specific system calls
> diff --git a/include/uapi/linux/sframe.h b/include/uapi/linux/sframe.h
> @@ -0,0 +1,12 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +#ifndef _UAPI_LINUX_SFRAME_H
> +#define _UAPI_LINUX_SFRAME_H
> +
> +struct sframe_setup {
> + unsigned long sframe_start;
> + unsigned long sframe_size;
> + unsigned long text_start;
> + unsigned long text_size;
> +};
> +
> +#endif /* _UAPI_LINUX_SFRAME_H */
> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
> @@ -842,9 +844,11 @@ static void sframe_free_srcu(struct rcu_head *rcu)
> static int __sframe_remove_section(struct mm_struct *mm,
> struct sframe_section *sec)
> {
> - if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
> - dbg_sec("mtree_erase failed: text=%lx\n", sec->text_start);
> - return -EINVAL;
> + scoped_guard(mmap_read_lock, mm) {
Why is a read lock sufficient? Doesn't that allow multiple readers?
How does that prevent a concurrent modification of the mm->sframe_mt?
> + if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
> + dbg_sec("mtree_erase failed: text=%lx\n",
> sec->text_start);
> + return -EINVAL;
> + }
Is (or why not) likewise required in sframe_add_section() for the
mtree_insert_range()?
Wasn't the reported issue that while mt_for_each() in
sframe_remove_section() there could be concurrent mtree_erase() in
__sframe_remove_section() followed by mtree_insert_range() in
sframe_add_section(), so that the mt_for_each() could get confused?
> }
>
> call_srcu(&sframe_srcu, &sec->rcu, sframe_free_srcu);
> @@ -936,3 +940,56 @@ void sframe_free_mm(struct mm_struct *mm)
>
> mtree_destroy(&mm->sframe_mt);
> }
> +
> +/**
> + * sys_sframe_register - register an address for user space stacktrace
> walking.
> + * @data: Structure of sframe data used to register the sframe section
> + * @size: The size of the given structure.
> + *
> + * This system call is used by dynamic library utilities to inform the kernel
> + * of meta data that it loaded that can be used by the kernel to know how
> + * to stack walk the given text locations.
> + *
> + * Return: 0 if successful, otherwise a negative error.
> + */
> +SYSCALL_DEFINE2(sframe_register, __user struct sframe_setup *, data,
> unsigned int, size)
> +{
> + struct sframe_setup sframe;
> +
> + if (sizeof(sframe) != size)
> + return -EINVAL;
> +
> + if (copy_from_user(&sframe, data, size))
> + return -EFAULT;
> +
> + return sframe_add_section(sframe.sframe_start,
> + sframe.sframe_start + sframe.sframe_size,
> + sframe.text_start,
> + sframe.text_start + sframe.text_size);
> +}
> +
> +/**
> + * sys_sframe_unregister - unregister an sframe address
> + * @data: Structure of sframe data used to register the sframe section
> + * @size: The size of the given structure.
> + *
> + * The data->sframe_start is the only value that is used. The rest must
> + * be zero.
> + *
> + * Return: 0 if successful, otherwise a negative error.
> + */
> +SYSCALL_DEFINE2(sframe_unregister, __user struct sframe_setup *, data,
> unsigned int, size)
> +{
> + struct sframe_setup sframe;
> +
> + if (sizeof(sframe) != size)
> + return -EINVAL;
> +
> + if (copy_from_user(&sframe, data, size))
> + return -EFAULT;
> +
> + if (sframe.sframe_size || sframe.text_start || sframe.text_size)
> + return -EINVAL;
> +
> + return sframe_remove_section(sframe.sframe_start);
> +}
Thanks and regards,
Jens
--
Jens Remus
Linux on Z Development (D3303)
[email protected] / [email protected]
IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats:
Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft:
Ehningen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/