On 5/22/2026 12:35 AM, Steven Rostedt wrote:
> From: Steven Rostedt <[email protected]>
> 
> Add system calls to register and unregister sframes that can be used by
> dynamic linkers to tell the kernel where the sframe section is in memory
> for libraries it loads.

Why two separate system calls?  Can't that be one single stacktracectl?
Could they at least be non-sframe specific, e.g. stracktrace_register
and stracktrace_unregister, so that if one would implement e.g. unwind
user dwarf/eh_frame in the future one could pass ehframe_start and
ehframe_end in addition to sframe_start and sframe_end?

> 
> Both system calls take a pointer to a new structure:
> 
>   struct sframe_setup {
>       unsigned long           sframe_start;
>       unsigned long           sframe_size;
>       unsigned long           text_start;
>       unsigned long           text_size;
>   };
> 
> and a size of the passed in structure. If the system call needs to be
> extended, then the structure could be changed and the size of that
> structure will tell the kernel that it is the new version. If the kernel
> does not recognize the structure size, it will return -EINVAL.
> 
>   sframe_start - The virtual address of the sframe section
>   sframe_size  - The length of the sframe section
>   text_start   - the text section the sframe represents
>   test_size    - the length of the section
> 
> If other stack tracing functionality is added, it will require a new
> system call.
> 
> The unregister only needs the sframe_start and requires all the rest of
> the fields to be 0. In the future, if more can be done, then user space
> can update the other values and check the return code to see if the kernel
> supports it.
> 
> Signed-off-by: Steven Rostedt <[email protected]>
> ---
> 
> Based on top of Jens patches here:
> 
>   
> https://lore.kernel.org/linux-trace-kernel/[email protected]/
> 
> [ Note, I tested this with the same program from the RFC patch ]
> 
> Changes from RFC: 
> https://patch.msgid.link/[email protected]
> 
> - Remove the ioctl() like system call for a unique system call for each
>   functionality. Right now there's two functionalities:
>    1. register sframe section
>    2. unregister sframe sections
> 
> - Added taking a lock around the mtree logic in __sframe_remove_section()
>   as Sashiko mentioned that there could be races from user space
>   registering and unregistering sframe sections at the same time.

Doesn't sframe_add_section() then also need likewise?

> 
> - Removed [RFC] from subject as I believe this is more likely the way
>   this system call will be done.

> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h

> @@ -999,6 +999,8 @@ asmlinkage long sys_lsm_get_self_attr(unsigned int attr, 
> struct lsm_ctx __user *
>  asmlinkage long sys_lsm_set_self_attr(unsigned int attr, struct lsm_ctx 
> __user *ctx,
>                                     u32 size, u32 flags);
>  asmlinkage long sys_lsm_list_modules(u64 __user *ids, u32 __user *size, u32 
> flags);
> +asmlinkage long sys_sframe_register(void *data, unsigned int size);
> +asmlinkage long sys_sframe_unregister(void *data, unsigned int size);
>  
>  /*
>   * Architecture-specific system calls


> diff --git a/include/uapi/linux/sframe.h b/include/uapi/linux/sframe.h

> @@ -0,0 +1,12 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +#ifndef _UAPI_LINUX_SFRAME_H
> +#define _UAPI_LINUX_SFRAME_H
> +
> +struct sframe_setup {
> +     unsigned long           sframe_start;
> +     unsigned long           sframe_size;
> +     unsigned long           text_start;
> +     unsigned long           text_size;
> +};
> +
> +#endif /* _UAPI_LINUX_SFRAME_H */

> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c

> @@ -842,9 +844,11 @@ static void sframe_free_srcu(struct rcu_head *rcu)
>  static int __sframe_remove_section(struct mm_struct *mm,
>                                  struct sframe_section *sec)
>  {
> -     if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
> -             dbg_sec("mtree_erase failed: text=%lx\n", sec->text_start);
> -             return -EINVAL;
> +     scoped_guard(mmap_read_lock, mm) {

Why is a read lock sufficient? Doesn't that allow multiple readers?
How does that prevent a concurrent modification of the mm->sframe_mt?

> +             if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
> +                     dbg_sec("mtree_erase failed: text=%lx\n", 
> sec->text_start);
> +                     return -EINVAL;
> +             }

Is (or why not) likewise required in sframe_add_section() for the
mtree_insert_range()?

Wasn't the reported issue that while mt_for_each() in
sframe_remove_section() there could be concurrent mtree_erase() in
__sframe_remove_section() followed by mtree_insert_range() in
sframe_add_section(), so that the mt_for_each() could get confused?

>       }
>  
>       call_srcu(&sframe_srcu, &sec->rcu, sframe_free_srcu);
> @@ -936,3 +940,56 @@ void sframe_free_mm(struct mm_struct *mm)
>  
>       mtree_destroy(&mm->sframe_mt);
>  }
> +
> +/**
> + * sys_sframe_register - register an address for user space stacktrace 
> walking.
> + * @data: Structure of sframe data used to register the sframe section
> + * @size: The size of the given structure.
> + *
> + * This system call is used by dynamic library utilities to inform the kernel
> + * of meta data that it loaded that can be used by the kernel to know how
> + * to stack walk the given text locations.
> + *
> + * Return: 0 if successful, otherwise a negative error.
> + */
> +SYSCALL_DEFINE2(sframe_register, __user struct sframe_setup *, data, 
> unsigned int, size)
> +{
> +     struct sframe_setup sframe;
> +
> +     if (sizeof(sframe) != size)
> +             return -EINVAL;
> +
> +     if (copy_from_user(&sframe, data, size))
> +             return -EFAULT;
> +
> +     return sframe_add_section(sframe.sframe_start,
> +                               sframe.sframe_start + sframe.sframe_size,
> +                               sframe.text_start,
> +                               sframe.text_start + sframe.text_size);
> +}
> +
> +/**
> + * sys_sframe_unregister - unregister an sframe address
> + * @data: Structure of sframe data used to register the sframe section
> + * @size: The size of the given structure.
> + *
> + * The data->sframe_start is the only value that is used. The rest must
> + * be zero.
> + *
> + * Return: 0 if successful, otherwise a negative error.
> + */
> +SYSCALL_DEFINE2(sframe_unregister, __user struct sframe_setup *, data, 
> unsigned int, size)
> +{
> +     struct sframe_setup sframe;
> +
> +     if (sizeof(sframe) != size)
> +             return -EINVAL;
> +
> +     if (copy_from_user(&sframe, data, size))
> +             return -EFAULT;
> +
> +     if (sframe.sframe_size || sframe.text_start || sframe.text_size)
> +             return -EINVAL;
> +
> +     return sframe_remove_section(sframe.sframe_start);
> +}
Thanks and regards,
Jens
-- 
Jens Remus
Linux on Z Development (D3303)
[email protected] / [email protected]

IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: 
Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: 
Ehningen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/


Reply via email to