On Fri, 22 May 2026 11:43:06 +0200
Jens Remus <[email protected]> wrote:

> On 5/22/2026 12:35 AM, Steven Rostedt wrote:
> > From: Steven Rostedt <[email protected]>
> > 
> > Add system calls to register and unregister sframes that can be used by
> > dynamic linkers to tell the kernel where the sframe section is in memory
> > for libraries it loads.  
> 
> Why two separate system calls?  Can't that be one single stacktracectl?
> Could they at least be non-sframe specific, e.g. stracktrace_register
> and stracktrace_unregister, so that if one would implement e.g. unwind
> user dwarf/eh_frame in the future one could pass ehframe_start and
> ehframe_end in addition to sframe_start and sframe_end?

Talking with everyone at LSF/MM/BPF the consensus was to avoid an ioctl
like system call. Everyone hates them. They told me that a system call
should do one thing. They wanted a separate system call to register and to
unregister.

Note this also helps to see what the user is doing via monitoring via
ftrace, strace, and security wise via LSMs and seccomp.

> 
> > 
> > Both system calls take a pointer to a new structure:
> > 
> >   struct sframe_setup {
> >     unsigned long           sframe_start;
> >     unsigned long           sframe_size;
> >     unsigned long           text_start;
> >     unsigned long           text_size;
> >   };
> > 
> > and a size of the passed in structure. If the system call needs to be
> > extended, then the structure could be changed and the size of that
> > structure will tell the kernel that it is the new version. If the kernel
> > does not recognize the structure size, it will return -EINVAL.
> > 
> >   sframe_start - The virtual address of the sframe section
> >   sframe_size  - The length of the sframe section
> >   text_start   - the text section the sframe represents
> >   test_size    - the length of the section
> > 
> > If other stack tracing functionality is added, it will require a new
> > system call.
> > 
> > The unregister only needs the sframe_start and requires all the rest of
> > the fields to be 0. In the future, if more can be done, then user space
> > can update the other values and check the return code to see if the kernel
> > supports it.
> > 
> > Signed-off-by: Steven Rostedt <[email protected]>
> > ---
> > 
> > Based on top of Jens patches here:
> > 
> >   
> > https://lore.kernel.org/linux-trace-kernel/[email protected]/
> > 
> > [ Note, I tested this with the same program from the RFC patch ]
> > 
> > Changes from RFC: 
> > https://patch.msgid.link/[email protected]
> > 
> > - Remove the ioctl() like system call for a unique system call for each
> >   functionality. Right now there's two functionalities:
> >    1. register sframe section
> >    2. unregister sframe sections
> > 
> > - Added taking a lock around the mtree logic in __sframe_remove_section()
> >   as Sashiko mentioned that there could be races from user space
> >   registering and unregistering sframe sections at the same time.  
> 
> Doesn't sframe_add_section() then also need likewise?

Ah, I saw the lock grabbed on the vma lookup. It should also be done for the
mtree_insert_range(). Thanks, will fix.

> 
> > 
> > - Removed [RFC] from subject as I believe this is more likely the way
> >   this system call will be done.  
> 
> > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h  
> 
> > @@ -999,6 +999,8 @@ asmlinkage long sys_lsm_get_self_attr(unsigned int 
> > attr, struct lsm_ctx __user *
> >  asmlinkage long sys_lsm_set_self_attr(unsigned int attr, struct lsm_ctx 
> > __user *ctx,
> >                                   u32 size, u32 flags);
> >  asmlinkage long sys_lsm_list_modules(u64 __user *ids, u32 __user *size, 
> > u32 flags);
> > +asmlinkage long sys_sframe_register(void *data, unsigned int size);
> > +asmlinkage long sys_sframe_unregister(void *data, unsigned int size);
> >  
> >  /*
> >   * Architecture-specific system calls  
> 
> 
> > diff --git a/include/uapi/linux/sframe.h b/include/uapi/linux/sframe.h  
> 
> > @@ -0,0 +1,12 @@
> > +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> > +#ifndef _UAPI_LINUX_SFRAME_H
> > +#define _UAPI_LINUX_SFRAME_H
> > +
> > +struct sframe_setup {
> > +   unsigned long           sframe_start;
> > +   unsigned long           sframe_size;
> > +   unsigned long           text_start;
> > +   unsigned long           text_size;
> > +};
> > +
> > +#endif /* _UAPI_LINUX_SFRAME_H */  
> 
> > diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c  
> 
> > @@ -842,9 +844,11 @@ static void sframe_free_srcu(struct rcu_head *rcu)
> >  static int __sframe_remove_section(struct mm_struct *mm,
> >                                struct sframe_section *sec)
> >  {
> > -   if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
> > -           dbg_sec("mtree_erase failed: text=%lx\n", sec->text_start);
> > -           return -EINVAL;
> > +   scoped_guard(mmap_read_lock, mm) {  
> 
> Why is a read lock sufficient? Doesn't that allow multiple readers?
> How does that prevent a concurrent modification of the mm->sframe_mt?

That was a cut and paste error. I meant to change it to a write lock, but
got distracted :-p Thanks, will fix.

> 
> > +           if (!mtree_erase(&mm->sframe_mt, sec->text_start)) {
> > +                   dbg_sec("mtree_erase failed: text=%lx\n", 
> > sec->text_start);
> > +                   return -EINVAL;
> > +           }  
> 
> Is (or why not) likewise required in sframe_add_section() for the
> mtree_insert_range()?
> 
> Wasn't the reported issue that while mt_for_each() in
> sframe_remove_section() there could be concurrent mtree_erase() in
> __sframe_remove_section() followed by mtree_insert_range() in
> sframe_add_section(), so that the mt_for_each() could get confused?

I'll take a closer look. But let me fix the obvious bugs first.

-- Steve

> 
> >     }
> >  
> >     call_srcu(&sframe_srcu, &sec->rcu, sframe_free_srcu);
> > @@ -936,3 +940,56 @@ void sframe_free_mm(struct mm_struct *mm)
> >  
> >     mtree_destroy(&mm->sframe_mt);
> >  }

Reply via email to