On Fri, May 15, 2026 at 01:31:31PM -0700, Andrii Nakryiko wrote:
> On Thu, May 14, 2026 at 6:53 AM Jiri Olsa <[email protected]> wrote:
> >
> > Andrii reported an issue with optimized uprobes [1] that can clobber
> > redzone area with call instruction storing return address on stack
> > where user code may keep temporary data without adjusting rsp.
> >
> > Fixing this by moving the optimized uprobes on top of 10-bytes nop
> > instruction, so we can squeeze another instruction to escape the
> > redzone area before doing the call, like:
> >
> >   lea -0x80(%rsp), %rsp
> >   call tramp
> >
> > Note the lea instruction is used to adjust the rsp register without
> > changing the flags.
> 
> I think it should be very loudly explained that we can't go back to
> nop10 and have to do short jump over patched sequence (and why).

there's a comment in swbp_unoptimize:

         * We have optimized nop10 (lea, call), changing it to 'jmp rel8' to
         * end of the 10-byte slot instead of restoring the original nop10,
         * because we could have thread already inside lea instruction.

I'll add it to the changelog as well
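
For anyone following along, here is a rough userspace sketch of the two byte layouts being discussed. It is not the kernel code: emit_optimized() and emit_unoptimized() are hypothetical helper names, the sizes and opcodes for call rel32 (0xe8) and jmp rel8 (0xeb) are the standard x86 encodings, and a little-endian host is assumed. It also assumes that unoptimizing rewrites only the first two bytes of the slot, which is my reading of the comment above: bytes 5..9 stay a valid call for any thread that has already executed the lea.

```c
#include <stdint.h>
#include <string.h>

#define LEA_INSN_SIZE   5
#define CALL_INSN_SIZE  5
#define JMP8_INSN_SIZE  2
#define OPT_INSN_SIZE   (LEA_INSN_SIZE + CALL_INSN_SIZE)   /* 10-byte slot */
#define OPT_JMP8_OFFSET (OPT_INSN_SIZE - JMP8_INSN_SIZE)   /* 8 */

/* lea -0x80(%rsp), %rsp: steps over the redzone without touching flags */
static const uint8_t lea_rsp[LEA_INSN_SIZE] = { 0x48, 0x8d, 0x64, 0x24, 0x80 };

/*
 * Hypothetical helper: fill the 10-byte slot at vaddr with
 * "lea -0x80(%rsp), %rsp; call vtramp".
 * Returns -1 if the trampoline is out of rel32 range.
 */
static int emit_optimized(uint8_t slot[OPT_INSN_SIZE],
                          uint64_t vaddr, uint64_t vtramp)
{
        /* call rel32 is relative to the end of the call, i.e. end of slot */
        int64_t delta = (int64_t)(vtramp - (vaddr + OPT_INSN_SIZE));
        int32_t rel32;

        if (delta < INT32_MIN || delta > INT32_MAX)
                return -1;
        rel32 = (int32_t)delta;

        memcpy(slot, lea_rsp, LEA_INSN_SIZE);
        slot[LEA_INSN_SIZE] = 0xe8;                       /* call rel32 */
        memcpy(slot + LEA_INSN_SIZE + 1, &rel32, sizeof(rel32));
        return 0;
}

/*
 * Hypothetical helper: replace the start of the slot with "jmp rel8"
 * to the end of the slot. Assumption: the remaining bytes are left
 * alone, so the call at offset 5 is still intact.
 */
static void emit_unoptimized(uint8_t slot[OPT_INSN_SIZE])
{
        slot[0] = 0xeb;                                   /* jmp rel8 */
        slot[1] = OPT_JMP8_OFFSET;                        /* lands at slot + 10 */
}
```

The jmp rel8 displacement is counted from the end of the 2-byte jmp, so 8 lands exactly at the end of the 10-byte slot.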

> 
> >
> > The optimized uprobe performance stays the same:
> >
> >         uprobe-nop     :    3.129 ± 0.013M/s
> >         uprobe-push    :    3.045 ± 0.006M/s
> >         uprobe-ret     :    1.095 ± 0.004M/s
> >   -->   uprobe-nop10   :    7.170 ± 0.020M/s
> >         uretprobe-nop  :    2.143 ± 0.021M/s
> >         uretprobe-push :    2.090 ± 0.000M/s
> >         uretprobe-ret  :    0.942 ± 0.000M/s
> >   -->   uretprobe-nop10:    3.381 ± 0.003M/s
> >         usdt-nop       :    3.245 ± 0.004M/s
> >   -->   usdt-nop10     :    7.256 ± 0.023M/s
> >
> > [1] https://lore.kernel.org/bpf/[email protected]/
> > Reported-by: Andrii Nakryiko <[email protected]>
> > Closes: https://lore.kernel.org/bpf/[email protected]/
> > Fixes: ba2bfc97b462 ("uprobes/x86: Add support to optimize uprobes")
> > Signed-off-by: Jiri Olsa <[email protected]>
> > ---
> >  arch/x86/kernel/uprobes.c | 121 +++++++++++++++++++++++++++-----------
> >  1 file changed, 86 insertions(+), 35 deletions(-)
> >
> > diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> > index ebb1baf1eb1d..f7c4101a4039 100644
> > --- a/arch/x86/kernel/uprobes.c
> > +++ b/arch/x86/kernel/uprobes.c
> > @@ -636,9 +636,21 @@ struct uprobe_trampoline {
> >         unsigned long           vaddr;
> >  };
> >
> > +#define LEA_INSN_SIZE          5
> > +#define OPT_INSN_SIZE          (LEA_INSN_SIZE + CALL_INSN_SIZE)
> > +#define OPT_JMP8_OFFSET                (OPT_INSN_SIZE - JMP8_INSN_SIZE)
> > +#define REDZONE_SIZE           0x80
> > +
> > +static const u8 lea_rsp[] = { 0x48, 0x8d, 0x64, 0x24, 0x80 };
> > +
> > +static bool is_lea_insn(const uprobe_opcode_t *insn)
> > +{
> > +       return !memcmp(insn, lea_rsp, LEA_INSN_SIZE);
> > +}
> > +
> >  static bool is_reachable_by_call(unsigned long vtramp, unsigned long vaddr)
> >  {
> > -       long delta = (long)(vaddr + 5 - vtramp);
> > +       long delta = (long)(vaddr + OPT_INSN_SIZE - vtramp);
> >
> >         return delta >= INT_MIN && delta <= INT_MAX;
> >  }
> > @@ -651,7 +663,7 @@ static unsigned long find_nearest_trampoline(unsigned long vaddr)
> >         };
> >         unsigned long low_limit, high_limit;
> >         unsigned long low_tramp, high_tramp;
> > -       unsigned long call_end = vaddr + 5;
> > +       unsigned long call_end = vaddr + OPT_INSN_SIZE;
> >
> >         if (check_add_overflow(call_end, INT_MIN, &low_limit))
> >                 low_limit = PAGE_SIZE;
> > @@ -826,8 +838,8 @@ SYSCALL_DEFINE0(uprobe)
> 
> should we change -ENXIO to -EPROTO or some other distinct error code,
> so libbpf can avoid using nop5 attachment on kernels new enough to
> support nop5 optimization, but old enough to not do this properly with
> nop10?

right, I'll take that change as well

thanks,
jirka
