On Mon, 14 Jul 2025 23:28:58 +0200 Jiri Olsa <olsaj...@gmail.com> wrote:
> On Mon, Jul 14, 2025 at 07:19:35PM +0900, Masami Hiramatsu wrote:
> > On Mon, 14 Jul 2025 11:39:03 +0200
> > Peter Zijlstra <pet...@infradead.org> wrote:
> >
> > > On Mon, Jul 14, 2025 at 05:39:15PM +0900, Masami Hiramatsu wrote:
> > >
> > > > > +	/*
> > > > > +	 * Some of the uprobe consumers has changed sp, we can do nothing,
> > > > > +	 * just return via iret.
> > > > > +	 */
> > > >
> > > > Do we allow consumers to change the `sp`? It seems dangerous
> > > > because consumer needs to know whether it is called from
> > > > breakpoint or syscall. Note that it has to set up ax, r11
> > > > and cx on the stack correctly only if it is called from syscall,
> > > > that is not compatible with breakpoint mode.
> > > >
> > > > > +	if (regs->sp != sp)
> > > > > +		return regs->ax;
> > > >
> > > > Shouldn't we recover regs->ip? Or in this case does consumer has
> > > > to change ip (== return address from trampline) too?
> > > >
> > > > IMHO, it should not allow to change the `sp` and `ip` directly
> > > > in syscall mode. In case of kprobes, kprobe jump optimization
> > > > must be disabled explicitly (e.g. setting dummy post_handler)
> > > > if the handler changes `ip`.
> > > >
> > > > Or, even if allowing to modify `sp` and `ip`, it should be helped
> > > > by this function, e.g. stack up the dummy regs->ax/r11/cx on the
> > > > new stack at the new `regs->sp`. This will allow modifying those
> > > > registries transparently as same as breakpoint mode.
> > > > In this case, I think we just need to remove above 2 lines.
> > >
> > > There are two syscall return paths; the 'normal' is sysret and for that
> > > you need to undo all things just right.
> > >
> > > The other is IRET. At which point we can have whatever state we want,
> > > including modified SP.
> > >
> > > See arch/x86/entry/syscall_64.c:do_syscall_64() and
> > > arch/x86/entry/entry_64.S:entry_SYSCALL_64
> > >
> > > The IRET path should return pt_regs as is from an interrupt/exception
> > > very much like INT3.
> >
> > OK, so SYSRET case, we need to follow;
> >
> > sys_uprobe -> do_syscall_64 -> entry_SYSCALL_64 -> trampoline -> retaddr
> >
> > But using IRET to return, we can skip returning to trampoline,
> >
> > sys_uprobe -> do_syscall_64 -> entry_SYSCALL_64 -> regs->ip
>
> the handler gets the original breakpoint address, it's set in:
>
>   regs->ip = ax_r11_cx_ip[3] - 5;
>
> and at the point we do:
>
>   /*
>    * Some of the uprobe consumers has changed sp, we can do nothing,
>    * just return via iret.
>    */
>   if (regs->sp != sp)
>           return regs->ax;
>
> .. regs->ip value wasn't restored for the trampoline's return address,
> so iret will skip the trampoline

Ah, OK. So unless we restore regs->cx = regs->ip and
regs->r11 = regs->flags, it automatically uses IRET. Got it.

>
> but perhaps we could do the extra check below to land on the next instruction?

Hmm, can you clarify the required conditions for changing regs in the
consumers? A regs->sp change needs to be handled by IRET, but other
changes can be handled by the trampoline. Is that correct?

Thank you,

>
> jirka
>
>
> ---
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 043d826295a3..4318517aa852 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -817,8 +817,12 @@ SYSCALL_DEFINE0(uprobe)
>  	 * Some of the uprobe consumers has changed sp, we can do nothing,
>  	 * just return via iret.
>  	 */
> -	if (regs->sp != sp)
> +	if (regs->sp != sp) {
> +		/* skip the trampoline call */
> +		if (ax_r11_cx_ip[3] - 5 == regs->ip)
> +			regs->ip += 5;
>  		return regs->ax;
> +	}
>
>  	regs->sp -= sizeof(ax_r11_cx_ip);

-- 
Masami Hiramatsu (Google) <mhira...@kernel.org>