On Fri, May 22, 2026 at 11:50:44AM -0700, Andrii Nakryiko wrote:
> On Thu, May 21, 2026 at 5:44 AM Jiri Olsa <[email protected]> wrote:
> >
> > Andrii reported an issue with optimized uprobes [1] that can clobber
> > redzone area with call instruction storing return address on stack
> > where user code may keep temporary data without adjusting rsp.
> >
> > Fix this by moving the optimized uprobes on top of a 10-byte nop
> > instruction, so we can squeeze in another instruction to escape the
> > redzone area before doing the call, like:
> >
> >   lea -0x80(%rsp), %rsp
> >   call tramp
> >
> > Note the lea instruction is used to adjust the rsp register without
> > changing the flags.
> >
> > We use nop10 and the following transformation to the optimized
> > instructions above and back, as suggested by Peterz [2].
> >
> > Optimize path (int3_update_optimize):
> >
> >   1) Initial state after set_swbp() installed the uprobe:
> >       cc 2e 0f 1f 84 00 00 00 00 00
> >
> >      From offset 0 this is INT3 followed by the tail of the original
> >      10-byte NOP.
> >
> >   2) Trap the call slot before rewriting the NOP tail:
> >       cc 2e 0f 1f 84 [cc] 00 00 00 00
> >
> >      From offset 0 this traps on the uprobe INT3.  A thread reaching
> >      offset 5 traps on the temporary INT3 instead of seeing a partially
> >      patched call.
> >
> >   3) Rewrite the LEA tail and call displacement, keeping both INT3 bytes:
> >       cc [8d 64 24 80] cc [d0 d1 d2 d3]
> >
> >      From offset 0 and offset 5 this still traps.  The bytes between
> >      them are not executable entry points while both traps are in place.
> >
> >   4) Restore the call opcode at offset 5:
> >       cc 8d 64 24 80 [e8] d0 d1 d2 d3
> >
> >      From offset 0 this still traps.  From offset 5 the instruction is
> >      the final CALL to the uprobe trampoline.
> >
> 
> I'm sorry if I'm slow, but I don't understand why we need that second
> cc at offset 5? Isn't original nop10 processed by CPU as single
> instruction? So it will either be at ip of nop10, or at ip+10, no? If
> we trap at ip and in int3 handler +10 from there while we are
> installing lea+call, why do we need cc on byte 5?
> 
> I.e., I don't understand how CPU can end up being at ip+5 until we
> finalize lea+call sequence? Can it?

hum, so I thought it was for the case when you do unoptimize+optimize,
then you can have a cpu executing the previous lea and hitting the int3
at the +5 offset.. but as pointed out by Peter (and you), the call
instruction never changes, so now I'm not sure why we need it
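
fwiw, to make the byte states easier to compare, here is a minimal
user-space sketch that just replays steps 1-4 of the quoted optimize
path on a plain buffer. It is not the kernel code: optimize_step(),
traps_at() and state_swbp are hypothetical names made up for this
illustration, and the d0..d3 displacement bytes are the placeholders
from the description above. It only models which offsets hold an INT3
at each step, nothing about cross-CPU synchronization.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INSN_LEN 10
#define INT3_OPCODE 0xcc
#define CALL_OPCODE 0xe8

/* Step 1: bytes after set_swbp(), INT3 followed by the nop10 tail. */
static const uint8_t state_swbp[INSN_LEN] = {
	0xcc, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00
};

/* Hypothetical helper: would a thread entering at 'off' hit an INT3? */
static int traps_at(const uint8_t *buf, int off)
{
	return buf[off] == INT3_OPCODE;
}

/*
 * Hypothetical helper: apply step 2, 3 or 4 of the quoted sequence.
 * 'disp' stands in for the 4-byte call displacement (d0 d1 d2 d3).
 */
static void optimize_step(uint8_t *buf, int step, const uint8_t disp[4])
{
	/* Tail of lea -0x80(%rsp),%rsp, i.e. everything after byte 0. */
	static const uint8_t lea_tail[4] = { 0x8d, 0x64, 0x24, 0x80 };

	switch (step) {
	case 2:
		/* Trap the call slot before touching the NOP tail. */
		buf[5] = INT3_OPCODE;
		break;
	case 3:
		/* Write LEA tail and displacement, keep both INT3 bytes. */
		memcpy(buf + 1, lea_tail, sizeof(lea_tail));
		memcpy(buf + 6, disp, 4);
		break;
	case 4:
		/* Restore the call opcode at offset 5. */
		buf[5] = CALL_OPCODE;
		break;
	}
}
```

Copying state_swbp into a 10-byte buffer and calling optimize_step()
with 2, 3 and 4 in order walks through the same states as the list
above; traps_at(buf, 5) is only true between steps 2 and 4, which is
exactly the window the question is about.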

jirka
