On Tue, May 26, 2026 at 11:19:44AM +0200, Peter Zijlstra wrote:
> On Fri, May 22, 2026 at 11:19:17PM +0200, Jiri Olsa wrote:
> > On Fri, May 22, 2026 at 11:50:44AM -0700, Andrii Nakryiko wrote:
> > > On Thu, May 21, 2026 at 5:44 AM Jiri Olsa <[email protected]> wrote:
> > > >
> > > > Andrii reported an issue with optimized uprobes [1] that can clobber
> > > > redzone area with call instruction storing return address on stack
> > > > where user code may keep temporary data without adjusting rsp.
> > > >
> > > > Fixing this by moving the optimized uprobes on top of 10-bytes nop
> > > > instruction, so we can squeeze another instruction to escape the
> > > > redzone area before doing the call, like:
> > > >
> > > >   lea -0x80(%rsp), %rsp
> > > >   call tramp
> > > >
> > > > Note the lea instruction is used to adjust the rsp register without
> > > > changing the flags.
> > > >
> > > > We use nop10 and following transformation to optimized instructions
> > > > above and back as suggested by Peterz [2].
> > > >
> > > > Optimize path (int3_update_optimize):
> > > >
> > > >   1) Initial state after set_swbp() installed the uprobe:
> > > >        cc 2e 0f 1f 84 00 00 00 00 00
> > > >
> > > >      From offset 0 this is INT3 followed by the tail of the original
> > > >      10-byte NOP.
> > > >
> > > >   2) Trap the call slot before rewriting the NOP tail:
> > > >        cc 2e 0f 1f 84 [cc] 00 00 00 00
> > > >
> > > >      From offset 0 this traps on the uprobe INT3. A thread reaching
> > > >      offset 5 traps on the temporary INT3 instead of seeing a partially
> > > >      patched call.
> > > >
> > > >   3) Rewrite the LEA tail and call displacement, keeping both INT3
> > > >      bytes:
> > > >        cc [8d 64 24 80] cc [d0 d1 d2 d3]
> > > >
> > > >      From offset 0 and offset 5 this still traps. The bytes between
> > > >      them are not executable entry points while both traps are in place.
> > > >
> > > >   4) Restore the call opcode at offset 5:
> > > >        cc 8d 64 24 80 [e8] d0 d1 d2 d3
> > > >
> > > >      From offset 0 this still traps. From offset 5 the instruction is
> > > >      the final CALL to the uprobe trampoline.
> > > >
> > >
> > > I'm sorry if I'm slow, but I don't understand why we need that second
> > > cc at offset 5? Isn't original nop10 processed by CPU as single
> > > instruction? So it will either be at ip of nop10, or at ip+10, no? If
> > > we trap at ip and in int3 handler +10 from there while we are
> > > installing lea+call, why do we need cc on byte 5?
> > >
> > > I.e., I don't understand how CPU can end up being at ip+5 until we
> > > finalize lea+call sequence? Can it?
> >
> > hum, so I thought it's for the case when you do unoptimize+optimize,
> > then you can have cpu executing the previous lea and hitting the int3
> > on +5 offset.. but as pointed by Peter (and you) the call instruction
> > never changes, so now I'm not sure why we need it
>
> So I missed you did the second INT3 in my initial reading.
>
> That second INT3 is absolutely required *IF* the CALL can ever change.
> However Andrii pointed out that once the CALL is written, it will always
> be the same CALL -- there is but the one trampoline, it doesn't move.
>
> Therefore, the second INT3 is not strictly required.
>
> Does this clarify?

yes, will change that in next version

thanks,
jirka
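
For reference, the four optimize steps quoted above can be sketched as a small user-space C simulation on a 10-byte buffer. This is only an illustration, not the kernel code: the function names, the SLOT_LEN/CALL_OFF macros and the plain memcpy() writes are assumptions made for the example, and the displacement bytes are whatever the caller passes in (d0 d1 d2 d3 in the quoted description). Any cross-CPU synchronization that real text patching needs between steps is left out. Step 2 is kept as posted, even though the thread concludes the second INT3 is not strictly required.

```c
#include <stdint.h>
#include <string.h>

#define SLOT_LEN   10    /* the uprobe sits on top of a 10-byte NOP */
#define CALL_OFF   5     /* offset of the CALL opcode inside the slot */
#define INT3_INSN  0xcc
#define CALL_INSN  0xe8

/* Step 1: state after set_swbp(): INT3 followed by the tail of the NOP. */
void slot_init(uint8_t slot[SLOT_LEN])
{
	static const uint8_t initial[SLOT_LEN] = {
		0xcc, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00,
	};

	memcpy(slot, initial, SLOT_LEN);
}

/* Step 2: put a temporary INT3 on the call slot at offset 5. */
void slot_trap_call(uint8_t slot[SLOT_LEN])
{
	slot[CALL_OFF] = INT3_INSN;
}

/*
 * Step 3: write the LEA tail (offsets 1-4) and the call displacement
 * (offsets 6-9). Offsets 0 and 5 are not touched, so both INT3 bytes stay.
 */
void slot_write_tails(uint8_t slot[SLOT_LEN], const uint8_t disp[4])
{
	static const uint8_t lea_tail[4] = { 0x8d, 0x64, 0x24, 0x80 };

	memcpy(&slot[1], lea_tail, sizeof(lea_tail));
	memcpy(&slot[CALL_OFF + 1], disp, 4);
}

/* Step 4: restore the CALL opcode at offset 5; offset 0 still traps. */
void slot_restore_call(uint8_t slot[SLOT_LEN])
{
	slot[CALL_OFF] = CALL_INSN;
}
```

Calling the four helpers in order on one buffer reproduces the byte patterns listed in steps 1) to 4). Dropping the slot_trap_call() step models the variant without the second INT3, in which case offset 5 keeps its original 00 byte until slot_restore_call() writes the CALL opcode.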
