On 2026-04-02 06:21, Andy Lutomirski wrote:
>
> I don't really agree. For quite a few years now, we've tried to make the
> exit path uniform, and we have this logic in syscall_64:
>
> 	/* SYSRET requires RCX == RIP and R11 == EFLAGS */
> 	if (unlikely(regs->cx != regs->ip || regs->r11 != regs->flags))
> 		return false;	<-- fall back to IRET
>
> and this is not just an aesthetic thing -- it allows us to deliver
> signals and implement things like sigreturn without needing to track extra
> flag bits that mean "well, actually, we're in the syscall *code* but we're
> not returning from a syscall any more". We had that a long time ago, and it
> was extremely difficult to understand and maintain.
>
> So, on current kernels and kernels going back, I dunno, 10 years (I didn't
> try to dig out the git history, but I did write much of this code...), the
> semantics have been that we return to usermode in a state that matches
> pt_regs as precisely as we can arrange. For the one case where we have a
> very longstanding divergence between entry and exit regs, we have orig_ax.
>
> So it would be at least a fairly large maintainability regression to make the
> non-FRED SYSCALL behavior modify rcx and/or r11 on exit.
>
> Now we have FRED. Sure, it would be nice to remember the entry RCX and R11,
> but if we want to avoid the footgun where the effect of SYSCALL is different
> on FRED and non-FRED hardware, then we need the context after entry completes
> to have regs->rcx == regs->rip and regs->r11 == regs->flags (or perhaps RCX
> and R11 differently poisoned, but that seems a bit silly).
>
> If we really want to have the option to fish the original rcx and r11 out
> from somewhere or perhaps to have extra-bonus-efficient many-parameter
> syscalls (I'm not sure why), then we could add orig_rcx and orig_r11. Or we
> could invent a time machine and fix SYSCALL when it first came out.
>
I certainly see what you're saying. I still don't like the idea of clobbering
registers "just because" for this reason and more...
-hpa
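
For readers following the thread, the SYSRET eligibility check quoted above
can be modeled in user space roughly as follows. This is only a sketch:
struct regs_model is a hypothetical, cut-down stand-in for the kernel's
struct pt_regs, and sysret_ok() is an illustrative name, not a kernel
function. The real exit path checks more conditions than these two.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the four pt_regs fields the quoted check
 * looks at. The real struct pt_regs has many more members.
 */
struct regs_model {
	uint64_t cx;	/* saved RCX */
	uint64_t r11;	/* saved R11 */
	uint64_t ip;	/* saved RIP (user return address) */
	uint64_t flags;	/* saved RFLAGS */
};

/*
 * SYSRET loads RIP from RCX and RFLAGS from R11, so it can only restore
 * the saved state exactly when those pairs already match. Otherwise the
 * caller has to fall back to IRET (illustrative name, see lead-in).
 */
static bool sysret_ok(const struct regs_model *regs)
{
	return regs->cx == regs->ip && regs->r11 == regs->flags;
}
```

In this model, a state captured straight from SYSCALL entry passes the check,
while one whose ip or flags was rewritten afterwards (for example by signal
delivery or sigreturn) fails it and would take the IRET path.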