On Thu, Jul 02, 2026 at 02:01:02PM +0200, Sven Schnelle wrote:
> Michal Suchánek <[email protected]> writes:
>
> > On Thu, Jul 02, 2026 at 10:12:35AM +0200, Sven Schnelle wrote:
> >> Michal Suchánek <[email protected]> writes:
> >>
> >> > The return value of syscall_enter_from_user_mode is used both for the
> >> > adjusted syscall number and the indicator that a syscall should be
> >> > skipped.
> >> >
> >> > As seccomp can be invoked on any syscall, including invalid ones, this
> >> > somewhat undermines seccomp.
> >> >
> >> > While the seccomp variants that terminate the process do not need to
> >> > care about this, for the filter that sets the syscall return value this
> >> > distinction is required.
> >> >
> >> > Pass the syscall number as a pointer to the inline entry functions, and
> >> > use the return value exclusively for the indication that the syscall is
> >> > already handled.
> >> >
> >> > This should avoid the need for the s390 PIF_SYSCALL_RET_SET which is the
> >> > workaround for exactly this deficiency.
> >>
> >> I'm not sure whether PIF_SYSCALL_RET_SET can be removed - the syscall
> >> return might still get set by PTRACE_SET_SYSCALL_INFO when the tracee is
> >> stopped. This might be a positive number which can't be distinguished
> >> from a syscall number. But maybe I'm missing something? It's been quite
> >> a while since I touched all that ptrace stuff.
> >
> > When the syscall return value is set (in the registers) the return value
> > which is also the modified syscall number is set to -1 indicating the
> > syscall was handled. At least that's how the API is described.
> >
> > So yes, if the syscall number range is restricted or the syscall number
> > is returned through a path different from the function return value the
> > flag should not be needed in the entry path because the case can be
> > detected through the return value alone.
>
> I'm still failing to see how this would work without an additional
> flag. Assume a program (the tracee) is stopped because of a syscall
> entry. The tracer then decides to skip the syscall and changes
> regs->gpr2 (which contains either the syscall number or return value)
> to contain 42. When the tracer then restarts the syscall, how does
> do_syscall() know that gpr2 is now a return value and not a syscall number?

Because then the return value from the syscall_enter_from_user_mode
machinery would be -1, indicating that the syscall should be skipped.

That is how the return value of syscall_enter_from_user_mode is
documented; I did not verify that it actually works that way for the
tracing case on s390.

So long as it is clarified that -1 is not a syscall number, or the
syscall number is returned elsewhere, there is no doubt: -1 indicates an
already handled syscall, without the need for an additional flag.

Thanks

Michal
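
To make the two conventions concrete, here is a minimal user-space sketch.
It is not kernel code and does not show how s390 behaves. Every name in it
(fake_regs, tracer_skipped, enter_combined, enter_split, dispatch_split,
the FAKE_* constants) is hypothetical and exists only for illustration. It
models the tracer's decision as a plain boolean and shows that a combined
return value cannot tell "skipped" from "invalid syscall number -1", while
a separate "handled" indication can.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FAKE_NR_SYSCALLS 8   /* hypothetical syscall table size */
#define FAKE_ENOSYS      38  /* stand-in for ENOSYS */

/* Hypothetical register model: one register doubles as number and result. */
struct fake_regs {
	long gpr2;            /* syscall number on entry, return value on exit */
	bool tracer_skipped;  /* models a tracer that chose to skip the syscall */
};

/* Combined convention: one return value carries both the number and "skip". */
static long enter_combined(const struct fake_regs *regs, long nr)
{
	return regs->tracer_skipped ? -1 : nr;
}

/* Split convention: the number travels through *nr, the return value only
 * says whether the syscall has already been handled. */
static bool enter_split(const struct fake_regs *regs, long *nr)
{
	(void)nr;  /* a real entry path could rewrite *nr here */
	return regs->tracer_skipped;
}

/* Dispatcher built on the split convention. */
static long dispatch_split(struct fake_regs *regs, long nr)
{
	assert(regs != NULL);
	if (enter_split(regs, &nr))
		return regs->gpr2;          /* gpr2 already holds the return value */
	if (nr < 0 || nr >= FAKE_NR_SYSCALLS)
		regs->gpr2 = -FAKE_ENOSYS;  /* invalid number, not a skip */
	else
		regs->gpr2 = 1000 + nr;     /* stand-in for running the handler */
	return regs->gpr2;
}
```

In this model a tracer that stored 42 in gpr2 and asked for a skip gets 42
back unchanged, because the dispatcher never interprets gpr2 as a syscall
number once the entry function reports the syscall as handled. Whether the
real entry path reports that reliably for the s390 tracing case is the
open question above; the sketch assumes it does.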
