Hi, Thank you for the detailed response.
On Sat, Jan 18, 2025 at 6:25 PM Kees Cook <k...@kernel.org> wrote: > > On January 18, 2025 12:45:47 PM PST, Eyal Birger <eyal.bir...@gmail.com> > wrote: > >I think the difference is that this syscall is not part of the process's > >code - it is inserted there by another process tracing it. > > Well that's nothing like syscall_restart, and now I'm convinced seccomp must > never ignore uretprobe -- a process might want to block uretprobe! > I think I understand your point. But do you think this is intentional? i.e. seccomp couldn't have been used to block uretprobes before this syscall implementation afaict. > So, no, sorry, this needs to be handled by the seccomp policy that is applied > to the process. > The problem we're facing is that existing workloads are breaking, and as mentioned I'm not sure how practical it is to demand replacing a working docker environment because of a new syscall that was added for performance reasons. > >So this is different than desiring to deploy a new version of a binary > >that uses a new libc or a new syscall. > > Uh, no, the case I used as an example was no changes to anything except the > kernel. Libc noticed the available syscall, uses it, and is instantly killed > by the Docker seccomp policy which didn't know about that syscall. > That's an interesting situation and quite unexpected :) I'm glad I didn't have to face that one in production. > > Here the case is that there are > >three players - the tracer running out of docker, the tracee running in > >docker, > >and docker itself. All three were running fine in a specific kernel version, > >but upgrading the kernel now crashes the traced process. > > If uretprobe used to work without a syscall, then that seems to be the > problem. I agree. > But I think easiest is just fixing the Docker policy. (Which is a text file > configuration change; no new binaries, no rebuilds!). As far as I can tell libseccomp needs to provide support for this new syscall and a new docker version would need to be deployed, so It's not just a configuration change. Also the default policy which comes packed in docker would probably need to be changed to avoid having to explicitly provide a seccomp configuration for each deployment. > > >I think this syscall is different in that respect for the reasons described. > > I don't agree, sorry. Seccomp has a really singular and specific purpose, > which is explicitly *externalizing* policy. I do not want to have policy > within seccomp itself. > Understood. > >I don't know if seccomp is behaving correctly when it blocks a kernel > >implementation detail that isn't user created. > > But it is user created? Something added a uretprobe to a process who's > seccomp policy is not expecting it. This seems precisely by design. I think I wasn't accurate in my wording. The uretprobe syscall is added to the tracee by the kernel. The tracer itself is merely requesting to attach a uretprobe bpf function. In previous versions, this was implemented by the kernel installing an int3 instruction, and in the new implementation the kernel is installing a uretprobe syscall. The "user" in this case - the tracer program - didn't deliberately install the syscall, but anyway this is semantics. I think I understand your point that it is regarded as "policy", only that it creates a problem in actual deployments, where in order to be able to run the tracer software which has been working on newer kernels a new docker has to be deployed. I'm trying to find a pragmatic solution to this problem, and I understand the motivation to avoid policy in seccomp. Alternatively, maybe this syscall implementation should be reverted? Thanks again, Eyal.