Hi,

Thank you for the detailed response.

On Sat, Jan 18, 2025 at 6:25 PM Kees Cook <k...@kernel.org> wrote:

>
> On January 18, 2025 12:45:47 PM PST, Eyal Birger <eyal.bir...@gmail.com> wrote:
> >I think the difference is that this syscall is not part of the process's
> >code - it is inserted there by another process tracing it.
>
> Well that's nothing like syscall_restart, and now I'm convinced seccomp must 
> never ignore uretprobe -- a process might want to block uretprobe!
>

I think I understand your point. But do you think this is intentional?
i.e. seccomp couldn't have been used to block uretprobes before this
syscall implementation afaict.

> So, no, sorry, this needs to be handled by the seccomp policy that is applied 
> to the process.
>

The problem we're facing is that existing workloads are breaking, and
as mentioned I'm not sure how practical it is to demand replacing a
working docker environment because of a new syscall that was added for
performance reasons.

> >So this is different than desiring to deploy a new version of a binary
> >that uses a new libc or a new syscall.
>
> Uh, no, the case I used as an example was no changes to anything except the 
> kernel. Libc noticed the available syscall, uses it, and is instantly killed 
> by the Docker seccomp policy which didn't know about that syscall.
>

That's an interesting situation and quite unexpected :) I'm glad I didn't
have to face that one in production.

> > Here the case is that there are
> >three players - the tracer running out of docker, the tracee running in
> >docker, and docker itself. All three were running fine in a specific
> >kernel version, but upgrading the kernel now crashes the traced process.
>
> If uretprobe used to work without a syscall, then that seems to be the 
> problem.

I agree.

> But I think easiest is just fixing the Docker policy. (Which is a text file 
> configuration change; no new binaries, no rebuilds!).

As far as I can tell, libseccomp needs to provide support for this new
syscall and a new docker version would need to be deployed, so it's not
just a configuration change. Also, the default policy that ships with
docker would probably need to be changed to avoid having to explicitly
provide a seccomp configuration for each deployment.
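
For illustration, here is a minimal sketch of what the configuration side
of such a change could look like, assuming a profile in Docker's JSON
seccomp format. The helper name allow_syscall and the sample profile are
hypothetical, and the sketch does not cover the libseccomp side: the name
"uretprobe" still has to be resolvable by the libseccomp version in use.

```python
import copy
import json


def allow_syscall(profile, name):
    """Return a copy of a Docker-style seccomp profile that allows `name`.

    `profile` is a dict following Docker's JSON seccomp profile layout
    ("defaultAction" plus a "syscalls" list of rules). Hypothetical helper.
    """
    updated = copy.deepcopy(profile)
    rules = updated.setdefault("syscalls", [])
    # Reuse the first unconditional allow rule, if the profile has one.
    for rule in rules:
        unconditional = not (
            rule.get("args") or rule.get("includes") or rule.get("excludes")
        )
        if rule.get("action") == "SCMP_ACT_ALLOW" and unconditional:
            if name not in rule.setdefault("names", []):
                rule["names"].append(name)
            return updated
    # Otherwise add a dedicated allow rule for this syscall.
    rules.append({"names": [name], "action": "SCMP_ACT_ALLOW"})
    return updated


# Tiny sample profile, for illustration only (not Docker's default profile).
sample = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [{"names": ["read", "write"], "action": "SCMP_ACT_ALLOW"}],
}

patched = allow_syscall(sample, "uretprobe")
print(json.dumps(patched, indent=2))
```

The resulting JSON could then be written to a file and passed with
"docker run --security-opt seccomp=<file>", which is the per-deployment
configuration step I would like to avoid.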

>
> >I think this syscall is different in that respect for the reasons described.
>
> I don't agree, sorry. Seccomp has a really singular and specific purpose, 
> which is explicitly *externalizing* policy. I do not want to have policy 
> within seccomp itself.
>

Understood.

> >I don't know if seccomp is behaving correctly when it blocks a kernel
> >implementation detail that isn't user created.
>
> But it is user created? Something added a uretprobe to a process whose
> seccomp policy is not expecting it. This seems precisely by design.

I think I wasn't accurate in my wording.
The uretprobe syscall is added to the tracee by the kernel.
The tracer itself is merely requesting to attach a uretprobe bpf
function. In previous versions, this was implemented by the kernel
installing an int3 instruction, and in the new implementation the kernel
is installing a uretprobe syscall.
The "user" in this case - the tracer program - didn't deliberately install
the syscall, but anyway this is semantics.

I think I understand your point that this is regarded as "policy". It does,
however, create a problem in actual deployments: to keep running, on newer
kernels, tracer software that has been working so far, a new docker has to
be deployed.

I'm trying to find a pragmatic solution to this problem, and I understand
the motivation to avoid policy in seccomp.

Alternatively, maybe this syscall implementation should be reverted?

Thanks again,
Eyal.
