On 06/03/2018 23:46, Tycho Andersen wrote:
> On Tue, Mar 06, 2018 at 10:33:17PM +0000, Andy Lutomirski wrote:
>>>> Suppose I'm writing a container manager. I want to run "mount" in the
>>>> container, but I don't want to allow mount() in general and I want to
>>>> emulate certain mount() actions. I can write a filter that catches
>>>> mount using seccomp and calls out to the container manager for help.
>>>> This isn't theoretical -- Tycho wants *exactly* this use case to be
>>>> supported.
>>>
>>> Well, I think this use case should be handled with something like
>>> LD_PRELOAD and a helper library. FYI, I did something like this:
>>> https://github.com/stemjail/stemshim
>>
>> I doubt that will work for containers. Containers that use user
>> namespaces and, for example, setuid programs aren't going to honor
>> LD_PRELOAD.
>
> Or anything that calls syscalls directly, like Go programs.
That's the reason for a vDSO-like approach. Enforcing access control is not the issue here; the issue is patching a buggy userland without modifying its code, isn't it?

As far as I remember, the main problem is handling file descriptors while "emulating" the kernel behavior. This can be done with "shim" code mapped into every process. Chrome used something like this as a kind of emulation, in a previous sandbox mechanism and with the current seccomp-bpf. I think it should be doable to replace the (userland) emulation code with an IPC wrapper that receives file descriptors through a UNIX socket.