On Tue, Mar 6, 2018 at 11:06 PM, Mickaël Salaün <m...@digikod.net> wrote:
> On 06/03/2018 23:46, Tycho Andersen wrote:
>> On Tue, Mar 06, 2018 at 10:33:17PM +0000, Andy Lutomirski wrote:
>>>>> Suppose I'm writing a container manager. I want to run "mount" in the
>>>>> container, but I don't want to allow mount() in general and I want to
>>>>> emulate certain mount() actions. I can write a filter that catches
>>>>> mount using seccomp and calls out to the container manager for help.
>>>>> This isn't theoretical -- Tycho wants *exactly* this use case to be
>>>> Well, I think this use case should be handled with something like
>>>> LD_PRELOAD and a helper library. FYI, I did something like this:
>>> I doubt that will work for containers. Containers that use user
>>> namespaces and, for example, setuid programs aren't going to honor
>> Or anything that calls syscalls directly, like go programs.
> Hence the vDSO-like approach. Enforcing access control is not
> the issue here; patching a buggy userland (without patching its code) is
> the issue, isn't it?
> As far as I remember, the main problem is to handle file descriptors
> while "emulating" the kernel behavior. This can be done with a "shim"
> code mapped in every process. Chrome used something like this (in a
> previous sandbox mechanism) as a kind of emulation (with the current
> seccomp-bpf). I think it should be doable to replace the (userland)
> emulation code with an IPC wrapper receiving file descriptors through
> a UNIX socket.
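
For reference, the fd-passing half of the IPC wrapper idea quoted above can be sketched roughly as follows. This is only an illustration, not code from any of the projects mentioned: the names send_fd, recv_fd and demo are hypothetical, the "manager" and "shim" ends live in one process for brevity, and it assumes Linux with Python 3.9+ (for socket.send_fds and socket.recv_fds, which wrap SCM_RIGHTS ancillary messages).

```python
import os
import socket


def send_fd(sock: socket.socket, fd: int, tag: bytes = b"fd") -> None:
    """Send one open file descriptor over a connected AF_UNIX socket."""
    # send_fds() attaches the descriptor as SCM_RIGHTS ancillary data;
    # a small ordinary payload (the tag) is sent alongside it.
    socket.send_fds(sock, [tag], [fd])


def recv_fd(sock: socket.socket) -> tuple[bytes, int]:
    """Receive one file descriptor; returns (payload, new_fd)."""
    data, fds, _flags, _addr = socket.recv_fds(sock, 1024, 1)
    if not fds:
        raise RuntimeError("peer sent no file descriptor")
    return data, fds[0]


def demo() -> bytes:
    """Hypothetical 'manager' opens a pipe and hands the read end to a 'shim'."""
    manager, shim = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    try:
        os.write(write_end, b"hello from manager")
        send_fd(manager, read_end)
        _tag, received = recv_fd(shim)
        try:
            # The received descriptor is a new fd number that refers to
            # the same underlying pipe as read_end.
            return os.read(received, 64)
        finally:
            os.close(received)
    finally:
        os.close(read_end)
        os.close(write_end)
        manager.close()
        shim.close()
```

In a real setup the two socket ends would sit in different processes (the container manager and the shim mapped into the contained process), and the manager would open or create the resource on the shim's behalf before sending it back.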
Can you explain exactly what you mean by "vDSO-like"?
When a 64-bit program does a syscall, it just executes the SYSCALL
instruction. The vDSO isn't involved at all. 32-bit programs usually
go through the vDSO, but not always.
It could be possible to force-load a DSO into an entire container and
rig up seccomp to intercept all SYSCALLs not originating from the DSO
such that they merely redirect control to the DSO, but that seems