> First: {unshare -m -p -f chroot FS} will change root into that
> filesystem with unshared mount and pid namespaces.
>

This successfully changes the root directory of the child process.
However, the root mount of the newly created mount namespace still
points to the host's root filesystem, which is the actual cause of the
escape (it will become clearer below).

> Next: {mount -t proc proc /proc} will mount the procfs for that pid
> namespace. We see with {ls -l /proc/1/ns/mnt} the identity of the
> unshared mount namespace, which is different from the identity before
> chroot.
>

Because the mount(8) command inherits the execution context of the container
process, it sees its root filesystem as `FS`, so procfs is mounted on
FS/proc, rightfully so. The ls command also runs in that context and shows
the container's mount namespace ID.
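
To make that comparison easier to repeat, here is a minimal sketch of two
helpers. The names `mntns_id` and `same_mntns` are made up for illustration,
they are not part of util-linux. They only read the /proc/PID/ns/mnt symlink,
and reading the link of another process may require privileges.

```shell
# mntns_id [PID]: print the mount namespace identifier of PID (default: self),
# e.g. "mnt:[4026531841]". Hypothetical helper, not a util-linux command.
mntns_id() {
    readlink "/proc/${1:-self}/ns/mnt"
}

# same_mntns A B: succeed if both processes report the same mount namespace.
same_mntns() {
    [ "$(mntns_id "$1")" = "$(mntns_id "$2")" ]
}

if same_mntns self 1; then
    echo "same mount namespace as PID 1"
else
    echo "different mount namespace from PID 1 (or not permitted to look)"
fi
```

Run inside the chroot after mounting procfs, PID 1 is the first process of the
new pid namespace, so this should report the same namespace.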

> But: {nsenter -t 1 -m -- ls -l /proc/1/ns/mnt} shows the identity of
> the host mount namespace -- the outer namespace.
>
> Thus {nsenter -t 1 -m} "escapes" from the unshared namespace to the
> containing namespace. And for example: {nsenter -t 1 -m /bin/sh}
> starts a shell in the outer mount and pid namespace(s)!
>

The reason you escaped is that when nsenter(1) calls setns(fd, CLONE_NEWNS),
the kernel sets the root directory of the calling process to the absolute
root of the target mount namespace. Whatever binary nsenter then executes is
therefore no longer confined to the container's chroot and points back to the
host's root filesystem. This is why you are also able to view the host's mount
table or resolve paths relative to the host filesystem while inside the
container, for example, when you executed a shell with nsenter(1).
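
One way to observe this root reset is to compare the device and inode of "/"
as seen by the current shell with what a command started through nsenter
sees. This is only a sketch: `root_id` is a made-up helper, and it assumes a
stat(1) that supports `-c` (GNU coreutils or busybox) and enough privileges
for nsenter.

```shell
# root_id [PATH]: print "device:inode" of PATH (default "/").
# Hypothetical helper used only for this comparison.
root_id() {
    stat -c '%d:%i' "${1:-/}"
}

here=$(root_id /)
# nsenter needs privileges; guard the call so the sketch degrades gracefully.
if there=$(nsenter -t 1 -m -- stat -c '%d:%i' / 2>/dev/null); then
    if [ "$here" = "$there" ]; then
        echo "nsenter lands on the same root directory"
    else
        echo "nsenter lands on a different root directory: $here vs $there"
    fi
else
    echo "nsenter failed (not installed or not permitted)"
fi
```

In the chroot-only setup the two values should differ, because the chroot
directory and the namespace's root mount are different directories.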

If you wish to completely cut ties with the VFS structure of the host, you can
make use of pivot_root(8). It lets you change the root mount of the mount
namespace itself, which truly isolates the mount namespace.

You can do something like this:

$ unshare --mount --pid --fork
$ mount --bind FS FS/
$ cd FS/
$ mkdir -p old_root/
$ /sbin/pivot_root . old_root/
$ cd /
$ mount -t proc proc /proc
$ umount -l old_root/
$ rmdir old_root

Both commands should then report the exact same mount namespace ID:

$ ls -l /proc/1/ns/mnt
[...] /proc/1/ns/mnt -> 'mnt:[4026533461]'
$ nsenter --mount --target 1 -- ls -l /proc/1/ns/mnt
[...] /proc/1/ns/mnt -> 'mnt:[4026533461]'


Maybe Karel has more to say about this.

Anyway, I hope this clears up at least some of the confusion.


Christian Goeschel Ndjomouo


