Re: [gentoo-user] Can I use containers?

Grant Taylor Sat, 18 May 2019 21:39:33 -0700

On 5/18/19 5:49 PM, Rich Freeman wrote:

> I'd be interested if there are other scripts people have put out there, but I agree that most of the container solutions on Linux are overly complex.

Here's what I use for some networking, which probably qualifies as extremely lightweight "containers".


Prerequisite:  Create a place for the namespaces to anchor:

   # Create the directories to contain the *NS mount points.
   sudo mkdir -p /run/{mount,net,uts}ns

You can use any path that you want. I do a lot with iproute2's network namespaces (which is where this evolved from), which use /run/netns/$NetNSname, so I used that as a pattern for the other types of namespaces. Adjust as you want. What I'm doing is interoperable with iproute2's netns command.


Per "container":  Create the "container's" mount points:

   # Create the *NS mount points
   sudo touch /run/{mount,net,uts}ns/$ContainerName

Start the actual namespaces:

   # Spawn the container's namespaces (creating them requires root).
   sudo unshare --mount=/run/mountns/$ContainerName --net=/run/netns/$ContainerName --uts=/run/utsns/$ContainerName /bin/true

Note: The namespaces don't die when true exits, because each one is bind-mounted onto its file under /run.


Tweak the namespaces:

   # Set the container's hostname.
   sudo nsenter --mount=/run/mountns/$ContainerName --net=/run/netns/$ContainerName --uts=/run/utsns/$ContainerName /bin/hostname $ContainerName

I reuse this command, calling different binaries, any time I want to do something in the "container". Calling /bin/bash (et al.) enters the container.

I've created a wrapper script (nsenter.wrapper) that passes the proper parameters to nsenter. I've then symlinked the container name to the nsenter.wrapper script. This means that I can run "$ContainerName $Command" or simply enter the container with "$ContainerName". (The script checks the number of parameters and assumes /bin/bash if no command is specified.)

I think it's ultimately extremely trivial to have a "container" (a glorified collection of namespaces) that does what I want with virtually zero disk space. OK, OK, maybe 1 or 2 kB for the script and links.

Note: Since I'm using the mount namespace, I can have a completely different mount tree inside the "container" than I have outside the container / on the host. I'm not currently doing that, but it's possible to change things as desired.

> I personally use nspawn, which is actually pretty minimal, but it depends on systemd, which I'm sure many would argue is overly complex. :) However, if you are running systemd you can basically do a one-liner that requires zero setup to turn a chroot into a container.

As much as I might not like systemd, if you have it and it reliably does what you want, then I see no reason to /not/ use it. Just acknowledge it as a dependency of your solution, which you have done. So I think we're cool.

> On to the original questions about mounts:
> In general you can mount stuff in containers without issue. There are two ways to go about it. One is to mount something on the host and bind-mount it into the container, typically at launch time. The other is to give the container the necessary capabilities so that it can do its own mounting (typically containers are not given the necessary capabilities, so mounting will fail even as root inside the container).

Given that one of the uses of containers is security isolation (such as it is), I feel like giving the container the ability to mount things is less than a stellar idea. But to each his / her own.

> I believe the reason the wiki says to be careful with mounts has more to do with UID/GID mapping. As you are using NFS, this is already an issue you're probably dealing with. You're probably aware that running NFS with multiple hosts with unsynchronized passwd/group files can be tricky, because Linux (and Unix in general) works with UIDs/GIDs, and not really directly with names,

That's true for NFS v1-3. But NFS v4 changes that. NFS v4 actually uses user names & group names and has a daemon (rpc.idmapd) that runs on the client & server to translate things as necessary.

> so if you're doing something with one UID on one host and with a different UID on another host, you might get unexpected permissions behavior.

Yep. You need to do /something/ to account for this, be it manually managing UIDs & GIDs across systems or using something like NFSv4's ID mapping.

> In a nutshell the same thing can happen with containers, or for that matter with chroots.


I mostly agree.  However, user namespaces can nullify this.

I've not dabbled with user namespaces yet, but my understanding is that they can have completely different UIDs & GIDs inside the user namespace than outside of it. It's my understanding that UID 0 / GID 0 inside a user namespace can be mapped to UID 12345 / GID 23456 outside of the user namespace. Refer to the nsenter / unshare and user_namespaces(7) man pages for more details.
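
To make the mapping concrete: user_namespaces(7) documents /proc/<pid>/uid_map as lines of three fields (first ID inside the namespace, first ID outside, length of the range). The function below, map_uid, is a hypothetical helper written for illustration, not an existing tool. It is a minimal sketch that assumes the map lines are read from the host's initial user namespace, where the second column is a host UID.

```shell
#!/bin/sh
# map_uid: hypothetical helper, not part of util-linux or iproute2.
# Reads uid_map-format lines on stdin:
#   <first ID inside> <first ID outside> <range length>
# and prints the outside UID that inside UID $1 maps to.
# Returns 1 (printing nothing) if the UID falls in no mapped range.
map_uid() {
    awk -v id="$1" '
        BEGIN { id += 0 }                  # force numeric comparison
        id >= $1 && id < $1 + $3 {
            print $2 + (id - $1)           # offset into the mapped range
            found = 1
            exit
        }
        END { exit (found ? 0 : 1) }
    '
}

# UID 0 inside mapped to UID 12345 outside, as in the example above.
printf '0 12345 1\n' | map_uid 0    # prints 12345
```

gid_map uses the same format, so a line of `0 23456 1` would map GID 0 to 23456 in the same way. On a live system the input could be /proc/<pid>/uid_map of a process running inside the namespace.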

> If you have identical passwd/group files it should be a non-issue.

Point of order: The files don't need to be identical. The UIDs & GIDs need to be managed if you aren't using something like user namespaces. So it's perfectly valid to have a text file somewhere that is used to coordinate UIDs & GIDs, and then use those in the passwd/shadow and group/gshadow files.
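
A minimal sketch of that coordination idea, assuming a hypothetical registry file with one name:uid pair per line (the format and the check_uids function are illustrations, not an existing tool). It compares the registry against a passwd-format file and reports accounts whose UID disagrees; the same approach would apply to group files and GIDs.

```shell
#!/bin/sh
# check_uids REGISTRY PASSWD
# REGISTRY: hypothetical coordination file, one "name:uid" per line.
# PASSWD:   a passwd-format file (name:x:uid:gid:gecos:home:shell).
# Prints one line per account whose UID disagrees with the registry and
# returns 1 if any mismatch was found, 0 otherwise. Accounts missing from
# either file are ignored.
check_uids() {
    awk -F: '
        FILENAME == ARGV[1] { want[$1] = $2; next }   # load the registry
        ($1 in want) && $3 != want[$1] {
            printf "%s: passwd has %s, registry says %s\n", $1, $3, want[$1]
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1" "$2"
}

# Example (the registry path is hypothetical):
# check_uids /etc/uid-registry /etc/passwd
```

Selecting the registry with FILENAME == ARGV[1], rather than the common NR == FNR idiom, keeps the check correct even when the registry file is empty.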

> However, if you want to do mapping with unprivileged containers you have to be careful with mounts, as they might not get translated properly. Using completely different UIDs in a container is their suggested solution, which is fine as long as the actual container filesystem isn't shared with anything else.

I conceptually agree. However, I think mount namespaces combined with user namespaces muddy the water. Again, refer to the nsenter / unshare man pages and what they refer to.

nsenter has an option for sharing something between mount namespaces. I have no idea what it does, much less how it does it. I suspect that the kernel mounts it once (maybe not visible from anywhere else) and then bind-mounts it to multiple locations for visibility / access.

> That tends to be the case anyway when you're using container implementations that do a lot of fancy image management. If you're doing something very minimal and just using a path/chroot on the host as your container, then you need to be mindful of your UIDs/GIDs if you go accessing anything from the host directly.


UID & GID management is important.  /Something/ should be doing it.

> The other thing I'd be careful with is mounting physical devices in more than one place. Since you're actually sharing a kernel I suspect Linux will "do the right thing" if you mount an ext4 on /dev/sda2 in two different containers, but I've never tried it (and again doing that requires giving containers access to even see sda2, because they probably won't see physical devices by default).

Seeing as how the containers are running under the same kernel, there is no actual need for the file system to be mounted multiple times. Instead the kernel would mount it and present it, much like a bind mount, to multiple containers for access.

Think along the lines of opening and working with a file system as a separate process from where it's presented for access. Conceptually not that dissimilar to a hard link that has multiple representations of a file in multiple locations on the same file system. (It's not a perfect analogy, but I hope that makes sense.)

> In a VM environment you definitely can't do this, because the VMs are completely isolated at the kernel level, and having two different kernels with dirty buffers on the same physical device is going to kill any filesystem that isn't designed to be clustered.

Technically, you can usually get away with doing this, but the mounts need to be read-only. Even so, I STRONGLY suggest that you NOT do this with a non-cluster-aware file system.

I have colleagues who supported systems RO-mounting an ext file system this way. It worked okay when it was used as a RO library. The problem was when they made changes in the one with RW access: they needed to unmount and remount all the RO clients to see the updates. It was not graceful, and we advised that they stop doing that. But it did work for their needs. They used it akin to a big (~TB) CD-ROM.

> In a container environment the two containers aren't really isolated at the actual physical filesystem level since they share the kernel,

I think mount namespaces muddy this water. Yes, it's the same kernel, but the containers don't have the same file systems exposed to them.

> so I think you'd be fine, but I'd really want to test or do some research before relying on it.


Yes, test.

But make sure you have a vague understanding of what's actually happening behind the scenes. I find that tremendously helpful in knowing what can and can't be done, as well as why.

> In any case, the more typical solution is to just mount everything on the host and then bind-mount it into the container. So, you could mount the NFS share in /mnt and then bind-mount that into your container. There is really no performance hit, and it should work fine without giving the container a bunch of capabilities.

I think there /is/ a performance hit. It's just so /minimal/ that it's effectively non-existent. Every additional line of code in the path that must be traversed does take CPU cycles.
