On Tue, May 27, 2014 at 2:58 PM, Seth Forshee <seth.fors...@canonical.com> wrote:
> I'm posting these patches in response to the ongoing discussion of loop
> devices in containers at [1].
>
> The patches implement a pseudo filesystem for loop devices, which will
> allow use of loop devices in containers using standard utilities. Under
> normal use a loopfs mount will initially contain a single device node
> for loop-control which can be used to request and release loop devices.
> Any devices allocated via this node will automatically appear in that
> loopfs mount (and in devtmpfs) but not in any other loopfs mounts.
> CAP_SYS_ADMIN in the userns of the process which performed the mount is
> allowed to perform privileged loop ioctls on these devices.
>
> Alternately loopfs can be mounted with the hostmount option, intended
> for mounting /dev/loop in the host. This is the default mount for any
> devices not created via loop-control in a loopfs mount (e.g. devices
> created during driver init, devices created via /dev/loop-control, etc).
> This is only available to system-wide CAP_SYS_ADMIN.
>
> I still have some testing to do on these patches, but they work at
> minimum for simple use cases. It's possible to use an unmodified losetup
> if it's new enough to know about loop-control, with a couple of caveats:
>
>  * /dev/loop-control must be symlinked to /dev/loop/loop-control
>  * In some cases losetup attempts to use /dev/loopN when the device node
>    is at /dev/loop/N. For example, 'losetup -f disk.img' fails.
>
> Device nodes for loop partitions are not created in loopfs. These
> devices are created by the generic block layer, and the loop driver has
> no way of knowing when they are created, so some kind of hook into the
> driver will be needed to support this.
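A minimal userspace sketch of the flow described in the quoted message, assuming a loopfs instance mounted at /dev/loop with its control node at <mnt>/loop-control and allocated devices at <mnt>/N (the layout implied by the losetup caveats above). LOOP_CTL_GET_FREE is the existing loop-control ioctl; the helper names loopfs_get_free() and loopfs_dev_path() are hypothetical, and because loopfs is only a proposed patch set, the paths are assumptions rather than a stable interface.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Existing loop-control ioctl from <linux/loop.h>; defined here only so the
 * sketch builds without kernel headers installed. */
#ifndef LOOP_CTL_GET_FREE
#define LOOP_CTL_GET_FREE 0x4C82
#endif

/* Hypothetical helper: path of loop device n inside a loopfs mount,
 * assuming devices appear as <mnt>/N. Returns snprintf()'s result. */
static int loopfs_dev_path(char *buf, size_t len, const char *mnt, int n)
{
    return snprintf(buf, len, "%s/%d", mnt, n);
}

/* Hypothetical helper: ask the loop-control node of a loopfs mount
 * (assumed to live at <mnt>/loop-control) for a free device number.
 * Returns that number, or -1 if the node can't be opened or the ioctl fails. */
static int loopfs_get_free(const char *mnt)
{
    char ctl[256];
    int fd, n;

    if (snprintf(ctl, sizeof(ctl), "%s/loop-control", mnt) >= (int)sizeof(ctl))
        return -1;                  /* mount path too long for the buffer */
    fd = open(ctl, O_RDWR);
    if (fd < 0)
        return -1;                  /* no loopfs control node at this path */
    n = ioctl(fd, LOOP_CTL_GET_FREE);
    close(fd);
    return n;                       /* -1 if the ioctl itself failed */
}
```

A caller would do `int n = loopfs_get_free("/dev/loop");` and, if n >= 0, build the node path with loopfs_dev_path(). Tools that instead format /dev/loopN are what hit the 'losetup -f' failure mentioned above.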
This is entertaining and a bit terrifying :)

ISTM that what you've done is to create a way for per-userns devices to live in a special filesystem and for userns containers to instantiate those devices by offloading all the hard work to the kernel.

What if we generalized this?

For example, we could add a concept of ephemeral devices. An ephemeral device is a device that can be referenced by an inode with a guarantee that the inode will *never* accidentally point to a different device [1].

Then we add a concept of the userns that owns a struct device. To make this safe, we'll need to make sure that old host udev will not see non-init-userns devices, ever. This is easy enough to do, but doing it elegantly might take some design work.

To make this useful, we'll need a way for things inside user namespaces to create the device nodes. I can imagine at least three ways to make this work.

a) Allow mknod on a tmpfs created by a particular userns to succeed if the target struct device is owned by that userns or a child of it, and if the caller is ns_capable(CAP_MKNOD).
b) Create a new filesystem that has some special ioctl or whatever to do it.
c) Have real per-user-ns devtmpfs.

Now, to get loop working in a userns, we need a way for the userns (or the host!) to create a new loop-control device owned by that userns, and we need to tweak the loop driver to make the created loop devices be owned by the userns.

(Note: I'm deliberately ignoring the fact that just doing this for loop seems to be almost entirely useless right now: you still can't mount the things.)

Thoughts?

[1] For example, there could be a special set of device numbers that are not reused until reboot. Ephemeral device nodes point to these devices by number. Alternatively, the inodes could keep references to the struct device.
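Two pieces of the generalized proposal can be made concrete with a small userspace model: the never-reused ephemeral device numbers from footnote [1], and the permission rule in option (a). Everything here is hypothetical (struct userns, struct eph_dev, eph_alloc() and may_mknod_ephemeral() are illustrative names, not kernel APIs), and the ns_capable(CAP_MKNOD) check is reduced to a boolean; evaluating it against the tmpfs's userns is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a user namespace: only the parent link matters. */
struct userns {
    const struct userns *parent;   /* NULL for the initial namespace */
};

/* Hypothetical ephemeral device: a never-reused number plus its owner. */
struct eph_dev {
    unsigned long devno;           /* ephemeral number, not recycled */
    const struct userns *owner;    /* userns that owns the device */
};

/* Footnote [1]: hand out numbers monotonically and never recycle them
 * until reboot, so a stale inode can't end up naming a different device.
 * Returns 0 once the number space is exhausted. */
static unsigned long eph_alloc(void)
{
    static unsigned long next = 1;
    return next == 0 ? 0 : next++;
}

/* True if ns is anc itself or one of its descendants. */
static bool userns_within(const struct userns *ns, const struct userns *anc)
{
    for (; ns != NULL; ns = ns->parent)
        if (ns == anc)
            return true;
    return false;
}

/* Option (a): mknod on a tmpfs created by tmpfs_ns succeeds only if the
 * target device is owned by tmpfs_ns or a child of it, and the caller
 * holds CAP_MKNOD (modelled as a flag; assumed to be checked in tmpfs_ns). */
static bool may_mknod_ephemeral(const struct eph_dev *dev,
                                const struct userns *tmpfs_ns,
                                bool caller_has_cap_mknod)
{
    return caller_has_cap_mknod && userns_within(dev->owner, tmpfs_ns);
}
```

The design point from the footnote shows up in eph_alloc(): unlike an allocator that reuses the lowest free number, a monotonic counter never gives a released number to a new device, so a leftover inode cannot silently start referring to another container's device.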
-- 
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/