Hi,

I would like to present a new feature I'm working on that adds OS-based process isolation to Unit. For now, it implements just the basic building block of containers: Linux namespaces.
Let me know what you think and whether it's useful.

To start using it, you just need to add a new "isolation" field to your app's config:

    {
        "type": "external",
        "executable": "/bin/app",
        "isolation": {
            "namespaces": {
                "user": true,
                "mount": true
            }
        }
    }

The allowed namespaces are: user, mount, network, pid, uts, cgroup. The ipc namespace is not allowed because Unit uses shared memory to communicate with workers. In the future, if Unit could proxy general processes (and also manage them), we can allow the ipc namespace as well, thus giving full isolation.

Creating Linux namespaces requires CAP_SYS_ADMIN unless they are created together with a user namespace. So, if you want to keep running Unit as an unprivileged user, you need to set the "user" namespace in addition to the other flags.

The PR is here (still working on it): https://github.com/nginx/unit/pull/289

When using a user namespace, you can set mappings for uid and gid ranges inside the namespace through mapping files. For uid, the file is /proc/<pid>/uid_map and for gid it is /proc/<pid>/gid_map. This way, you can map an unprivileged user id in the host (parent ns) to a privileged id inside the child namespace. I added two config fields for these mappings:

    {
        "isolation": {
            "namespaces": {
                "user": true,
                "mount": true
            },
            "uidmap": [
                {"containerID": 0, "hostID": 1000, "size": 1}
            ],
            "gidmap": [
                {"containerID": 0, "hostID": 1000, "size": 1}
            ]
        }
    }

Each field is an array because you can map several ranges. For now, if you don't set a map config, Unit will use a common default (the example above, but using the process's current euid instead of 1000). Some distributions come with /etc/subuid and /etc/subgid files with application mappings. In the future, we can also make Unit look up a mapping from these files.

The config is based on the OCI Spec: https://github.com/opencontainers/runtime-spec/blob/master/config-linux.md#user-namespace-mappings

I don't like it much; let me know if you know a better way of configuring it.
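To make the namespace list and the mapping format a bit more concrete, here is a minimal Python sketch. It is not code from the PR (Unit itself is written in C), and the helper names check_namespaces, default_id_map and format_id_map are hypothetical, chosen only for illustration. It assumes the config shape shown above and relies on the kernel's uid_map/gid_map line format, which is "ID-inside-ns ID-outside-ns length".

```python
# Illustrative only: hypothetical helpers, not part of Unit or the PR.

# Namespaces accepted by the proposed "isolation" config (ipc is excluded).
ALLOWED_NAMESPACES = {"user", "mount", "network", "pid", "uts", "cgroup"}


def check_namespaces(namespaces):
    """Return the sorted names of enabled namespaces.

    Raises ValueError if the config mentions a namespace that is not in
    the allowed list (for example "ipc").
    """
    unknown = set(namespaces) - ALLOWED_NAMESPACES
    if unknown:
        raise ValueError("namespace not allowed: " + ", ".join(sorted(unknown)))
    return sorted(name for name, enabled in namespaces.items() if enabled)


def default_id_map(host_id):
    """Default described above: map ID 0 in the namespace to one host ID."""
    return [{"containerID": 0, "hostID": host_id, "size": 1}]


def format_id_map(entries):
    """Render OCI-style entries as the text written to uid_map or gid_map."""
    return "".join(
        f"{e['containerID']} {e['hostID']} {e['size']}\n" for e in entries
    )


if __name__ == "__main__":
    import os

    print(check_namespaces({"user": True, "mount": True}))
    # Content that would go to /proc/<pid>/uid_map when no "uidmap" is set.
    print(format_id_map(default_id_map(os.geteuid())), end="")
```

With several entries in "uidmap", format_id_map simply emits one line per range, which is why the config fields are arrays.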
The uid/gid mapping affects the user and group you pass in the application config. So, my first question: if the user passes a "user" or "group" that's not mapped inside the container, what should we do? I would like to keep the user experience very simple, but having to deal with uid/gid mappings seems a bit complex. What do you folks think about doing some automatic mapping in case the user passes a user from the host (without setting any mapping)? Is this confusing?

If you think it's useful, what can be the next steps? I would like to add a "rootfs" field to chroot applications, and also a "mounts" field to mount additional filesystems inside the rootfs (kernfs, tmpfs, procfs and also user-defined bind mounts from the host filesystem).

About the isolation mechanism, I did some experiments with FreeBSD jails and maybe we can deliver something useful there also. Jails are significantly more secure than Linux namespaces, and I think we can implement it relatively easily.

That's all folks!

_______________________________________________
nginx mailing list
nginx@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx