Quoting Daniel Lezcano (daniel.lezc...@free.fr):
> On 10/04/2010 10:54 PM, richard -rw- weinberger wrote:
> > Hi Daniel!
> >
> > On Mon, Oct 4, 2010 at 9:51 PM, Daniel Lezcano<daniel.lezc...@free.fr>
> > wrote:
> >
> >> On 10/04/2010 06:18 PM, richard -rw- weinberger wrote:
> >>
> >>> On Sun, Oct 3, 2010 at 9:01 PM, richard -rw- weinberger
> >>> <richard.weinber...@gmail.com> wrote:
> >>>
> >>>> I'm using lxc to run a few virtual private servers.
> >>>> What capabilities are harmful and should be dropped using "lxc.cap.drop"?
> >>>>
> >>> Is my question too trivial or too stupid? ;)
> >>>
> >> hum, not trivial at all :)
> >>
> >> I am not sure there is a default set of capabilities to be dropped.
> >> Certainly some should be dropped like CAP_SYS_MODULE but others will depend
> >> on what the user expect to do with the container and what scripts will be
> >> run inside the container.
> >>
> >> We have certainly think about the root user inside a container, is it
> >> secure? IMO, until the user namespace is not complete, it is not secure.
> >>
> > I thought the user namespace is complete.
> > What is missing?
> >
>
> I am not sure, but something like "who did what", so if you are root on
> the host and you mount a filesystem, when you create an user namespace,
> will be root inside but not the same root as the host, and you won't be
> able to umount what the host's root has mounted before. I didn't
> followed the discussion about this very closely so I may be wrong.
> I prefer let Serge explain what is missing, he will be much more clear
> than me :)
> (cc'ed Serge).

Right - really 'who owns what' is what we don't track correctly in the
context of user namespaces. 'Who' should no longer be a uid, but a
(user_namespace, uid) pair. In particular, we don't currently have answers
for that in the VFS or in capable() checks. A file (including /proc and
/cgroup files) should be owned by (user_ns, uid), and likewise a task
that you are trying to kill.

So a task with credentials (init_user_ns,500),(child_user_ns1,0), in
other words owned by uid 500 in the 'initial' user namespace but root in
the container, would be denied CAP_KILL over a task, or write access to
a file, owned by (init_user_ns,0).

The capabilities part of that was started by a patch from Eric
Biederman, which is sitting in

  http://git.kernel.org/?p=linux/kernel/git/sergeh/linux-cr.git;a=shortlog;h=refs/heads/userns.feb16.1

(see patch

  http://git.kernel.org/?p=linux/kernel/git/sergeh/linux-cr.git;a=commit;h=58e3ce401f746f2865a6c9872d9205e202c2c5a2

in particular).

-serge
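
To make the (user_namespace, uid) idea concrete, here is a minimal Python sketch. It is a toy model, not kernel code: the names UserNamespace, Owner and may_signal are hypothetical, and the rule it encodes (uid 0 in a namespace has authority only over objects owned in that namespace or one of its descendants) is an assumption about what such a check could look like, not a description of what the userns patches implement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True, eq=False)
class UserNamespace:
    """Hypothetical stand-in for a user namespace; compared by identity."""
    name: str
    parent: Optional["UserNamespace"] = None

    def is_same_or_descendant_of(self, other: "UserNamespace") -> bool:
        # Walk up the parent chain looking for `other`.
        ns: Optional[UserNamespace] = self
        while ns is not None:
            if ns is other:
                return True
            ns = ns.parent
        return False


@dataclass(frozen=True)
class Owner:
    """'Who' as a (user_ns, uid) pair rather than a bare uid."""
    user_ns: UserNamespace
    uid: int


def may_signal(actor: Owner, target: Owner) -> bool:
    """Assumed rule: may `actor` kill a task owned by `target`?"""
    # Same user in the same namespace: always allowed.
    if actor.user_ns is target.user_ns and actor.uid == target.uid:
        return True
    # Root's CAP_KILL is only meaningful relative to root's own namespace:
    # it covers targets owned in that namespace or in a descendant of it.
    if actor.uid == 0:
        return target.user_ns.is_same_or_descendant_of(actor.user_ns)
    return False


init_user_ns = UserNamespace("init_user_ns")
child_user_ns1 = UserNamespace("child_user_ns1", parent=init_user_ns)

host_root = Owner(init_user_ns, 0)
container_root = Owner(child_user_ns1, 0)
container_user = Owner(child_user_ns1, 1000)

print(may_signal(container_root, host_root))       # container root vs. host root
print(may_signal(container_root, container_user))  # container root vs. its own user
print(may_signal(host_root, container_root))       # host root vs. container root
```

With a bare uid comparison, the first check could not tell container root from host root, since both are uid 0. Carrying the namespace alongside the uid is what lets the check deny it.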