On 1/31/18 9:01 AM, Serge E. Hallyn wrote: > Quoting Srivatsa S. Bhat ([email protected]): >> From: Srivatsa S. Bhat <[email protected]> >> >> The existing patch which disallows unprivileged CLONE_NEWUSER applies >> the check for CAP_SYS_ADMIN capability on the 'init_user_ns' >> namespace, which is not entirely correct. Consider the following sequence: >> >> 1. A process with root privileges calls >> clone(child_fn, ..., CLONE_NEWUSER, ...) to create a new user namespace. >> >> 2. child_fn, now running in the newly created user namespace enjoys the >> full set of capabilities in the new user namespace, but has lost >> its capabilities in the old user namespace (init_user_ns in this >> case). >> >> 3. child_fn now calls >> clone(child_fn2, ..., CLONE_NEWUSER, ...) to create a new (nested) >> user namespace. >> >> Step 3 should have succeeded because child_fn has full privileges > > No, it should not. If the host has unprivileged_userns_clone=false, > then it should require privilege against the init_user_ns to change > or bypass that. Yes the userns was *created* by someone with that > privilege, but likely they did so precisely so that they could run > more sandboxed code.
Ah, I see. Thanks a lot for clarifying that! > >> (including CAP_SYS_ADMIN) in its user namespace, but this step fails >> because the CAP_SYS_ADMIN capability is checked against init_user_ns, >> as opposed to child_fn's user namespace. >> >> So fix this by checking for CAP_SYS_ADMIN using ns_capable() on the >> current task's user namespace. >> >> This also helps the userns07 testcase from LTP >> (testcases/kernel/containers/userns/userns07.c) to pass when running >> with root privileges. > > If you want to run that test (which checks for >=32 level nesting depth) > with privilege, surely you can just set unprivileged_userns_clone=true? > Indeed! (I referred to LTP above just to give a concrete example of a scenario that had brought my attention to your patch. But given your explanation above, I see that your patch addresses a slightly different goal/scenario from this one). Thank you! Regards, Srivatsa >> Signed-off-by: Srivatsa S. Bhat <[email protected]> >> --- >> Applies on debian kernel's master branch. >> >> ...low-unprivileged-CLONE_NEWUSER-by-default.patch | 11 +++++++---- >> 1 file changed, 7 insertions(+), 4 deletions(-) >> >> diff --git >> a/debian/patches/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch >> >> b/debian/patches/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch >> index 55edbc7..fc978ea 100644 >> --- >> a/debian/patches/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch >> +++ >> b/debian/patches/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch >> @@ -12,6 +12,9 @@ issues are found, we have a fail-safe. >> >> Signed-off-by: Serge Hallyn <[email protected]> >> [bwh: Remove unneeded binary sysctl bits] >> +[Srivatsa: Fix capability checks when running nested user namespaces >> +by using ns_capable() on the current task's user namespace.] >> +Signed-off-by: Srivatsa S. Bhat <[email protected]> >> --- >> --- a/kernel/fork.c >> +++ b/kernel/fork.c >> @@ -27,24 +30,24 @@ Signed-off-by: Serge Hallyn <[email protected]> >> >> /* >> * Minimum number of threads to boot the kernel >> -@@ -1550,6 +1555,10 @@ static __latent_entropy struct task_stru >> +@@ -1550,6 +1555,10 @@ static __latent_entropy struct task_struct >> *copy_process( >> if ((clone_flags & (CLONE_NEWUSER|CLONE_FS)) == >> (CLONE_NEWUSER|CLONE_FS)) >> return ERR_PTR(-EINVAL); >> >> + if ((clone_flags & CLONE_NEWUSER) && !unprivileged_userns_clone) >> -+ if (!capable(CAP_SYS_ADMIN)) >> ++ if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN)) >> + return ERR_PTR(-EPERM); >> + >> /* >> * Thread groups must share signals as well, and detached threads >> * can only be started up within the thread group. >> -@@ -2343,6 +2352,12 @@ SYSCALL_DEFINE1(unshare, unsigned long, >> +@@ -2343,6 +2352,12 @@ SYSCALL_DEFINE1(unshare, unsigned long, >> unshare_flags) >> if (unshare_flags & CLONE_NEWNS) >> unshare_flags |= CLONE_FS; >> >> + if ((unshare_flags & CLONE_NEWUSER) && !unprivileged_userns_clone) { >> + err = -EPERM; >> -+ if (!capable(CAP_SYS_ADMIN)) >> ++ if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN)) >> + goto bad_unshare_out; >> + } >> +

