On 08/27/2015 01:55 PM, Cyril Hrubis wrote:
> Hi!
>> On SLES12 (kernel 3.12.28) setns() fails for the fd opened from "user"
>> namespace. I'm getting EINVAL here. Everything seems to work fine if I
>> comment the rf |= open_ns_fd(argv[1], "user"); line above.

User namespaces are very problematic (even in upstream) and generally
disabled on enterprise distros (so far) due to their comparatively high
risk of introducing security issues.
How they are disabled differs between distros, i.e. one could simply
build with CONFIG_USER_NS=n, but RHEL 7 has =y with /proc/<pid>/ns/user
missing. SLES12 might go one step further and actually show the file,
but leave the functionality non-working, exactly by returning something
like EINVAL or ENOSYS.

(I'm not saying it's your case, just pointing it out.)

>>
>> Unfortunately EINVAL seems to be catch-all error for setns(), any idea
>> what is wrong here?

EINVAL is a catch-all error that the Linux kernel returns for all kinds
of invalid requests from user space (syscalls).

> 
> And it seems to be the case of:
> 
> EINVAL The caller attempted to join the user namespace in which
>        it is already a member.

While this doesn't apply to other namespaces, it does apply to user
namespaces. If you look at SYSCALL_DEFINE2(setns, ...) in
kernel/nsproxy.c and then at userns_install() in
kernel/user_namespace.c, it becomes obvious:

        /* Don't allow gaining capabilities by reentering
         * the same user namespace.
         */
        if (user_ns == current_user_ns())
                return -EINVAL;

> 
> Since the ns_create only creates a new network namespace the rest of the
> namespaces are inherited. At least when I change the ns_create that
> creates the handle to create new user namespace as well it can
> successfully join it.
> 
> Why do we attempt to join all namespaces in the ns_exec? I guess that we
> will have to change it to get a list of namespaces to join the same way
> the ns_create does it.

The original idea was to simply join all NSs of the target to avoid
passing the ns type, which is a nice idea and works for any other ns,
but unfortunately not for user ns.
The simple fix for that would be to readlink() the file and, if the id
(dentry) matches the one in /proc/self/ns/, not open it.
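
A rough sketch of that readlink() comparison, untested against the
actual ns_exec code. The helper name ns_is_same() is made up for
illustration and is not an existing LTP function; it assumes the link
targets look like "user:[4026531837]" and fit in a small buffer:

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of LTP): compare two namespace links,
 * e.g. /proc/<pid>/ns/user and /proc/self/ns/user.
 * Returns 1 if both resolve to the same target string, 0 if they
 * differ, -1 if either link cannot be read. */
int ns_is_same(const char *path_a, const char *path_b)
{
	char a[64], b[64];
	ssize_t la, lb;

	/* readlink() does not NUL-terminate, so leave room for it */
	la = readlink(path_a, a, sizeof(a) - 1);
	lb = readlink(path_b, b, sizeof(b) - 1);
	if (la < 0 || lb < 0)
		return -1;

	a[la] = '\0';
	b[lb] = '\0';

	return strcmp(a, b) == 0;
}
```

ns_exec could then skip open_ns_fd() for any type where ns_is_same()
returns 1, so setns() is never called on our own user ns.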

However, I fear there may be other considerations at hand, i.e. user ns
interactions with other NSs. The functions (from what I can see) can
create/unshare multiple namespaces on a single process. If one specifies
e.g. CLONE_NEWNS | CLONE_NEWUSER | CLONE_NEWNET, the user ns is always
created first. This theoretically means (I haven't tested it myself)
that an unprivileged user can create non-user namespaces if it has UID 0
in the new user namespace, and also that a privileged user may be unable
to create the other namespaces if it doesn't have the capabilities to do
so in the new user namespace (according to the uid/gid maps).
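
Something like the following could be used to check this on a given
kernel. It is only a sketch: try_unshare_in_child() is a made-up name,
and the result will depend on whether the distro allows user namespaces
at all. It forks so that the caller's own namespaces stay untouched:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: call unshare(flags) in a child process.
 * Returns 0 if unshare() succeeded, the child's errno if it failed,
 * or -1 if fork()/waitpid() failed. */
int try_unshare_in_child(int flags)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* child: report success or errno via the exit status */
		_exit(unshare(flags) == 0 ? 0 : errno);
	}

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;

	return WEXITSTATUS(status);
}
```

Running try_unshare_in_child(CLONE_NEWUSER | CLONE_NEWNET) and
try_unshare_in_child(CLONE_NEWNET) as an unprivileged user and comparing
the two results should show whether the new user ns makes a difference.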

This also implies possible complications of calling setns() multiple
times for multiple different namespace types - it may be necessary to
call it first on the user ns fd (or the other way around?).

In addition, there are probably going to be some problems with
capability bits when calling execve(2) after doing setns on user ns
(see capabilities(7), "Thread capability sets").

> 

Yes, user namespaces are an even bigger PITA than pid namespaces. :)

Jiri
