ping
On 29.05.2017 13:49, Kirill Tkhai wrote: > On 27.05.2017 14:01, Eric W. Biederman wrote: >> Kirill Tkhai <[email protected]> writes: >> >>> This patch prohibits pid allocation till child_reaper >>> of pid namespace is set, and it makes possible and safe >>> to get just unshared pid_ns from "/proc/[pid]/ns/pid_for_children" >>> file. This may be useful to determine user_ns of such a created >>> pid_ns, which is not possible now. >>> >>> It was prohibited till now, because the architecture of pid namespaces >>> assumes child reaper is the firstly created process of the namespace, >>> and it initializes pid_namespace::proc_mnt. Child reaper creation >>> mustn't race with creation of another processes from this namespace, >>> otherwise a process with pid > 1 may die before pid_namespace::proc_mnt >>> is populated and it will get a null pointer dereference in >>> proc_flush_task(). >>> Also, child reaper mustn't die before processes from the namespace. >> >> This patch introduces the possibility that two or more processes may >> have the same pid namespace (with no processes) as pid_ns_for_children. >> >> Which means you can now have a race for the first pid in alloc_pid. >> Making it indeterminant who allocates the init process. Which is not >> acceptable. >> >> It is not acceptable on two grounds. >> 1) It is a bogus user space semantic. Because userspace needs to >> know who allocates init. >> 2) It is horrible for maintenance becuase now the code has to be very >> clever to deal with a case that no one cares about. Which is >> a general formula for buggy code. > > We may disallow setns() if there is no child reaper created, and > this solves all above issues. Please see v2 below, it has no problems > you pointed. > > [PATCH v2]pid_ns: Allow to get pid_for_children ns before child_reaper is > created > > This patch prohibits setns() on a pid namespace till its child_reaper > is set, and it makes possible and safe to get just unshared pid_ns > from "/proc/[pid]/ns/pid_for_children" file. This may be useful > to determine user_ns of such a created pid_ns, which is not possible now. > > It was not possible till now, because the architecture of pid namespaces > assumes child reaper is the first created process of the namespace, > and it initializes pid_namespace::proc_mnt. Child reaper creation > mustn't race with creation of another processes from this namespace, > otherwise a process with pid > 1 may die before pid_namespace::proc_mnt > is populated and it will get a null pointer dereference in proc_flush_task(). > Also, child reaper mustn't die before processes from the namespace. > > The patch prevents such races. It allows to setns() on a pid namespace > only if ns->child_reaper is already set, and this guarantees, that > only pid namespace creator may establish child reaper. > So, we can safely allow to get "/proc/[pid]/ns/pid_for_children" > since it's created, and to analyse it. > > v2: Don't race for child reaper creation. > > Signed-off-by: Kirill Tkhai <[email protected]> > CC: Andrew Morton <[email protected]> > CC: "Eric W. Biederman" <[email protected]> > CC: Oleg Nesterov <[email protected]> > CC: Andy Lutomirski <[email protected]> > CC: Serge Hallyn <[email protected]> > CC: Michal Hocko <[email protected]> > CC: Andrei Vagin <[email protected]> > CC: Cyrill Gorcunov <[email protected]> > CC: Mike Rapoport <[email protected]> > CC: Ingo Molnar <[email protected]> > CC: Peter Zijlstra <[email protected]> > --- > kernel/pid_namespace.c | 18 +++++++++--------- > 1 file changed, 9 insertions(+), 9 deletions(-) > > diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c > index 74a5a7255b4d..5e7b3fd0d4c2 100644 > --- a/kernel/pid_namespace.c > +++ b/kernel/pid_namespace.c > @@ -385,15 +385,6 @@ static struct ns_common *pidns_for_children_get(struct > task_struct *task) > } > task_unlock(task); > > - if (ns) { > - read_lock(&tasklist_lock); > - if (!ns->child_reaper) { > - put_pid_ns(ns); > - ns = NULL; > - } > - read_unlock(&tasklist_lock); > - } > - > return ns ? &ns->ns : NULL; > } > > @@ -428,6 +419,15 @@ static int pidns_install(struct nsproxy *nsproxy, struct > ns_common *ns) > if (ancestor != active) > return -EINVAL; > > + /* > + * Disallow processes to use pid namespace till its > + * creator makes child reaper. Otherwise, several > + * processes race for that, and it's not clear who > + * establishes init. > + */ > + if (!new->child_reaper) > + return -ESRCH; > + > put_pid_ns(nsproxy->pid_ns_for_children); > nsproxy->pid_ns_for_children = get_pid_ns(new); > return 0; >

