ping

On 29.05.2017 13:49, Kirill Tkhai wrote:
> On 27.05.2017 14:01, Eric W. Biederman wrote:
>> Kirill Tkhai <[email protected]> writes:
>>
>>> This patch prohibits pid allocation till child_reaper
>>> of pid namespace is set, and it makes possible and safe
>>> to get just unshared pid_ns from "/proc/[pid]/ns/pid_for_children"
>>> file. This may be useful to determine user_ns of such a created
>>> pid_ns, which is not possible now.
>>>
>>> It was prohibited till now, because the architecture of pid namespaces
>>> assumes child reaper is the firstly created process of the namespace,
>>> and it initializes pid_namespace::proc_mnt. Child reaper creation
>>> mustn't race with creation of another processes from this namespace,
>>> otherwise a process with pid > 1 may die before pid_namespace::proc_mnt
>>> is populated and it will get a null pointer dereference in 
>>> proc_flush_task().
>>> Also, child reaper mustn't die before processes from the namespace.
>>
>> This patch introduces the possibility that two or more processes may
>> have the same pid namespace (with no processes) as pid_ns_for_children.
>>
>> Which means you can now have a race for the first pid in alloc_pid.
>> Making it indeterminant who allocates the init process.  Which is not
>> acceptable.
>>
>> It is not acceptable on two grounds.
>> 1) It is a bogus user space semantic.  Because userspace needs to
>>    know who allocates init.
>> 2) It is horrible for maintenance becuase now the code has to be very
>>    clever to deal with a case that no one cares about.  Which is
>>    a general formula for buggy code.
> 
> We may disallow setns() if there is no child reaper created, and
> this solves all above issues. Please see v2 below, it has no problems
> you pointed.
> 
> [PATCH v2]pid_ns: Allow to get pid_for_children ns before child_reaper is 
> created
> 
> This patch prohibits setns() on a pid namespace till its child_reaper
> is set, and it makes possible and safe to get just unshared pid_ns
> from "/proc/[pid]/ns/pid_for_children" file. This may be useful
> to determine user_ns of such a created pid_ns, which is not possible now.
> 
> It was not possible till now, because the architecture of pid namespaces
> assumes child reaper is the first created process of the namespace,
> and it initializes pid_namespace::proc_mnt. Child reaper creation
> mustn't race with creation of another processes from this namespace,
> otherwise a process with pid > 1 may die before pid_namespace::proc_mnt
> is populated and it will get a null pointer dereference in proc_flush_task().
> Also, child reaper mustn't die before processes from the namespace.
> 
> The patch prevents such races. It allows to setns() on a pid namespace
> only if ns->child_reaper is already set, and this guarantees, that
> only pid namespace creator may establish child reaper.
> So, we can safely allow to get "/proc/[pid]/ns/pid_for_children"
> since it's created, and to analyse it.
> 
> v2: Don't race for child reaper creation.
> 
> Signed-off-by: Kirill Tkhai <[email protected]>
> CC: Andrew Morton <[email protected]>
> CC: "Eric W. Biederman" <[email protected]>
> CC: Oleg Nesterov <[email protected]>
> CC: Andy Lutomirski <[email protected]>
> CC: Serge Hallyn <[email protected]>
> CC: Michal Hocko <[email protected]>
> CC: Andrei Vagin <[email protected]>
> CC: Cyrill Gorcunov <[email protected]>
> CC: Mike Rapoport <[email protected]>
> CC: Ingo Molnar <[email protected]>
> CC: Peter Zijlstra <[email protected]>
> ---
>  kernel/pid_namespace.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index 74a5a7255b4d..5e7b3fd0d4c2 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -385,15 +385,6 @@ static struct ns_common *pidns_for_children_get(struct 
> task_struct *task)
>       }
>       task_unlock(task);
>  
> -     if (ns) {
> -             read_lock(&tasklist_lock);
> -             if (!ns->child_reaper) {
> -                     put_pid_ns(ns);
> -                     ns = NULL;
> -             }
> -             read_unlock(&tasklist_lock);
> -     }
> -
>       return ns ? &ns->ns : NULL;
>  }
>  
> @@ -428,6 +419,15 @@ static int pidns_install(struct nsproxy *nsproxy, struct 
> ns_common *ns)
>       if (ancestor != active)
>               return -EINVAL;
>  
> +     /*
> +      * Disallow processes to use pid namespace till its
> +      * creator makes child reaper. Otherwise, several
> +      * processes race for that, and it's not clear who
> +      * establishes init.
> +      */
> +     if (!new->child_reaper)
> +             return -ESRCH;
> +
>       put_pid_ns(nsproxy->pid_ns_for_children);
>       nsproxy->pid_ns_for_children = get_pid_ns(new);
>       return 0;
> 

Reply via email to