Quoting Oren Laadan ([email protected]):
> The main challenge with restoring the pgid of tasks is that the
> original "owner" (the process with that pid) might have exited
> already. I call these "ghost" pgids. 'mktree' does create these
> processes, but they then exit without participating in the restart.
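A user-space sketch of why a plain exit is a problem. It assumes Linux/POSIX setpgid(2) semantics, and the helper name join_reaped_pgid is hypothetical, not part of the patch. Once a group's only member has exited and been reaped, no process in the session has that pgid any more. A task that later tries to join it therefore gets EPERM, which is what a restarting task would hit if the pgid owner simply exited first.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a group whose only member ("owner") exits and
 * is reaped, then try to join that pgid from another child. Returns the
 * errno seen by the late joiner (0 if the join succeeded, -1 on setup error).
 */
int join_reaped_pgid(void)
{
    pid_t owner = fork();
    if (owner < 0)
        return -1;
    if (owner == 0) {
        setpgid(0, 0);  /* owner starts a group with pgid == its pid */
        _exit(0);       /* ... and exits without waiting for anyone */
    }
    if (waitpid(owner, NULL, 0) < 0)  /* reap it: the group is now empty */
        return -1;

    pid_t late = fork();
    if (late < 0)
        return -1;
    if (late == 0)      /* late task tries to restore its old pgid */
        _exit(setpgid(0, owner) == 0 ? 0 : errno);

    int status;
    if (waitpid(late, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Barring PID reuse between the two forks, join_reaped_pgid() should return EPERM.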
> 
> To solve this, this patch introduces a RESTART_GHOST flag, used for
> "ghost" owners that are created only to pass their pgid to other
> tasks. ('mktree' now makes them call restart(2) instead of exiting).
> 
> When a "ghost" task calls restart(2), it will be placed on a wait
> queue until the restart completes and then exit. This guarantees that
> the pgid that it owns remains available for all (regular) restarting
> tasks for when they need it.
> 
> Regular tasks perform the restart as before, except that they also
> now restore their old pgrp, which is guaranteed to exist.
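The ghost scheme described above can be approximated in user space. This is only an analogy under stated assumptions: a pipe stands in for the kernel wait queue, the parent plays the role of "restart completes", and ghost_pgid_demo is a hypothetical name. The ghost owns a group and blocks until released, and a regular task joins that group while the ghost is still alive. After the ghost exits and is reaped, the regular task still reports the ghost's pid as its pgid.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 0 if the regular task keeps the ghost's pgid. */
int ghost_pgid_demo(void)
{
    int ghost_gate[2], reg_gate[2];
    char c;
    int status;

    if (pipe(ghost_gate) < 0 || pipe(reg_gate) < 0)
        return -1;

    pid_t ghost = fork();
    if (ghost < 0)
        return -1;
    if (ghost == 0) {
        /* ghost: own a pgid, then wait (EOF on the pipe) and exit */
        close(ghost_gate[1]);
        close(reg_gate[0]);
        close(reg_gate[1]);
        setpgid(0, 0);
        while (read(ghost_gate[0], &c, 1) > 0)
            ;
        _exit(0);
    }
    setpgid(ghost, ghost);  /* parent sets it too, so the group surely exists */

    pid_t reg = fork();
    if (reg < 0)
        return -1;
    if (reg == 0) {
        /* regular task: wait until released, then report its pgid */
        close(ghost_gate[0]);
        close(ghost_gate[1]);
        close(reg_gate[1]);
        while (read(reg_gate[0], &c, 1) > 0)
            ;
        _exit(getpgrp() == ghost ? 0 : 1);
    }
    int joined = setpgid(reg, ghost);  /* regular task joins the ghost's pgrp */

    /* "restart completes": release the ghost and reap it */
    close(ghost_gate[0]);
    close(ghost_gate[1]);
    waitpid(ghost, NULL, 0);

    /* only now let the regular task check its pgid */
    close(reg_gate[0]);
    close(reg_gate[1]);
    if (waitpid(reg, &status, 0) < 0 || joined != 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}
```

The ordering matters: the regular task joins while the ghost is alive, so the pgid it asks for is guaranteed to exist. After that, the group survives the ghost's exit because it still has a member.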
> 
> Changelog [v1]:
>   - Verify that pgid owner is a thread-group-leader.
>   - Handle the case of pgid/sid == 0 using root's parent pid-ns
> 
> Signed-off-by: Oren Laadan <[email protected]>
> ---
>  checkpoint/process.c             |  106 ++++++++++++++++++++++++-
>  checkpoint/restart.c             |  158 ++++++++++++++++++++++++++------------
>  checkpoint/sys.c                 |    3 +-
>  include/linux/checkpoint.h       |   11 ++-
>  include/linux/checkpoint_hdr.h   |    3 +
>  include/linux/checkpoint_types.h |    6 +-
>  6 files changed, 230 insertions(+), 57 deletions(-)
> 
> diff --git a/checkpoint/process.c b/checkpoint/process.c
> index 40b2580..5d6bdb9 100644
> --- a/checkpoint/process.c
> +++ b/checkpoint/process.c
> @@ -23,6 +23,57 @@
>  #include <linux/syscalls.h>
> 
> 
> +pid_t ckpt_pid_nr(struct ckpt_ctx *ctx, struct pid *pid)
> +{
> +     return pid ? pid_nr_ns(pid, ctx->root_nsproxy->pid_ns) : CKPT_PID_NULL;
> +}
> +
> +/* must be called with tasklist_lock or rcu_read_lock() held */
> +struct pid *_ckpt_find_pgrp(struct ckpt_ctx *ctx, pid_t pgid)
> +{
> +     struct task_struct *p;
> +     struct pid *pgrp;
> +
> +     if (pgid == 0) {
> +             /*
> +              * At checkpoint the pgid owner lived in an ancestor
> +              * pid-ns. The best we can do (sanely and safely) is
> +              * to examine the parent of this restart's root: if in
> +              * a distinct pid-ns, use its pgrp; otherwise fail.
> +              */
> +             p = ctx->root_task->real_parent;
> +             if (p->nsproxy->pid_ns == current->nsproxy->pid_ns)
> +                     return NULL;
> +             pgrp = task_pgrp(p);
> +     } else {
> +             /*
> +              * Find the owner process of this pgid (it must exist
> +              * if pgrp exists). It must be a thread group leader.
> +              */
> +             pgrp = find_vpid(pgid);
> +             p = pid_task(pgrp, PIDTYPE_PID);
> +             if (!p || !thread_group_leader(p))
> +                     return NULL;
> +             /*
> +              * The pgrp must "belong" to our restart tree (compare
> +              * p->checkpoint_ctx to ours). This prevents malicious
> +              * input from (guessing and) using unrelated pgrps. If
> +              * the owner is dead, then it doesn't have a context,
> +              * so instead compare against its (real) parent's.
> +              */
> +             if (p->exit_state == EXIT_ZOMBIE)
> +                     p = p->real_parent;
> +             if (p->checkpoint_ctx != ctx)
> +                     return NULL;
> +     }
> +
> +     if (task_session(current) != task_session(p))
> +             return NULL;
> +
> +     return pgrp;
> +}
> +
> +
>  #ifdef CONFIG_FUTEX
>  static void save_task_robust_futex_list(struct ckpt_hdr_task *h,
>                                       struct task_struct *t)
> @@ -94,8 +145,8 @@ static int checkpoint_task_struct(struct ckpt_ctx *ctx, struct task_struct *t)
>               h->exit_signal = t->exit_signal;
>               h->pdeath_signal = t->pdeath_signal;
> 
> -             h->set_child_tid = t->set_child_tid;
> -             h->clear_child_tid = t->clear_child_tid;
> +             h->set_child_tid = (unsigned long) t->set_child_tid;

Note that set_child_tid is an int (signed), not a long.  Same on
x86, but not on other arches.  It shouldn't lose info, so it could be worse.

On the whole,

Acked-by: Serge Hallyn <[email protected]>

-serge
_______________________________________________
Containers mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/containers

_______________________________________________
Devel mailing list
[email protected]
https://openvz.org/mailman/listinfo/devel
