Came across this while testing LXC.

1. Does ckpt_remount_proc() need to call unshare()? Or can the clone()
   that runs __ckpt_coordinator() pass CLONE_NEWNS|CLONE_FS instead?

   The problem with the unshare() in ckpt_remount_proc() is that it
   creates an extra level in the cgroup hierarchy (see below) after
   restart. Applications that expect the same cgroup hierarchy as
   before the checkpoint will be surprised.

2. When --mount-pty (or --mntns) is specified, do we need to unshare()
   in the parent process? Considering only the full-container restart
   for now (ignoring self-restart and subtree restart), can we just
   specify (CLONE_NEWNS|CLONE_FS) when creating the first restarted
   process?
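
The idea in question 2 can be sketched as a flag computation done once,
before the first restarted process is cloned. This is only an
illustration: restart_clone_flags() is a hypothetical helper, not a
function in restart.c. One caveat to check against the target kernel:
clone(2) documents EINVAL when CLONE_NEWNS and CLONE_FS are passed
together, because CLONE_FS means "share" for clone() and "stop sharing"
for unshare(). The sketch therefore requests only CLONE_NEWNS; a child
cloned without CLONE_FS already works on its own copy of the filesystem
information.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>

/*
 * Hypothetical helper (not part of restart.c): compute the clone()
 * flags for the first restarted process so that no later unshare()
 * is needed for the pid or mount namespace.
 */
unsigned long restart_clone_flags(int pidns, int mntns)
{
	unsigned long flags = SIGCHLD;

	if (pidns)
		flags |= CLONE_NEWPID;
	if (mntns)
		flags |= CLONE_NEWNS;

	/* clone(2): CLONE_NEWNS together with CLONE_FS is EINVAL */
	assert((flags & CLONE_FS) == 0);
	return flags;
}
```

The result would be passed as the flags argument of clone(); creating
the new namespaces still requires CAP_SYS_ADMIN.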

Here is an example (using LXC) that shows the problems I am running
into. Attached is a quick hack to point out the unshare() calls I am
referring to.

If I create a simple container with LXC:

        $ lxc-execute --name foo --rcfile lxc-macvlan.conf -- /bin/sleep 1000

It creates the following three processes:

        PID   PPID  CMD

        3350  3239  lxc-execute --name foo -- /bin/sleep 1000
        3353  3350  /usr/local/libexec/lxc-init -- /bin/sleep 1000
        3357  3353  /bin/sleep 1000

A new cgroup named 'foo' is created (basically a user-space rename of
the cgroup directory named after the pid of lxc-init). This cgroup is
in the root cgroup directory and has two tasks (lxc-init and sleep):

        $ cat /cgroup/foo/tasks
        3353
        3357

When I checkpoint and restart this container (using the equivalent of
the --pidns --pids --mount-pty options to /bin/restart), I get three
processes:

        3434  3375  ./lxc_restart --name bar --statefile=/root/foo.ckpt
        3436  3434  /usr/local/libexec/lxc-init -- /bin/sleep 1000
        3437  3436  /bin/sleep 1000

But the directory in /cgroup referring to lxc-init is 3 levels deep:

        $ ls /cgroup/3434/3436/1
        cgroup.procs  freezer.state  notify_on_release  tasks

Here is the complete hierarchy created after the restart:

        $ ls -R /cgroup/3434
        /cgroup/3434:
        3436  cgroup.procs  freezer.state  notify_on_release  tasks

        /cgroup/3434/3436:
        1  cgroup.procs  freezer.state  notify_on_release  tasks

        /cgroup/3434/3436/1:
        cgroup.procs  freezer.state  notify_on_release  tasks

        $ cat /cgroup/3434/tasks
        3434

        $ cat /cgroup/3434/3436/tasks   # empty

        $ cat /cgroup/3434/3436/1/tasks
        3436
        3437
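
To make the "levels deep" comparison concrete, here is a minimal sketch
that counts how many directories a cgroup sits below the cgroup mount
point. cgroup_depth() is a hypothetical helper written for this
illustration (it is not part of restart.c or LXC), and it assumes the
mount point is given without a trailing slash.

```c
#include <string.h>

/*
 * Hypothetical helper: return the number of path components of `path`
 * below the cgroup mount point `root`, or -1 if `path` is not under
 * `root`. `root` must not end with '/'.
 */
int cgroup_depth(const char *root, const char *path)
{
	size_t n = strlen(root);
	const char *p;
	int depth = 0;

	if (strncmp(path, root, n) != 0)
		return -1;
	p = path + n;
	if (*p != '\0' && *p != '/')
		return -1;	/* e.g. "/cgroupX" only shares a prefix */

	while (*p) {
		while (*p == '/')	/* skip separators */
			p++;
		if (*p == '\0')
			break;
		depth++;
		while (*p && *p != '/')	/* skip one component */
			p++;
	}
	return depth;
}
```

With the paths from this example, cgroup_depth("/cgroup", "/cgroup/foo")
is 1 for the original container, while
cgroup_depth("/cgroup", "/cgroup/3434/3436/1") is 3 after the restart.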

I think we get the directory /cgroup/3434 due to the following unshare():

                /* private mounts namespace ? */
                if (args->mntns && unshare(CLONE_NEWNS | CLONE_FS) < 0) {
                        ckpt_perror("unshare");
                        exit(1);
                }

And we get the "3436/1" directory due to the unshare() in ckpt_remount_proc().

The following hack seems to remove both extra levels: with it, the
lxc_restart command correctly creates just the "/cgroup/3436" cgroup
(which LXC renames to "/cgroup/bar").

---
From: Sukadev Bhattiprolu <suka...@linux.vnet.ibm.com>
Date: Mon, 8 Mar 2010 12:03:46 -0800
Subject: [PATCH 1/1] Minimize unshare() calls

---
 restart.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/restart.c b/restart.c
index c82de21..6ac51e3 100644
--- a/restart.c
+++ b/restart.c
@@ -459,10 +459,12 @@ int app_restart(struct app_restart_args *args)
                exit(1);
 
        /* private mounts namespace ? */
+#if 0
        if (args->mntns && unshare(CLONE_NEWNS | CLONE_FS) < 0) {
                ckpt_perror("unshare");
                exit(1);
        }
+#endif
 
        /* chroot ? */
        if (args->root && chroot(args->root) < 0) {
@@ -717,10 +719,12 @@ static int ckpt_probe_child(pid_t pid, char *str)
  */
 static int ckpt_remount_proc(struct ckpt_ctx *ctx)
 {
+#if 0
        if (unshare(CLONE_NEWNS | CLONE_FS) < 0) {
                ckpt_perror("unshare");
                return -1;
        }
+#endif
        /* this is unlikely, but we don't want to fail */
        if (umount2("/proc", MNT_DETACH) < 0) {
                if (ckpt_cond_fail(ctx, CKPT_COND_MNTPROC)) {
@@ -778,6 +782,7 @@ static int ckpt_coordinator_pidns(struct ckpt_ctx *ctx)
        int copy, ret;
        genstack stk;
        void *sp;
+       unsigned long flags = SIGCHLD;
 
        ckpt_dbg("forking coordinator in new pidns\n");
 
@@ -802,7 +807,9 @@ static int ckpt_coordinator_pidns(struct ckpt_ctx *ctx)
        copy = ctx->args->copy_status;
        ctx->args->copy_status = 1;
 
-       coord_pid = clone(__ckpt_coordinator, sp, CLONE_NEWPID|SIGCHLD, ctx);
+       flags |= CLONE_NEWPID|CLONE_NEWNS|CLONE_FS;
+
+       coord_pid = clone(__ckpt_coordinator, sp, flags, ctx);
        genstack_release(stk);
        if (coord_pid < 0) {
                ckpt_perror("clone coordinator");
-- 
1.6.6.1

_______________________________________________
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

_______________________________________________
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel
