We've converted cgroup to kernfs so cgroup won't be intertwined with
vfs objects and locking, but there are still dark areas.

Run two instances of this script concurrently:

    for ((; ;))
    {
        mount -t cgroup -o cpuacct xxx /cgroup
        umount /cgroup
    }

After a while, I saw two mount processes stuck retrying, because they
were waiting for a subsystem to become free, but the root associated
with this subsystem never got freed.

This can happen if thread A is in the process of killing the superblock
but hasn't yet called percpu_ref_kill(), and at this time thread B is
mounting the same cgroup root, finds the root in the root list and
performs percpu_ref_tryget_live(), which still succeeds.

To fix this, we increase the refcnt of the superblock instead of increasing
the percpu refcnt of the cgroup root.

Signed-off-by: Li Zefan <lize...@huawei.com>
---

A better fix is welcome!

---
 kernel/cgroup.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index bd37e8d..94e1814 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1654,7 +1654,7 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
        struct dentry *dentry;
        int ret;
        int i;
-       bool new_sb;
+       bool sb_pinned = false;
 
        /*
         * The first time anyone tries to mount a cgroup, enable the list
@@ -1735,19 +1735,21 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
                }
 
                /*
-                * A root's lifetime is governed by its root cgroup.
-                * tryget_live failure indicate that the root is being
-                * destroyed.  Wait for destruction to complete so that the
-                * subsystems are free.  We can use wait_queue for the wait
-                * but this path is super cold.  Let's just sleep for a bit
-                * and retry.
+                * This may fail for two reasons:
+                * - A concurrent mount is in progress. We wait for that mount
+                *   to complete.
+                * - The superblock is being destroyed. We wait for the
+                *   destruction to complete so that the subsystems are free.
+                * We could use a wait_queue for the wait, but this path is
+                * super cold.  Let's just sleep for a bit and retry.
                 */
-               if (!percpu_ref_tryget_live(&root->cgrp.self.refcnt)) {
+               if (!kernfs_pin_sb(root->kf_root, NULL)) {
                        mutex_unlock(&cgroup_mutex);
                        msleep(10);
                        ret = restart_syscall();
                        goto out_free;
                }
+               sb_pinned = true;
 
                ret = 0;
                goto out_unlock;
@@ -1784,8 +1786,10 @@ out_free:
        if (ret)
                return ERR_PTR(ret);
 
-       dentry = kernfs_mount(fs_type, flags, root->kf_root, &new_sb);
-       if (IS_ERR(dentry) || !new_sb)
+       dentry = kernfs_mount(fs_type, flags, root->kf_root, NULL);
+       if (sb_pinned)
+               kernfs_drop_sb(root->kf_root, NULL);
+       if (!sb_pinned && IS_ERR(dentry))
                cgroup_put(&root->cgrp);
        return dentry;
 }
-- 
1.8.0.2
