On Fri, Oct 22, 2010 at 04:09:56PM +0800, Li Zefan wrote:
> Stephane posted a patchset to add perf_cgroup subsystem, so perf can
> be used to monitor all threads belonging to a cgroup.
> 
> But if you already mounted a cgroup hierarchy but without perf_cgroup
> and the hierarchy has sub-cgroups, you can't bind perf_cgroup to it,
> and thus you're not able to use per-cgroup perf feature.
> 
> This patchset alleviates the pain, and then a subsytem can be bind/unbind
> to/from a hierarchy which has sub-cgroups in it.
> 
> For a cgroup subsystem to become bindable, the can_bind flag of
> struct cgroup_subsys should be set, and provide ->bind() callback
> if necessary.
> 
> But for some constraints, not all subsystems can take advantage of
> this patch. For example, we can't decide a cgroup's cpuset.mems and
> cpuset.cpus automatically, so cpuset is not bindable.

<snip>

> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 6c36750..46df5f8 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c

<snip>

> +/*
> + * cgroup_walk_herarchy - iterate through a cgroup hierarchy
> + * @process_cgroup: callback called on each cgroup in the hierarchy
> + * @data: will be passed to @process_cgroup
> + * @top_cgrp: the root cgroup of the hierarchy
> + *
> + * For such a hierarchy:
> + *        a1        c1
> + *      /         /
> + * Root - a2 - b1 - c2
> + *      \
> + *        a3
> + *
> + * The iterating order is: a1, a2, b1, c1, c2, a3. So a parent will be
> + * processed before its children.
> + */

You could just say it's a depth-first walk except we process the parent before
its children.

> +static int cgroup_walk_hierarchy(int (*process_cgroup)(struct cgroup *, void 
> *),
> +                              void *data, struct cgroup *top_cgrp)

<snip>

> +static int hierarchy_populate_dir(struct cgroup *cgrp, void *data)
> +{
> +     mutex_lock_nested(&cgrp->dentry->d_inode->i_mutex, I_MUTEX_CHILD);
> +     cgroup_populate_dir(cgrp);
> +     mutex_unlock(&cgrp->dentry->d_inode->i_mutex);
> +     return 0;
> +}
> +
>  /*
>   * Call with cgroup_mutex held. Drops reference counts on modules, including
>   * any duplicate ones that parse_cgroupfs_options took. If this function
> @@ -945,36 +1079,53 @@ static int rebind_subsystems(struct cgroupfs_root 
> *root,
>       unsigned long added_bits, removed_bits;
>       struct cgroup *cgrp = &root->top_cgroup;
>       int i;
> +     int err;
> 
>       BUG_ON(!mutex_is_locked(&cgroup_mutex));
> 
>       removed_bits = root->actual_subsys_bits & ~final_bits;
>       added_bits = final_bits & ~root->actual_subsys_bits;
> +
>       /* Check that any added subsystems are currently free */
> -     for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
> -             unsigned long bit = 1UL << i;
> -             struct cgroup_subsys *ss = subsys[i];
> -             if (!(bit & added_bits))
> -                     continue;
> +     for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT) {
>               /*
>                * Nobody should tell us to do a subsys that doesn't exist:
>                * parse_cgroupfs_options should catch that case and refcounts
>                * ensure that subsystems won't disappear once selected.
>                */
> -             BUG_ON(ss == NULL);
> -             if (ss->root != &rootnode) {
> +             BUG_ON(subsys[i] == NULL);
> +             if (subsys[i]->root != &rootnode) {
>                       /* Subsystem isn't free */
>                       return -EBUSY;
>               }
>       }
> 
> -     /* Currently we don't handle adding/removing subsystems when
> -      * any child cgroups exist. This is theoretically supportable
> -      * but involves complex error handling, so it's being left until
> -      * later */
> -     if (root->number_of_cgroups > 1)
> +     /* removing will be supported later */
> +     if (root->number_of_cgroups > 1 && removed_bits)
>               return -EBUSY;
> 
> +     if (root->number_of_cgroups > 1) {

Is there something wrong with the indentation here? I can't
see the closing brace for the "if (root->number_of_cgroups > 1)"
that should precede the for_each_set_bit() loop below.

> +             for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT)
> +                     if (!subsys[i]->can_bind)
> +                             return -EBUSY;

I think you could avoid the can_bind flag field entirely and do:

                        if (!subsys[i]->bind)

Also, if we're going with my "split out unbind" suggestion I think
the part here would be:

                for_each_set_bit(i, &removed_bits, CGROUP_SUBSYS_COUNT)
                        if (!subsys[i]->unbind)
                                return -EBUSY;

<snip>

Cheers,
        -Matt Helsley
_______________________________________________
Containers mailing list
contain...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

_______________________________________________
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel

Reply via email to