On Tue, Mar 06, 2018 at 05:36:37PM +0800, linxiu...@gmail.com wrote:
> From: "leilei.lin" <leilei....@alibaba-inc.com>
> 
> Do not install cgroup event into the CPU context and schedule it
> if the cgroup is not running on this CPU

OK, so far so good, this explains the bit in
__perf_install_in_context().

> While there is no task of cgroup running specified CPU, current
> kernel still install cgroup event into CPU context that causes
> another cgroup event can't be installed into this CPU.
> 
> This patch prevent scheduling events at __perf_install_in_context()
> and installing events at list_update_cgroup_event() if cgroup isn't
> running on specified CPU.

This bit doesn't make sense, you don't in fact avoid anything in
list_update_cgroup_event(), you do more, not less.

> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 4df5b69..f3ffa70 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -933,31 +933,45 @@ list_update_cgroup_event(struct perf_event *event,
>  {
>       struct perf_cpu_context *cpuctx;
>       struct list_head *cpuctx_entry;
> +     struct perf_cgroup *cgrp;
>  
>       if (!is_cgroup_event(event))
>               return;
>  
>       /*
>        * Because cgroup events are always per-cpu events,
>        * this will always be called from the right CPU.
>        */
>       cpuctx = __get_cpu_context(ctx);
> +     cgrp = perf_cgroup_from_task(current, ctx);
> +
> +     /*
> +      * if only the cgroup is running on this cpu
> +      * and cpuctx->cgrp == NULL (otherwise it would've
> +      * been set with running cgroup), we put this cgroup
> +      * into cpu context. Or it would case mismatch in
> +      * following cgroup events at event_filter_match()
> +      */

This is utterly incomprehensible, what?

> +     if (add && !cpuctx->cgrp &&
> +                     cgroup_is_descendant(cgrp->css.cgroup,
> +                     event->cgrp->css.cgroup)) {
> +             cpuctx->cgrp = cgrp;
> +     }

And that's just horrible coding style. Maybe something like:

        if (add && cgroup_is_descendant(cgrp->css.cgroup, 
event->cgrp->css.cgroup)) {
                if (cpuctx->cgrp)
                        WARN_ON_ONCE(cpuctx->cgrp != cgrp);
                cpuctx->cgrp = cgrp;
        }

that? But that still needs a comment to explain _why_ we do that here.
Under what condition would we fail to have cpuctx->cgrp set while
ctx->nr_cgroups. Your comment doesn't explain nor does your Changelog.

> +
> +     if (add && ctx->nr_cgroups++)
> +             return;
> +     else if (!add && --ctx->nr_cgroups)
> +             return;
>  
> +     /* no cgroup running */
> +     if (!add)
> +             cpuctx->cgrp = NULL;
> +
> +     cpuctx_entry = &cpuctx->cgrp_cpuctx_entry;
> +     if (add)
>               list_add(cpuctx_entry, this_cpu_ptr(&cgrp_cpuctx_list));
> +     else
>               list_del(cpuctx_entry);
>  }
>  
>  #else /* !CONFIG_CGROUP_PERF */
> @@ -2311,6 +2325,20 @@ static int  __perf_install_in_context(void *info)
>               raw_spin_lock(&task_ctx->lock);
>       }
>  
> +#ifdef CONFIG_CGROUP_PERF
> +     if (is_cgroup_event(event)) {
> +             /*
> +              * Only care about cgroup events.
> +              *

That bit is entirely spurious, if it right after if (is_cgroup_event()),
obviously this block is only for cgroup events.

> +              * If only the task belongs to cgroup of this event,
> +              * we will continue the installment

And that isn't really english. I think you meant to write something
like:

                /*
                 * If the current cgroup doesn't match the event's
                 * cgroup, we should not try to schedule it.
                 */

> +              */
> +             struct perf_cgroup *cgrp = perf_cgroup_from_task(current, ctx);
> +             reprogram = cgroup_is_descendant(cgrp->css.cgroup,
> +                                     event->cgrp->css.cgroup);
> +     }
> +#endif
> +
>       if (reprogram) {
>               ctx_sched_out(ctx, cpuctx, EVENT_TIME);
>               add_event_to_ctx(event, ctx);
> -- 
> 2.8.4.31.g9ed660f
> 

Reply via email to