> On May 28, 2018, at 4:15 AM, Peter Zijlstra <pet...@infradead.org> wrote:
> 
> On Fri, May 04, 2018 at 04:11:02PM -0700, Song Liu wrote:
>> Connection among perf_event and perf_event_dup are built with function
>> rebuild_event_dup_list(cpuctx). This function is only called when events
>> are added/removed or when a task is scheduled in/out. So it is not on
>> critical path of perf_rotate_context().
> 
> Why is perf_rotate_context() the only critical path? I would say the
> context switch path is rather critical too.
> 
>> @@ -2919,8 +3014,10 @@ static void ctx_sched_out(struct perf_event_context *ctx,
>> 
>>      if (ctx->task) {
>>              WARN_ON_ONCE(cpuctx->task_ctx != ctx);
>> -            if (!ctx->is_active)
>> +            if (!ctx->is_active) {
>>                      cpuctx->task_ctx = NULL;
>> +                    rebuild_event_dup_list(cpuctx);
>> +            }
>>      }
>> 
>>      /*
> 
>> +static void rebuild_event_dup_list(struct perf_cpu_context *cpuctx)
>> +{
>> +    int dup_count = cpuctx->ctx.nr_events;
>> +    struct perf_event_context *ctx = cpuctx->task_ctx;
>> +    struct sched_in_data sid = {
>> +            .ctx = ctx,
>> +            .cpuctx = cpuctx,
>> +            .can_add_hw = 1,
>> +    };
>> +
>> +    if (ctx)
>> +            dup_count += ctx->nr_events;
>> +
>> +    kfree(cpuctx->dup_event_list);
>> +    cpuctx->dup_event_count = 0;
>> +
>> +    cpuctx->dup_event_list =
>> +            kzalloc(sizeof(struct perf_event_dup) * dup_count, GFP_ATOMIC);
> 
> 
> __schedule()
>  local_irq_disable()
>  raw_spin_lock(rq->lock)
>  context_switch()
>    prepare_task_switch()
>      perf_event_task_sched_out()
>        __perf_event_task_sched_out()
>         perf_event_context_sched_out()
>           task_ctx_sched_out()
>             ctx_sched_out()
>               rebuild_event_dup_list()
>                 kzalloc()
>                   ...
>                     spin_lock()
> 
> Also, as per the above, this nests a regular spin lock inside the
> (raw) rq->lock, which is a no-no.
> 
> Not to mention that whole O(n) crud in the scheduling path...

I think we can also fix the scheduling path. To achieve this, we need
to limit sharing to within a single ctx. In other words, events in
cpuctx->ctx can share a PMU only with other events in cpuctx->ctx, not
with events in cpuctx->task_ctx. This will probably also solve the
locking issue here. Let me try it.
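
Roughly what I have in mind, as a userspace model rather than kernel
code. Every model_* name and MAX_EVENTS below is made up for
illustration and none of it is from the patch. The only point is that
each ctx owns a preallocated dup table that is rebuilt when an event is
added to that ctx, so switching task_ctx in or out would have nothing to
rebuild and nothing to allocate.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EVENTS 8	/* hypothetical fixed capacity, for the model only */

/* Hypothetical stand-ins; these are not the kernel's structs. */
struct model_event {
	unsigned long config;	/* what the event counts */
	int dup_idx;		/* index into the owning ctx's dup table */
};

struct model_dup {
	unsigned long config;
	int refcount;		/* events in this ctx sharing the entry */
};

struct model_ctx {
	struct model_event events[MAX_EVENTS];
	int nr_events;
	struct model_dup dups[MAX_EVENTS];	/* preallocated with the ctx */
	int nr_dups;
};

/*
 * Rebuild the dup table of one ctx. It only ever looks at events of
 * that ctx, so it never depends on which task ctx is currently active.
 */
static void model_rebuild_dups(struct model_ctx *ctx)
{
	int i, j;

	assert(ctx->nr_events <= MAX_EVENTS);

	ctx->nr_dups = 0;
	for (i = 0; i < ctx->nr_events; i++) {
		struct model_event *e = &ctx->events[i];

		/* Look for an existing entry with the same config. */
		for (j = 0; j < ctx->nr_dups; j++)
			if (ctx->dups[j].config == e->config)
				break;

		/* None found: start a new entry at the end of the table. */
		if (j == ctx->nr_dups) {
			ctx->dups[j].config = e->config;
			ctx->dups[j].refcount = 0;
			ctx->nr_dups++;
		}

		ctx->dups[j].refcount++;
		e->dup_idx = j;
	}
}

/* Add path: the only place in this model that triggers a rebuild. */
static int model_add_event(struct model_ctx *ctx, unsigned long config)
{
	if (ctx->nr_events >= MAX_EVENTS)
		return -1;

	ctx->events[ctx->nr_events].config = config;
	ctx->events[ctx->nr_events].dup_idx = -1;
	ctx->nr_events++;
	model_rebuild_dups(ctx);
	return 0;
}
```

With one model_ctx standing in for cpuctx->ctx and another for
cpuctx->task_ctx, two events with the same config in the first one share
an entry, while an event with that config in the second one gets its own
entry in its own table. The rebuild is still O(n) in this sketch, but it
sits on the add path instead of the context switch path.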

Thanks,
Song

