On Mon, Aug 28, 2017 at 09:23:59PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 24, 2017 at 06:27:30PM +0200, Jiri Olsa wrote:
> > Add a check of the leader's state to perf_output_read_group()
> > to ensure we only read a leader that is scheduled in.
> > 
> > A similar check is already there for siblings.
> > 
> > Signed-off-by: Jiri Olsa <jo...@kernel.org>
> > ---
> >  kernel/events/core.c | 10 +++++++---
> >  1 file changed, 7 insertions(+), 3 deletions(-)
> > 
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 30e30e94ea32..9a2791afe051 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -5760,6 +5760,11 @@ void perf_event__output_id_sample(struct perf_event *event,
> >             __perf_event__output_id_sample(handle, sample);
> >  }
> >  
> > +static bool can_read(struct perf_event *event)
> > +{
> > +   return event->state == PERF_EVENT_STATE_ACTIVE;
> > +}
> > +
> >  static void perf_output_read_one(struct perf_output_handle *handle,
> >                              struct perf_event *event,
> >                              u64 enabled, u64 running)
> > @@ -5800,7 +5805,7 @@ static void perf_output_read_group(struct perf_output_handle *handle,
> >     if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
> >             values[n++] = running;
> >  
> > -   if (leader != event)
> > +   if ((leader != event) && can_read(leader))
> >             leader->pmu->read(leader);
> >  
> >     values[n++] = perf_event_count(leader);
> > @@ -5812,8 +5817,7 @@ static void perf_output_read_group(struct perf_output_handle *handle,
> >     list_for_each_entry(sub, &leader->sibling_list, group_entry) {
> >             n = 0;
> >  
> > -           if ((sub != event) &&
> > -               (sub->state == PERF_EVENT_STATE_ACTIVE))
> > +           if ((sub != event) && can_read(sub))
> >                     sub->pmu->read(sub);
> >  
> >             values[n++] = perf_event_count(sub);
> 
> I'm not seeing how this makes sense. Groups should either _all_ be
> scheduled or not at all. Please explain.

so this could be called for an event which is already scheduled out:

  perf_event_exit_task_context
    task_ctx_sched_out <- unschedules event
    perf_event_exit_event
      sync_child_event
        perf_event_read_event
          perf_output_read

if leader != event (which is the case if you don't have Mark's fix),
we'll call leader->pmu->read(leader) even if it's not scheduled in

jirka
