On Tue, Jul 25, 2017 at 09:47:52PM -0400, Vince Weaver wrote:

> The problem happens with the following:
>       fork a child, put it to sleep with SIGSTOP
>       perf_event_open(), attaching to the pid of the child
>       ENABLE/DISABLE/ENABLE/DISABLE/ENABLE
>       read the results with the enabled/running multiplexing fields
> 
> Your "enabled" time will be a huge negative number, which will confuse any
> code depening on it (especially the rdpmc scaling code).
> 
> From what I can tell, when you DISABLE an event that's in a 
> PERF_EVENT_STATE_INACTIVE state, currently it exits early without
> updating event->tstamp_stopped.  So later when calculating the
> enabled time it is wrong and you get the negative number.
> The following quick hack fixes things, it just abuses the existing code
> that was there to handle a similar case when a filter failed.  The proper
> fix might be to just put the event->tstamp_stopped update before the
> early out code, but the commit log for the change that introduced that
> makes it sound a bit scary.

fa66f07aa1f0 ("perf_events: Fix time tracking for events with pid != -1 and cpu 
!= -1")

is the commit that introduced it (the scary one just moved the && to the
right place).

> Anyway a reproducible test case for this is in my pref_event_tests tree,
> the "tests/rdpmc/rdpmc_attach_multi_enable" test.
> 
> Vince
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 426c2ffba16d..18eadc3b37d5 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -1812,8 +1812,7 @@ event_sched_out(struct perf_event *event,
>        * maintained, otherwise bogus information is return
>        * via read() for time_enabled, time_running:
>        */
> -     if (event->state == PERF_EVENT_STATE_INACTIVE &&
> -         !event_filter_match(event)) {
> +     if (event->state == PERF_EVENT_STATE_INACTIVE) {
>               delta = tstamp - event->tstamp_stopped;
>               event->tstamp_running += delta;
>               event->tstamp_stopped = tstamp;

It doesn't look entirely crazy.. just trying to understand why Stephane
added that filter clause.

Argh, that time accounting always hurts my brain, its so weird. I never
dared rewrite it for risk of breaking it all again, but maybe, just
maybe we should have.. :/


init(now):
 enabled = stopped = running = now;

in(now):
 running += now - stopped;

out(now):
 stopped = now;

read:
 *enabled = stopped - enabled;
 *running = stopped - running;


Simple example:

init(10):
 e = s = r = 10

in(11):
 r += 11 - 10; r=11

out(12):
 s = 12

read:
 *e = 12-10 = 2
 *r = 12-11 = 1

in(14):
 r += 14 - 12; r=13

out(16):
 s = 16

read:
 *e = 16 - 10 = 6
 *r = 16 - 13 = 3


which confusingly works :-)

So lets break that by out not updating stopped, then we get:


init(10):
 e = s = r = 10

in(11):
 r += 11 - 10; r=11

read:
 *e = 10-10 = 2
 *r = 10-11 = -1

in(14):
 r += 14 - 10; r=15

read:
 *e = 10 - 10 = 0
 *r = 10 - 15 = -5


That is negative for 'running', not enabled. But your test really does
report a negative 'enabled'.

Playing with that test it really is the IOC_DISABLE while STOP'ed that
messes things up.

Ah.. magic.. the below seems to fix things, hopefully it doesn't break
anything else.

---
 kernel/events/core.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 426c2ffba16d..5dbbd1f90b4f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2458,6 +2458,20 @@ perf_install_in_context(struct perf_event_context *ctx,
 }
 
 /*
+ * reverse update_event_times()
+ */
+void __perf_event_enable_time(struct perf_event *event, u64 tstamp)
+{
+       /*
+        * total_time_enabled = tstamp_stopped - tstamp_enabled
+        * total_time_running = tstamp_stopped - tstamp_running;
+        */
+       event->tstamp_stopped = tstamp;
+       event->tstamp_enabled = tstamp - event->total_time_enabled;
+       event->tstamp_running = tstamp - event->total_time_running;
+}
+
+/*
  * Put a event into inactive state and update time fields.
  * Enabling the leader of a group effectively enables all
  * the group members that aren't explicitly disabled, so we
@@ -2471,10 +2485,11 @@ static void __perf_event_mark_enabled(struct perf_event 
*event)
        u64 tstamp = perf_event_time(event);
 
        event->state = PERF_EVENT_STATE_INACTIVE;
-       event->tstamp_enabled = tstamp - event->total_time_enabled;
+       __perf_event_enable_time(event, tstamp);
+
        list_for_each_entry(sub, &event->sibling_list, group_entry) {
                if (sub->state >= PERF_EVENT_STATE_INACTIVE)
-                       sub->tstamp_enabled = tstamp - sub->total_time_enabled;
+                       __perf_event_enable_time(sub, tstamp);
        }
 }
 

Reply via email to