On Fri, Oct 02, 2020 at 10:03:17PM +0900, Namhyung Kim wrote:
> Currently perf inject just repipes the event without any flush.  It
> makes an issue that it changes the order of events processed.
> 
> Normally we want to process events in time order, but events are
> recorded from several cpus and they weren't sorted with each other.
> So we put them into the ordered event queue, sorted by time, and flush
> them when we see a next FINISHED_ROUND event.
> 
> But this is for events from kernel, user space events (like the
> FINISHED_ROUND) are processed without queueing.  So during the perf
> inject, it writes all the FINISHED_ROUND events directly while
> SAMPLE (and other) events are kept in the queue.  This makes the user
> space events are put before any kernel events.
> 
> You can see that with the following command:
> 
>   $ perf record -a -o- sleep 1 | perf inject -b | perf script -i- 
> --show-round-events
>   PERF_RECORD_FINISHED_ROUND
>   PERF_RECORD_FINISHED_ROUND
>   PERF_RECORD_FINISHED_ROUND
>   ...
> 
> Omitting perf inject, you can see the events are placed in the middle
> of the data.
> 
> You might argue that the whole point of the FINISHED_ROUND event is to
> sort (kernel) events.  And after perf inject, all events are written
> in a proper time order so we don't care about the FINISHED_ROUND event
> anymore.
> 
> But the problem is we don't know whether the input data is sorted or
> not (maybe we can add a feature bit for this?) so it should use an
> ordered event queue when processing the input like in perf report.

I like the idea of storing the information that the data is sorted,
and when it's there, let's not use ordered_oevets

> 
> Remember all the FINISHED_ROUND events now come before other events so
> the tool cannot know when it can flush the data so everything will be
> queued until it meets the end of the input.  Actually it's same for
> perf inject itself as it doesn't flush the queue.
> 
> Below measures time and memory usage during the perf inject and
> report using ~190MB data file.
> 
> Before:
>   perf inject:  11.09 s,  382148 KB
>   perf report:   8.05 s,  397440 KB
> 
> After:
>   perf inject:  16.24 s,   83376 KB
>   perf report:   7.96 s,  216184 KB
> 
> As you can see, it used 2x memory of the input size.  I guess it's
> because it needs to keep the copy for the whole input.  But I don't
> understand why processing time of perf inject increased..

would be good to find out first

thanks,
jirka

> 
> I'm not sure how it affects the auxtrace, but it should be fine IMHO.
> 
> Cc: Al Grant <[email protected]>
> Cc: Adrian Hunter <[email protected]>
> Signed-off-by: Namhyung Kim <[email protected]>
> ---
>  tools/perf/builtin-inject.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> index 6d2f410d773a..9fa78a9edfc5 100644
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
> @@ -79,6 +79,7 @@ static int perf_event__repipe_oe_synth(struct perf_tool 
> *tool,
>                                      union perf_event *event,
>                                      struct ordered_events *oe __maybe_unused)
>  {
> +     ordered_events__flush(oe, OE_FLUSH__ROUND);
>       return perf_event__repipe_synth(tool, event);
>  }
>  
> -- 
> 2.28.0.806.g8561365e88-goog
> 

Reply via email to