Any comment on this patch set? This is an important improvement for any system-wide measurement.
On Thu, Sep 13, 2012 at 4:10 PM, Stephane Eranian <eran...@google.com> wrote:
> The current scheme of using the timer tick was fine
> for per-thread events. However, it was causing
> bias issues in system-wide mode (including for
> uncore PMUs). Event groups would not get their
> fair share of runtime on the PMU. With tickless
> kernels, if a core is idle there is no timer tick,
> and thus no event rotation (multiplexing). However,
> there are events (especially uncore events) which do
> count even though cores are asleep.
>
> This patch changes the timer source for multiplexing.
> It introduces a per-cpu hrtimer. The advantage is that
> even when the core goes idle, it will come back to
> service the hrtimer, thus multiplexing on system-wide
> events works much better.
>
> In order to minimize the impact of the hrtimer, it
> is turned on and off on demand. When the PMU on
> a CPU is overcommitted, the hrtimer is activated.
> It is stopped when the PMU is not overcommitted.
>
> In order for this to work properly with HOTPLUG_CPU,
> we had to change the order of initialization in
> start_kernel() such that hrtimer_init() is run
> before perf_event_init().
>
> The second patch provides a sysctl control to
> adjust the multiplexing interval. Unit is
> milliseconds.
>
> Here is a simple before/after example with
> two event groups which do require multiplexing.
> This is done in system-wide mode on an idle
> system. What matters here is the scaling factor
> in [], not the total counts.
>
> Before:
>
> # perf stat -a -e ref-cycles,ref-cycles sleep 10
> Performance counter stats for 'sleep 10':
>         34,319,545 ref-cycles [56.51%]
>         31,917,229 ref-cycles [43.50%]
>
> 10.000827569 seconds time elapsed
>
> After:
>
> # perf stat -a -e ref-cycles,ref-cycles sleep 10
> Performance counter stats for 'sleep 10':
>     11,144,822,193 ref-cycles [50.00%]
>     11,103,760,513 ref-cycles [50.00%]
>
> 10.000672946 seconds time elapsed
>
> In this second version of the patchset, we now
> have the hrtimer_interval per PMU instance. The
> tunable is in /sys/devices/XXX/mux_interval_ms,
> where XXX is the name of the PMU instance. Due
> to initialization changes of each hrtimer, we
> had to introduce hrtimer_init_cpu() to initialize
> a hrtimer from another CPU.
>
> In the 3rd version, we simplify the code a bit
> by using hrtimer_active(). We stopped using
> the rotation_list for perf_cpu_hrtimer_cancel().
> We also fix an initialization problem.
>
> Signed-off-by: Stephane Eranian <eran...@google.com>
> ---
>
> Stephane Eranian (3):
>   hrtimer: add hrtimer_init_cpu()
>   perf: use hrtimer for event multiplexing
>   perf: add sysfs entry to adjust multiplexing interval per PMU
>
>  include/linux/hrtimer.h    |   2 +
>  include/linux/perf_event.h |   5 +-
>  init/main.c                |   2 +-
>  kernel/events/core.c       | 166 +++++++++++++++++++++++++++++++++++++++++---
>  kernel/hrtimer.c           |  17 +++--
>  5 files changed, 176 insertions(+), 16 deletions(-)
>
> --
> 1.7.5.4
>
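For readers not familiar with the number in [] above: as far as I
understand it, it is the share of the measurement window during which
the event was actually scheduled on the PMU, i.e. time running divided
by time enabled. Below is a minimal sketch of that arithmetic. The
function names and the sample numbers are made up for illustration
and are not taken from the patches or from the perf tool sources.

```python
def running_percent(time_enabled_ns, time_running_ns):
    """Share of the enabled window during which the event was on the PMU."""
    if time_enabled_ns == 0:
        return 0.0
    return 100.0 * time_running_ns / time_enabled_ns


def scaled_count(raw_count, time_enabled_ns, time_running_ns):
    """Extrapolate a raw count to the full enabled window.

    Assumes the event rate while scheduled is representative of the
    whole window, which is exactly what biased rotation breaks.
    """
    if time_running_ns == 0:
        return 0
    return round(raw_count * time_enabled_ns / time_running_ns)


# Hypothetical event: enabled for 10 s, scheduled for 5 s of it.
enabled_ns = 10_000_000_000
running_ns = 5_000_000_000
pct = running_percent(enabled_ns, running_ns)
estimate = scaled_count(1000, enabled_ns, running_ns)
print(f"[{pct:.2f}%] estimated count: {estimate}")
```

With two groups rotating fairly, each should end up near 50%, which
is what the "After" run shows. A split such as 56.51% / 43.50% means
one group was scheduled for a larger share of the window than the other.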