> On Jun 18, 2020, at 2:09 AM, Athira Rajeev <[email protected]> > wrote: > > > >> On 02-May-2020, at 5:51 AM, Song Liu <[email protected]> wrote: >> >> This patch tries to enable PMU sharing. When multiple perf_events are >> counting the same metric, they can share the hardware PMU counter. We >> call these events as "compatible events". >> >> The PMU sharing are limited to events within the same perf_event_context >> (ctx). When a event is installed or enabled, search the ctx for compatible >> events. This is implemented in perf_event_setup_dup(). One of these >> compatible events are picked as the master (stored in event->dup_master). >> Similarly, when the event is removed or disabled, perf_event_remove_dup() >> is used to clean up sharing. >> >> If the master event is a cgroup event or a event in a group, it is >> possible that some slave events are ACTIVE, but the master event is not. >> To handle this scenario, we introduced PERF_EVENT_STATE_ENABLED. Also, >> since PMU drivers write into event->count, master event needs another >> variable (master_count) for the reading of this event. >> >> On the critical paths (add, del read), sharing PMU counters doesn't >> increase the complexity. Helper functions event_pmu_[add|del|read]() are >> introduced to cover these cases. All these functions have O(1) time >> complexity. >> >> Cc: Peter Zijlstra <[email protected]> >> Cc: Arnaldo Carvalho de Melo <[email protected]> >> Cc: Jiri Olsa <[email protected]> >> Cc: Alexey Budankov <[email protected]> >> Cc: Namhyung Kim <[email protected]> >> Cc: Tejun Heo <[email protected]> >> Reported-by: kernel test robot <[email protected]> >> Signed-off-by: Song Liu <[email protected]> >> >> --- >> Changes in v13: >> Fix memory ordering in perf_event_exit_dup_master and perf_event_count. >> Remove test code in perf_event_can_share(). >> >> Changes in v12: >> Add perf_event->dup_active_list to fix rdpmc. (kernel test robot) >> Fix event reset. (kernel test robot) >> Do not consider sharing when the perf event is disabled. >> >> Changes in v11: >> Fold in major fixes by Peter. >> In perf_event_remove_dup(), del() old master before add() new master. >> >> Changes in v10: >> Simplify logic that calls perf_event_setup_dup() and >> perf_event_remove_dup(). (Peter) >> Other small fixes. (Peter) >> >> Changes in v9: >> Avoid ctx_resched() on remove/disable event (Peter). >> Compare the whole perf_event_attr in perf_event_compatible(). >> Small fixes/improvements (Peter). >> >> Changes in v8: >> Fix issues with task event (Jiri). >> Fix issues with event inherit. >> Fix mmap'ed events, i.e. perf test 4 (kernel test bot). >> >> Changes in v7: >> Major rewrite to avoid allocating extra master event. >> (cherry picked from commit 8b98afc0ec100b017a5e80c184809939d4f01f8d) >> >> fix perf_event_count() >> --- >> arch/x86/events/core.c | 7 + >> include/linux/perf_event.h | 22 +- >> kernel/events/core.c | 422 ++++++++++++++++++++++++++++++++++--- >> 3 files changed, 416 insertions(+), 35 deletions(-) >> >> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c >> index 9e63ee50b19a..a245cac31a19 100644 >> --- a/arch/x86/events/core.c >> +++ b/arch/x86/events/core.c >> @@ -2345,6 +2345,11 @@ static int x86_pmu_aux_output_match(struct perf_event >> *event) >> return 0; >> } >> >> +static void x86_copy_hw_config(struct perf_event *old, struct perf_event >> *new) >> +{ >> + new->hw.idx = old->hw.idx; >> +} >> + >> static struct pmu pmu = { >> .pmu_enable = x86_pmu_enable, >> .pmu_disable = x86_pmu_disable, >> @@ -2373,6 +2378,8 @@ static struct pmu pmu = { >> .check_period = x86_pmu_check_period, >> >> .aux_output_match = x86_pmu_aux_output_match, >> + >> + .copy_hw_config = x86_copy_hw_config, >> }; >> >> void arch_perf_update_userpage(struct perf_event *event, >> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h >> index 87e21681759c..1306fbd920fd 100644 >> --- a/include/linux/perf_event.h >> +++ b/include/linux/perf_event.h >> @@ -503,6 +503,13 @@ struct pmu { >> * Check period value for PERF_EVENT_IOC_PERIOD ioctl. >> */ >> int (*check_period) (struct perf_event *event, u64 value); >> /* optional */ >> + >> + /* >> + * Copy hw configuration from one event to another. This is used >> + * to make switching master faster in PMC sharing. >> + */ >> + void (*copy_hw_config) (struct perf_event *old, >> + struct perf_event *new); /* optional */ >> }; >> >> enum perf_addr_filter_action_t { >> @@ -552,6 +559,10 @@ struct perf_addr_filter_range { >> >> /** >> * enum perf_event_state - the states of an event: >> + * >> + * PERF_EVENT_STATE_ENABLED: Special state for PMC sharing: the hw >> PMC >> + * is enabled, but this event is not counting. >> + * See perf_event_init_dup_master(). >> */ >> enum perf_event_state { >> PERF_EVENT_STATE_DEAD = -4, >> @@ -559,7 +570,8 @@ enum perf_event_state { >> PERF_EVENT_STATE_ERROR = -2, >> PERF_EVENT_STATE_OFF = -1, >> PERF_EVENT_STATE_INACTIVE = 0, >> - PERF_EVENT_STATE_ACTIVE = 1, >> + PERF_EVENT_STATE_ENABLED = 1, >> + PERF_EVENT_STATE_ACTIVE = 2, >> }; >> >> struct file; >> @@ -645,6 +657,7 @@ struct perf_event { >> int group_caps; >> >> struct perf_event *group_leader; >> + struct perf_event *dup_master; /* for PMU sharing */ >> struct pmu *pmu; >> void *pmu_private; >> >> @@ -762,6 +775,13 @@ struct perf_event { >> void *security; >> #endif >> struct list_head sb_list; >> + >> + int dup_active; >> + /* See event_pmu_read_dup() */ >> + local64_t dup_count; >> + /* See perf_event_init_dup_master() */ >> + local64_t master_count; >> + struct list_head dup_active_list; >> #endif /* CONFIG_PERF_EVENTS */ >> }; >> >> diff --git a/kernel/events/core.c b/kernel/events/core.c >> index 206dd29c3c6c..40c8b15407b7 100644 >> --- a/kernel/events/core.c >> +++ b/kernel/events/core.c >> @@ -1002,7 +1002,6 @@ perf_cgroup_event_enable(struct perf_event *event, >> struct perf_event_context *ct >> */ >> cpuctx = container_of(ctx, struct perf_cpu_context, ctx); >> >> - pr_info("%s cpu %d\n", __func__, smp_processor_id()); >> /* >> * Since setting cpuctx->cgrp is conditional on the current @cgrp >> * matching the event's cgroup, we must do this for every new event, >> @@ -1012,8 +1011,6 @@ perf_cgroup_event_enable(struct perf_event *event, >> struct perf_event_context *ct >> if (ctx->is_active && !cpuctx->cgrp) { >> struct perf_cgroup *cgrp = perf_cgroup_from_task(current, ctx); >> >> - pr_info("%s cpu %d cgrp %px\n", __func__, smp_processor_id(), >> - cgrp); > > Hi Song Liu, > > I had problem in trying to apply this patch on latest upstream. Example, the > `pr_info` in perf_cgroup_event_enable. Looks like there were some local > changes on top of which this patch was created. Could you please confirm to > make sure there are no other changes missing. Sorry for the confusion. You are right that those two pr_info() are from a local commit. I pushed latest version to https://git.kernel.org/pub/scm/linux/kernel/git/song/linux.git/log/?h=pmu-share Please let me know if you have problem with it. Thanks, Song

