On Mon, Nov 17, 2025 at 1:18 AM Christian Loehle <[email protected]> wrote: > > On 11/14/25 05:11, Viresh Kumar wrote: > > On 13-11-25, 19:41, Samuel Wu wrote: > >> On Wed, Nov 12, 2025 at 10:45 PM Viresh Kumar <[email protected]> > >> wrote: > >>> > >>> On 12-11-25, 15:51, Samuel Wu wrote: > >>>> The existing cpu_frequency trace_event can be verbose, emitting an event > >>>> for every CPU in the policy even when their frequencies are identical. > >>>> > >>>> This patch adds a new policy_frequency trace event, which provides a > >>>> more efficient alternative to cpu_frequency trace event. This option > >>>> allows users who only need frequency at a policy level more concise logs > >>>> with simpler analysis. > >>>> > >>>> Signed-off-by: Samuel Wu <[email protected]> > >>>> --- > >>>> drivers/cpufreq/cpufreq.c | 2 ++ > >>>> include/trace/events/power.h | 21 +++++++++++++++++++++ > >>>> 2 files changed, 23 insertions(+) > >>>> > >>>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > >>>> index 4472bb1ec83c..b65534a4fd9a 100644 > >>>> --- a/drivers/cpufreq/cpufreq.c > >>>> +++ b/drivers/cpufreq/cpufreq.c > >>>> @@ -345,6 +345,7 @@ static void cpufreq_notify_transition(struct > >>>> cpufreq_policy *policy, > >>>> pr_debug("FREQ: %u - CPUs: %*pbl\n", freqs->new, > >>>> cpumask_pr_args(policy->cpus)); > >>>> > >>>> + trace_policy_frequency(freqs->new, policy->cpu); > >>>> for_each_cpu(cpu, policy->cpus) > >>>> trace_cpu_frequency(freqs->new, cpu); > >>> > >>> I don't see much value in almost duplicate trace events. If we feel that a > >>> per-policy event is a better fit (which makes sens), then we can just > >>> drop the > >>> trace_cpu_frequency() events and print policy->cpus (or related_cpus) > >>> information along with the per-policy events. > >> > >> Thank you for the feedback Viresh. Fair enough, I've done some testing > >> and a single trace event should work and would be cleaner. Please let > >> me know what you think of this proposal for v2. > >> > >> We can append a bitmask of policy->cpus field to > >> trace_cpu_frequency(). This way we maintain backwards compatibility: > >> trace_cpu_frequency() is not removed, and its pre-existing fields are > >> not disturbed. > >> > >> Call flow wise, we can delete all the for_each_cpu() loops, and we > >> still retain the benefits of the trace emitting once per policy > >> instead of once per cpu. > > > > Fine by me. I have added Scheduler maintainers in the loop to see if they > > have a > > different view. > > > > And IIUC your proposal is to fold policy_frequency into cpu_frequency but then > only have one cpu_frequency event per policy emitted?
That's right, emit the trace event once per policy instead of once per cpu- which I think is the most valuable element of this patch. And yes, the latest idea was to append bitmask of policy->cpus into the cpu_frequency event such that relevant policy info is encapsulated in the trace event. > I think from a tooling perspective it would be easier to remove cpu_frequency > entirely, then tools can probe on the presence of policy_frequency / > cpu_frequency. This can be handled perfectly fine by the tools I know of that consume this trace event. The points you and Viresh have brought up are valid, and as this solution is not in conflict with those points, "policy_frequency replacing cpu_frequency" can be the frontrunner for now.
