Hi Viresh,

On Fri, Aug 28, 2020 at 11:57:28AM +0200, Stephan Gerhold wrote:
> On Fri, Aug 28, 2020 at 12:05:11PM +0530, Viresh Kumar wrote:
> > On 27-08-20, 13:44, Stephan Gerhold wrote:
> > > Hmm. Actually I was using this parameter for initial testing, and forced
> > > on the power domains from the qcom-cpufreq-nvmem driver. For my v1 patch
> > > I wanted to enable the power domains in dev_pm_opp_set_rate(), so there
> > > using the virt_devs parameter was not possible.
> >
> > Right, as we really do not want to enable it there and leave it for
> > the real consumers to handle.
> >
> > > On the other hand, creating the device links would be possible from the
> > > platform driver by using the parameter.
> >
> > Right.
> >
> > > > And so I think again if this patch should be picked instead of letting
> > > > the platform handle this ?
> > >
> > > It seems like originally the motivation for the parameter was that
> > > cpufreq consumers do *not* need to power on the power domains:
> > >
> > > Commit 17a8f868ae3e ("opp: Return genpd virtual devices from
> > > dev_pm_opp_attach_genpd()"):
> > >  "The cpufreq drivers don't need to do runtime PM operations on
> > >   the virtual devices returned by dev_pm_domain_attach_by_name() and so
> > >   the virtual devices weren't shared with the callers of
> > >   dev_pm_opp_attach_genpd() earlier.
> > >
> > >   But the IO device drivers would want to do that. This patch updates
> > >   the prototype of dev_pm_opp_attach_genpd() to accept another argument
> > >   to return the pointer to the array of genpd virtual devices."
> >
> > Not just that I believe. There were also arguments that only the real
> > consumer knows how to handle multiple power domains. For example for a
> > USB or Camera module which can work in multiple modes, we may want to
> > enable only one power domain in say slow mode and another power domain
> > in fast mode. And so these kind of complex behavior/choices better be
> > left for the end consumer and not try to handle this generically in
> > the OPP core.
> >
> [...]
>
> It seems to me that there is more work needed to make such a use case
> really work, but it's hard to speculate without a real example.
>
So it seems like we have a real example now. :)

As mentioned in my other mail [1], it turns out I actually have such a
use case. I briefly explained it in [2]: basically, the clock that
provides the higher CPU frequencies has some voltage requirements that
should be voted for using a power domain.

The clock that provides the lower CPU frequencies has no such
requirement, so I need to scale (and power on) a power domain only for
some of the OPPs.

[1]: https://lore.kernel.org/linux-pm/20200831154938.ga33...@gerhold.net/
[2]: https://lore.kernel.org/linux-arm-msm/20200910162610.ga7...@gerhold.net/

So I think it would be good to discuss this use case first before we
decide on this patch (how to enable power domains managed by the OPP
core). I think there are two problems that need to be solved:

1. How can we drop our performance state votes for some of the OPPs?

   I explained that problem earlier already:

>
> I was thinking about something like that, but can you actually drop
> your performance state vote for one of the power domains using
> "required-opps"?
>
> At the moment it does not seem possible. I tried adding a special OPP
> using opp-level = <0> to reference that from required-opps, but the OPP
> core does not allow this:
>
> 	vddcx: Not all nodes have performance state set (7: 6)
> 	vddcx: Failed to add OPP table for index 0: -2
>
> So the "virt_devs" parameter would allow you to disable the power
> domain, but not to drop your performance state vote (you could only vote
> for the lowest OPP, not 0...)

   Not sure if it makes sense, but I think somehow allowing the
   additional opp-level = <0> would be a simple solution. For example:

	rpmpd: power-controller {
		rpmpd_opp_table: opp-table {
			compatible = "operating-points-v2";

			/*
			 * This one can be referenced to drop the performance state
			 * vote within required-opps.
			 */
			rpmpd_opp_none: opp0 {
				opp-level = <0>;
			};

			rpmpd_opp_retention: opp1 {
				opp-level = <1>;
			};

			/* ... */
		};
	};

	cpu_opp_table: cpu-opp-table {
		compatible = "operating-points-v2";
		opp-shared;

		/* Power domain is only needed for frequencies >= 998 MHz */
		opp-200000000 {
			opp-hz = /bits/ 64 <200000000>;
			required-opps = <&rpmpd_opp_none>; /* = drop perf state */
		};
		opp-998400000 {
			opp-hz = /bits/ 64 <998400000>;
			required-opps = <&rpmpd_opp_svs_soc>;
		};
		opp-1209600000 {
			opp-hz = /bits/ 64 <1209600000>;
			required-opps = <&rpmpd_opp_nominal>;
		};
	};

2. Where/when to enable the power domains?

   I need to enable the power domain whenever I have a vote for a
   performance state. In the example above, the power domain should get
   enabled for >= 998 MHz and disabled otherwise.

   At least for the CPUFreq case, the "virt_devs" parameter does not
   really help here. dev_pm_opp_set_rate() is called within cpufreq-dt,
   which is supposed to be generic, so I can't enable the power domains
   myself from there.

   Even if I made a custom cpufreq driver that has control over the
   dev_pm_opp_set_rate() call, the driver would not really know for
   which frequencies a power domain is needed.

   On the other hand, the OPP core does have that information.
   This brings me back to my PATCH v1 (where I used runtime PM functions
   instead of device links). If I modify it to enable the power domain
   whenever we have a performance state vote > 0 when setting an OPP,
   it would do exactly what I need...

   I don't think it makes sense to do performance state votes without
   enabling a power domain, so this approach sounds good to me...

What do you think?

Thanks!
Stephan
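
For illustration only, the behaviour proposed under point 2 (keep the
power domain enabled exactly while the performance state vote is > 0)
could be modelled roughly as below. This is a stand-alone toy model in
plain C, not kernel code: struct toy_power_domain, struct toy_opp and
toy_set_opp() are hypothetical names, and the ordering (power on before
raising the vote, power off after dropping it) is an assumption rather
than something the OPP core is known to do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a power domain; not a kernel structure. */
struct toy_power_domain {
	unsigned int perf_state;	/* current vote, 0 = no vote */
	bool enabled;			/* true while we keep the domain on */
};

/* Hypothetical OPP entry: a rate plus the level it requires. */
struct toy_opp {
	unsigned long rate_hz;
	unsigned int required_level;	/* 0 models opp-level = <0> */
};

/*
 * Apply the power domain side of an OPP switch: vote for the required
 * level and keep the domain enabled exactly while that vote is > 0.
 */
static void toy_set_opp(struct toy_power_domain *pd,
			const struct toy_opp *opp)
{
	assert(pd && opp);

	/* Assumed ordering: power on before raising the vote. */
	if (opp->required_level > 0 && !pd->enabled)
		pd->enabled = true;

	pd->perf_state = opp->required_level;

	/* Assumed ordering: power off after the vote was dropped. */
	if (opp->required_level == 0 && pd->enabled)
		pd->enabled = false;
}
```

With the OPP table from the example, switching to 998.4 MHz would leave
the toy domain enabled with a non-zero vote, and switching back to
200 MHz would drop the vote and disable it again, without the cpufreq
driver having to know which frequencies need the power domain.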