On 10/01/2013 11:44 PM, Srivatsa S. Bhat wrote:
> On 10/01/2013 11:06 PM, Peter Zijlstra wrote:
>> On Tue, Oct 01, 2013 at 10:41:15PM +0530, Srivatsa S. Bhat wrote:
>>> However, as Oleg said, it's definitely worth considering whether this
>>> proposed change in semantics is going to hurt us in the future.
>>> CPU_POST_DEAD has certainly proved to be very useful in certain
>>> challenging situations (commit 1aee40ac9c explains one such example),
>>> so IMHO we should be very careful not to undermine its utility.
>>
>> Urgh.. crazy things. I've always understood POST_DEAD to mean 'will be
>> called at some time after the unplug' with no further guarantees. And my
>> patch preserves that.
>>
>> It's not at all clear to me why cpufreq needs more; 1aee40ac9c certainly
>> doesn't explain it.
>>
> 
> Sorry if I was unclear - I didn't mean to say that cpufreq needs more
> guarantees than that. I was just saying that the cpufreq code would need
> certain additional changes/restructuring to accommodate the change in
> semantics brought about by this patch. IOW, it won't work as it is, but
> it can certainly be fixed.
> 


Ok, so I thought a bit more about the changes you are proposing, and I agree
that they would be beneficial in the long run, especially given that they can
eventually lead to a more streamlined hotplug process where different CPUs
can be hotplugged independently without waiting on each other, as you
mentioned in your other mail. So I'm fine with the new POST_DEAD guarantees
you are proposing - that the callbacks run at some point after the unplug,
and complete before the UP_PREPARE of the same CPU. It's also very
convenient that we only need to fix cpufreq to accommodate this change.

So below is a quick, untested patch that modifies the cpufreq hotplug
callbacks appropriately. With this, cpufreq should be able to handle the
POST_DEAD changes, irrespective of whether we run them in the regular path
or in the suspend/resume path. (I've restructured the code in such a way
that the races I mentioned earlier are avoided entirely: the POST_DEAD
handler now performs only the bare-minimum final cleanup, which doesn't
race with or depend on anything else.)



diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 04548f7..0a33c1a 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1165,7 +1165,7 @@ static int __cpufreq_remove_dev_prepare(struct device *dev,
                                        bool frozen)
 {
        unsigned int cpu = dev->id, cpus;
-       int new_cpu, ret;
+       int new_cpu, ret = 0;
        unsigned long flags;
        struct cpufreq_policy *policy;
 
@@ -1200,9 +1200,10 @@ static int __cpufreq_remove_dev_prepare(struct device *dev,
                        policy->governor->name, CPUFREQ_NAME_LEN);
 #endif
 
-       lock_policy_rwsem_read(cpu);
+       lock_policy_rwsem_write(cpu);
        cpus = cpumask_weight(policy->cpus);
-       unlock_policy_rwsem_read(cpu);
+       cpumask_clear_cpu(cpu, policy->cpus);
+       unlock_policy_rwsem_write(cpu);
 
        if (cpu != policy->cpu) {
                if (!frozen)
@@ -1220,7 +1221,23 @@ static int __cpufreq_remove_dev_prepare(struct device *dev,
                }
        }
 
-       return 0;
+       /* If no target, nothing more to do */
+       if (!cpufreq_driver->target)
+               return 0;
+
+       /* If cpu is last user of policy, cleanup the policy governor */
+       if (cpus == 1) {
+               ret = __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
+               if (ret)
+                       pr_err("%s: Failed to exit governor\n", __func__);
+       } else {
+               if ((ret = __cpufreq_governor(policy, CPUFREQ_GOV_START)) ||
+                               (ret = __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS))) {
+                       pr_err("%s: Failed to start governor\n", __func__);
+               }
+       }
+
+       return ret;
 }
 
 static int __cpufreq_remove_dev_finish(struct device *dev,
@@ -1243,25 +1260,12 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
                return -EINVAL;
        }
 
-       WARN_ON(lock_policy_rwsem_write(cpu));
+       WARN_ON(lock_policy_rwsem_read(cpu));
        cpus = cpumask_weight(policy->cpus);
-
-       if (cpus > 1)
-               cpumask_clear_cpu(cpu, policy->cpus);
-       unlock_policy_rwsem_write(cpu);
+       unlock_policy_rwsem_read(cpu);
 
        /* If cpu is last user of policy, free policy */
-       if (cpus == 1) {
-               if (cpufreq_driver->target) {
-                       ret = __cpufreq_governor(policy,
-                                       CPUFREQ_GOV_POLICY_EXIT);
-                       if (ret) {
-                               pr_err("%s: Failed to exit governor\n",
-                                               __func__);
-                               return ret;
-                       }
-               }
-
+       if (cpus == 0) {
                if (!frozen) {
                        lock_policy_rwsem_read(cpu);
                        kobj = &policy->kobj;
@@ -1294,15 +1298,6 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
 
                if (!frozen)
                        cpufreq_policy_free(policy);
-       } else {
-               if (cpufreq_driver->target) {
-               if ((ret = __cpufreq_governor(policy, CPUFREQ_GOV_START)) ||
-                                       (ret = __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS))) {
-                               pr_err("%s: Failed to start governor\n",
-                                               __func__);
-                               return ret;
-                       }
-               }
        }
 
        per_cpu(cpufreq_cpu_data, cpu) = NULL;



Regards,
Srivatsa S. Bhat
