On Wed, May 23, 2018 at 12:09 AM, Joel Fernandes <j...@joelfernandes.org> wrote:
> On Tue, May 22, 2018 at 04:04:15PM +0530, Viresh Kumar wrote:
>> Okay, Rafael and I were discussing this patch and the locking and races
>> around it.
>>
>> On 18-05-18, 11:55, Joel Fernandes (Google.) wrote:
>> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
>> > index e13df951aca7..5c482ec38610 100644
>> > --- a/kernel/sched/cpufreq_schedutil.c
>> > +++ b/kernel/sched/cpufreq_schedutil.c
>> > @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
>> >  	    !cpufreq_can_do_remote_dvfs(sg_policy->policy))
>> >  		return false;
>> >
>> > -	if (sg_policy->work_in_progress)
>> > -		return false;
>> > -
>> >  	if (unlikely(sg_policy->need_freq_update)) {
>> >  		sg_policy->need_freq_update = false;
>> >  		/*
>> > @@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
>> >
>> >  		policy->cur = next_freq;
>> >  		trace_cpu_frequency(next_freq, smp_processor_id());
>> > -	} else {
>> > +	} else if (!sg_policy->work_in_progress) {
>> >  		sg_policy->work_in_progress = true;
>> >  		irq_work_queue(&sg_policy->irq_work);
>> >  	}
>> > @@ -291,6 +288,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>> >
>> >  	ignore_dl_rate_limit(sg_cpu, sg_policy);
>> >
>> > +	/*
>> > +	 * For slow-switch systems, single policy requests can't run at the
>> > +	 * moment if update is in progress, unless we acquire update_lock.
>> > +	 */
>> > +	if (sg_policy->work_in_progress)
>> > +		return;
>> > +
>> >  	if (!sugov_should_update_freq(sg_policy, time))
>> >  		return;
>> >
>> > @@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
>> >  static void sugov_work(struct kthread_work *work)
>> >  {
>> >  	struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
>> > +	unsigned int freq;
>> > +	unsigned long flags;
>> > +
>> > +	/*
>> > +	 * Hold sg_policy->update_lock shortly to handle the case where:
>> > +	 * in case sg_policy->next_freq is read here, and then updated by
>> > +	 * sugov_update_shared just before work_in_progress is set to false
>> > +	 * here, we may miss queueing the new update.
>> > +	 *
>> > +	 * Note: If a work was queued after the update_lock is released,
>> > +	 * sugov_work will just be called again by kthread_work code; and the
>> > +	 * request will be processed before the sugov thread sleeps.
>> > +	 */
>> > +	raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
>> > +	freq = sg_policy->next_freq;
>> > +	sg_policy->work_in_progress = false;
>> > +	raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
>> >
>> >  	mutex_lock(&sg_policy->work_lock);
>> > -	__cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
>> > -				CPUFREQ_RELATION_L);
>> > +	__cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
>> >  	mutex_unlock(&sg_policy->work_lock);
>> > -
>> > -	sg_policy->work_in_progress = false;
>> >  }
>>
>> And I do see a race here for single policy systems doing slow switching.
>>
>> Kthread                                 Sched update
>>
>> sugov_work()                            sugov_update_single()
>>
>> lock();
>> // The CPU is free to rearrange the
>> // two below in any order, so it may
>> // clear the flag first and then read
>> // next freq. Let's assume it does.
>> work_in_progress = false
>>
>>                                         if (work_in_progress)
>>                                                 return;
>>
>>                                         sg_policy->next_freq = 0;
>> freq = sg_policy->next_freq;
>>                                         sg_policy->next_freq = real-next-freq;
>> unlock();
>>
>
> I agree with the race you describe for single policy slow-switch. Good find :)
>
> The mainline sugov_work() could also suffer from such reordering, I think.
> Even with the mutex_unlock() in mainline's sugov_work(), that
> work_in_progress write could be reordered by the CPU to happen before the
> read of next_freq. AIUI, mutex_unlock() is expected to be only a release
> barrier.
>
> Although, to be safe, I could just put an smp_mb() there. I believe that
> with that, no locking would be needed for such a case.
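Joel's smp_mb() suggestion can be modeled in user space with C11 atomics. This is a hypothetical sketch, not the schedutil code: `struct model_policy`, `model_work()` and `model_update_single()` are illustrative names, `atomic_thread_fence(memory_order_seq_cst)` stands in for smp_mb() in sugov_work(), and the acquire load is a conservative stand-in for whatever ordering the sugov_update_single() side relies on. Only the next_freq/work_in_progress handoff is modeled. The fence keeps the read of next_freq ahead of the store that clears work_in_progress, so an updater that observes the cleared flag cannot have its later next_freq writes seen by that read.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two sugov_policy fields involved. */
struct model_policy {
	atomic_uint next_freq;
	atomic_bool work_in_progress;
};

/* Kthread side: models sugov_work() with the proposed smp_mb(). */
static unsigned int model_work(struct model_policy *p)
{
	/* Read the frequency to program first... */
	unsigned int freq = atomic_load_explicit(&p->next_freq,
						 memory_order_relaxed);

	/*
	 * ...then a full fence (C11 analogue of smp_mb()), so the load above
	 * cannot be reordered after the store that clears the flag.
	 */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&p->work_in_progress, false,
			      memory_order_relaxed);

	/* __cpufreq_driver_target(policy, freq, ...) would run here. */
	return freq;
}

/* Update side: models sugov_update_single() with the early return kept. */
static bool model_update_single(struct model_policy *p, unsigned int new_freq)
{
	/* Acquire load pairs with the fence in model_work(). */
	if (atomic_load_explicit(&p->work_in_progress, memory_order_acquire))
		return false;	/* request dropped while work is pending */

	atomic_store_explicit(&p->next_freq, new_freq, memory_order_relaxed);
	/*
	 * Slow-switch path: mark work pending. In the kernel, queueing the
	 * irq_work/kthread work is what hands next_freq to the kthread; this
	 * sketch only models the flag.
	 */
	atomic_store_explicit(&p->work_in_progress, true, memory_order_relaxed);
	return true;
}
```

Because the update path still bails out while work_in_progress is set, a second request arriving before the kthread runs is simply dropped (model_update_single() returns false), which is the one-CPU behavior Rafael points out in his reply.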
Yes, but leaving the work_in_progress check in sugov_update_single()
means that the original problem is still there in the one-CPU policy
case. Namely, utilization updates coming in between setting
work_in_progress in sugov_update_commit() and clearing it in
sugov_work() will be discarded in the one-CPU policy case, but not in
the shared policy case.

> I'll send out a v3 with Acks for the original patch,

OK

> and then send out the smp_mb() as a separate patch if that's Ok.

I would prefer to use a spinlock in the one-CPU policy non-fast-switch
case and remove the work_in_progress check from sugov_update_single().

I can do a patch on top of yours for that. In fact, I've done that
already. :-)

Thanks,
Rafael
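The spinlock approach Rafael describes for the one-CPU, non-fast-switch case can be sketched the same way. This is a hypothetical single-file model, not his actual patch: an `atomic_flag` test-and-set lock stands in for sg_policy->update_lock, `works_queued` counts what would be irq_work_queue() calls, and `struct lock_policy`, `locked_update_single()` and `locked_work()` are illustrative names. Both sides take the same lock, so no separate barrier is needed, and the update path no longer returns early when work_in_progress is set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal test-and-set spinlock standing in for a raw spinlock. */
typedef struct { atomic_flag locked; } model_spinlock;

static void model_lock(model_spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->locked,
						 memory_order_acquire))
		;	/* spin until the holder releases it */
}

static void model_unlock(model_spinlock *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Hypothetical one-CPU, slow-switch policy state. */
struct lock_policy {
	model_spinlock update_lock;
	unsigned int next_freq;		/* protected by update_lock */
	bool work_in_progress;		/* protected by update_lock */
	unsigned int works_queued;	/* counts would-be irq_work_queue() calls */
};

/* Update path: no early return on work_in_progress; the lock orders it. */
static void locked_update_single(struct lock_policy *p, unsigned int new_freq)
{
	model_lock(&p->update_lock);
	p->next_freq = new_freq;	/* newest request is always recorded */
	if (!p->work_in_progress) {
		p->work_in_progress = true;
		p->works_queued++;	/* irq_work_queue() in the kernel */
	}
	model_unlock(&p->update_lock);
}

/* Kthread path: snapshot next_freq and clear the flag under the same lock. */
static unsigned int locked_work(struct lock_policy *p)
{
	unsigned int freq;

	model_lock(&p->update_lock);
	freq = p->next_freq;
	p->work_in_progress = false;
	model_unlock(&p->update_lock);

	/* __cpufreq_driver_target(policy, freq, ...) would run here, unlocked. */
	return freq;
}
```

In this model a request that arrives while work is pending overwrites next_freq instead of being discarded, the kthread programs the newest value, and a new work item is queued only after the previous one has cleared work_in_progress.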