On 2017/7/12 0:34, Peter Zijlstra wrote:
> On Tue, Jul 11, 2017 at 06:09:27PM +0200, Frederic Weisbecker wrote:
>
>>>> - tick_nohz_idle_enter costs 7058ns - 10726ns
>>>> - tick_nohz_idle_exit costs 8372ns - 20850ns
>>>
>>> Right, those are horrible expensive, but skipping them isn't 'hard', the
>>> only tricky bit is finding a condition that makes sense.
>>
>> Note you can statically disable it with nohz=0 boot parameter.
>
> Yeah, but that's bad for power usage, nobody wants that.
>
>>> See Mike's patch: https://patchwork.kernel.org/patch/2839221/
>>>
>>> Combined with the above, and possibly a better condition, that should
>>> get rid of most of this.
>>
>> Such a patch could work well if the decision from the scheduler to not
>> stop the tick happens on idle entry.
>>
>> Now if sched_needs_cpu() first allows to stop the tick then refuses it
>> later in the end of an idle IRQ, this won't have the desired effect. As
>> long as ts->tick_stopped=1, it stays so until we really restart the tick.
>> So the whole costly nohz machinery stays on.
>>
>> I guess it doesn't matter though, as we are talking about making fast
>> idle entry so the decision not to stop the tick is likely to be done once
>> on idle entry, when ts->tick_stopped=0.
>>
>> One exception though: if the tick is already stopped when we enter idle
>> (full nohz case). And BTW stopping the tick outside idle shouldn't be
>> concerned here.
>>
>> So I'd rather put that on can_stop_idle_tick().
>
> Mike's patch much predates the existence of that function I think ;-) But
> sure..
>
Okay, the difference is that Mike's patch uses a very simple algorithm to
make the decision.

	/*
	 * delta is wakeup_timestamp - idle_timestamp
	 */
	update_avg(&rq->avg_idle, delta);
	...
	static void update_avg(u64 *avg, u64 sample)
	{
		s64 diff = sample - *avg;
		*avg += diff >> 3;
	}

My proposal, by contrast, tries to leverage the prediction functionality of
the existing idle menu governor, which has worked very well for a long time.
I know the code change is big and the running overhead is a bit higher than
rq->avg_idle, but should we make a comparison for some typical workloads?

Thanks,
-Aubrey
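
P.S. To make the rq->avg_idle behaviour concrete, here is a minimal
userspace sketch of the same averaging step. update_avg() mirrors the
snippet quoted above, with u64/s64 replaced by the <stdint.h> types;
feed_samples() is a hypothetical helper added only for illustration and is
not part of Mike's patch or the kernel. It assumes the compiler implements
right shift of a negative signed value as an arithmetic shift, as gcc and
clang do.

```c
#include <stdint.h>

/* Userspace copy of the quoted update_avg(): avg += (sample - avg) / 8. */
static void update_avg(uint64_t *avg, uint64_t sample)
{
	/* Signed difference so the average can move down as well as up. */
	int64_t diff = (int64_t)(sample - *avg);

	/* Assumes arithmetic right shift for negative diff (gcc/clang). */
	*avg += diff >> 3;
}

/*
 * Hypothetical helper, for illustration only: feed the same idle-duration
 * sample n times and return the resulting average.
 */
static uint64_t feed_samples(uint64_t avg, uint64_t sample, int n)
{
	for (int i = 0; i < n; i++)
		update_avg(&avg, sample);
	return avg;
}
```

Each sample moves the average one eighth of the way toward the new value, so
starting from 0 a single 8000ns idle period gives 1000ns, and a second one
gives 1875ns. The state is one u64 per runqueue and the update is a subtract,
a shift and an add, which is why its overhead is so low compared with the
menu governor's prediction.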