On Fri, 2018-06-22 at 15:05 -0700, Andy Lutomirski wrote:

> I think the right solution if you want that last little bit of
> performance is to get rid of the code in intel_idle and to add it in
> the core idle code.  We have fancy scheduler code to estimate the
> idle time, and we should use it here IMO.

Good point. However, I suspect we have some lower hanging larger
fruit to tackle first.

Every time we go into lazy TLB mode, we take a refcount on the mm.
Every time we leave lazy TLB mode, we drop the refcount.

Every time we switch from the idle task to a kernel thread, the
kernel thread takes a refcount, and the idle task drops it. Every
time we switch back, we do the same dance in reverse.

I am working on a patch to grab the refcount once, and hang onto
it while a particular mm is that CPU's lazy_mm. We can release it
when we switch to a task with a different mm.

The patches we have so far get rid of a lot of the pounding on
mm_cpumask(mm). That patch should help us also get rid of tasks
pounding on mm->count.

After that, the idle state thing is probably of pretty small
impact, though I suspect it will still be worth tackling :)

As an aside, isn't the fancy CPU power management stuff in the
scheduler cpufreq, not cpuidle?

The cpuidle stuff in kernel/sched/idle.c looks like it will just
call down into the menu governor (and maybe the ladder governor
on some systems??)

-- 
All Rights Reversed.
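To make the refcount idea above concrete, here is a rough userspace sketch, not kernel code and not the actual patch. All names in it (struct mm, struct cpu, mm_grab, mm_drop, the enter_lazy_* and leave_lazy_* helpers, switch_mm_cached, eager_ops, cached_ops) are made up for illustration, and a plain int stands in for the atomic reference count. It only counts how often the refcount gets touched under the current take/drop-every-time behaviour versus holding one reference while an mm stays the CPU's lazy_mm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an mm; only the reference count matters here. */
struct mm {
    int count;          /* reference count (the kernel would use an atomic) */
    long refcount_ops;  /* how many times count was modified */
};

/* Hypothetical per-CPU state: the mm this CPU borrows in lazy TLB mode. */
struct cpu {
    struct mm *lazy_mm;
};

static void mm_grab(struct mm *mm)
{
    mm->count++;
    mm->refcount_ops++;
}

static void mm_drop(struct mm *mm)
{
    assert(mm->count > 0);
    mm->count--;
    mm->refcount_ops++;
}

/* Behaviour as described: take a reference on every lazy TLB entry... */
static void enter_lazy_eager(struct cpu *cpu, struct mm *mm)
{
    mm_grab(mm);
    cpu->lazy_mm = mm;
}

/* ...and drop it again on every exit. */
static void leave_lazy_eager(struct cpu *cpu)
{
    mm_drop(cpu->lazy_mm);
    cpu->lazy_mm = NULL;
}

/* Proposed idea: take the reference only if this mm is not already lazy_mm. */
static void enter_lazy_cached(struct cpu *cpu, struct mm *mm)
{
    if (cpu->lazy_mm == mm)
        return;                 /* reference already held by this CPU */
    if (cpu->lazy_mm)
        mm_drop(cpu->lazy_mm);
    mm_grab(mm);
    cpu->lazy_mm = mm;
}

/* Leaving lazy mode keeps the reference; nothing to do. */
static void leave_lazy_cached(struct cpu *cpu)
{
    (void)cpu;
}

/* Release the held reference only when switching to a different mm. */
static void switch_mm_cached(struct cpu *cpu, struct mm *next)
{
    if (cpu->lazy_mm && cpu->lazy_mm != next) {
        mm_drop(cpu->lazy_mm);
        cpu->lazy_mm = NULL;
    }
}

/* Refcount modifications for n idle round trips, eager scheme. */
long eager_ops(int round_trips)
{
    struct mm mm = { .count = 1, .refcount_ops = 0 };
    struct cpu cpu = { .lazy_mm = NULL };

    for (int i = 0; i < round_trips; i++) {
        enter_lazy_eager(&cpu, &mm);
        leave_lazy_eager(&cpu);
    }
    return mm.refcount_ops;
}

/* Same workload with the cached scheme, then a switch to another mm. */
long cached_ops(int round_trips)
{
    struct mm mm = { .count = 1, .refcount_ops = 0 };
    struct mm other = { .count = 1, .refcount_ops = 0 };
    struct cpu cpu = { .lazy_mm = NULL };

    for (int i = 0; i < round_trips; i++) {
        enter_lazy_cached(&cpu, &mm);
        leave_lazy_cached(&cpu);
    }
    switch_mm_cached(&cpu, &other);
    return mm.refcount_ops;
}
```

In this toy model, 1000 idle round trips modify the count 2000 times with the eager scheme, and twice with the cached one (one grab, one drop at the final switch). On real hardware each of those modifications is an atomic operation on a cache line shared between CPUs, which is where the pounding comes from.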

