Re: [PATCH 0/7 v4] move update blocked load outside newidle_balance
On 25/02/21 09:05, Vincent Guittot wrote: >> One last thing for patch 7: mayhaps we could do a tad better to avoid >> duplicate updates going through a heapful of leaf cfs rqs, see >> >> http://lore.kernel.org/r/jhj4kiht7oh.mog...@arm.com > > rq->last_blocked_load_update_tick is there only to filter duplicate > update during _nohz_idle_balance but not for other normal LB. > Right, update_nohz_stats() is now only used in _nohz_idle_balance()... IIUC the pattern being covered here would be a CPUX getting kicked for nohz stats/balance, then some CPUX-N (so it gets picked by find_new_ilb()) entering idle and getting kick in turn (less than a jiffy apart).
Re: [PATCH 0/7 v4] move update blocked load outside newidle_balance
Hi Valentin, On Wed, 24 Feb 2021 at 19:46, Valentin Schneider wrote: > > On 24/02/21 14:30, Vincent Guittot wrote: > > Joel reported long preempt and irq off sequence in newidle_balance because > > of a large number of CPU cgroups in use and having to be updated. This > > patchset moves the update outside newidle_imblance. This enables to early > > abort during the updates in case of pending irq as an example. > > > > Instead of kicking a normal ILB that will wakes up CPU which is already > > idle, patch 6 triggers the update of statistics in the idle thread of > > the CPU before selecting and entering an idle state. > > > > Changes on v4: > > - Add a dedicated bit for updating blocked load when entering idle. > > This simplifies the management of concurrency with kick_ilb. > > > > I believe that solves the issues vs nohz balance. > > One last thing for patch 7: mayhaps we could do a tad better to avoid > duplicate updates going through a heapful of leaf cfs rqs, see > > http://lore.kernel.org/r/jhj4kiht7oh.mog...@arm.com rq->last_blocked_load_update_tick is there only to filter duplicate update during _nohz_idle_balance but not for other normal LB. > > > Otherwise, feel free to add to the lot: > > Reviewed-by: Valentin Schneider >
Re: [PATCH 0/7 v4] move update blocked load outside newidle_balance
On 24/02/21 14:30, Vincent Guittot wrote: > Joel reported long preempt and irq off sequence in newidle_balance because > of a large number of CPU cgroups in use and having to be updated. This > patchset moves the update outside newidle_imblance. This enables to early > abort during the updates in case of pending irq as an example. > > Instead of kicking a normal ILB that will wakes up CPU which is already > idle, patch 6 triggers the update of statistics in the idle thread of > the CPU before selecting and entering an idle state. > > Changes on v4: > - Add a dedicated bit for updating blocked load when entering idle. > This simplifies the management of concurrency with kick_ilb. > I believe that solves the issues vs nohz balance. One last thing for patch 7: mayhaps we could do a tad better to avoid duplicate updates going through a heapful of leaf cfs rqs, see http://lore.kernel.org/r/jhj4kiht7oh.mog...@arm.com Otherwise, feel free to add to the lot: Reviewed-by: Valentin Schneider
Re: [PATCH 0/7 v4] move update blocked load outside newidle_balance
On Wed, Feb 24, 2021 at 06:51:01PM +0100, Vincent Guittot wrote: > > OK, shall I add something like: > > > > This reduces the IRQ latency from O(nr_cgroups * nr_nohz_cpus) to > > O(nr_cgroups). > > > > To the changelog of patch #1 ? > > Yes, good point. This will clarify the range of improvement OK, done.
Re: [PATCH 0/7 v4] move update blocked load outside newidle_balance
On Wed, 24 Feb 2021 at 18:41, Peter Zijlstra wrote: > > On Wed, Feb 24, 2021 at 04:57:15PM +0100, Vincent Guittot wrote: > > On Wed, 24 Feb 2021 at 16:54, Peter Zijlstra wrote: > > > > > > On Wed, Feb 24, 2021 at 02:30:00PM +0100, Vincent Guittot wrote: > > > > Joel reported long preempt and irq off sequence in newidle_balance > > > > because > > > > of a large number of CPU cgroups in use and having to be updated. This > > > > patchset moves the update outside newidle_imblance. This enables to > > > > early > > > > abort during the updates in case of pending irq as an example. > > > > > > > > Instead of kicking a normal ILB that will wakes up CPU which is already > > > > idle, patch 6 triggers the update of statistics in the idle thread of > > > > the CPU before selecting and entering an idle state. > > > > > > I'm confused... update_blocked_averages(), which calls > > > __update_blocked_fair(), which is the one doing the cgroup iteration > > > thing, runs with rq->lock held, and thus will have IRQs disabled any > > > which way around we turn this thing. > > > > > > Or is the problem that we called nohz_idle_balance(), which does > > > update_nohz_stats() -> update_blocked_averages() for evey NOHZ cpu from > > > newidle balance, such that we get NR_NOHZ_CPUS * NR_CGROUPS IRQ latency? > > > Which is now reduced to just NR_CGROUPS ? > > > > Yes we can now abort between each cpu update > > OK, shall I add something like: > > This reduces the IRQ latency from O(nr_cgroups * nr_nohz_cpus) to > O(nr_cgroups). > > To the changelog of patch #1 ? Yes, good point. This will clarify the range of improvement
Re: [PATCH 0/7 v4] move update blocked load outside newidle_balance
On Wed, Feb 24, 2021 at 04:57:15PM +0100, Vincent Guittot wrote: > On Wed, 24 Feb 2021 at 16:54, Peter Zijlstra wrote: > > > > On Wed, Feb 24, 2021 at 02:30:00PM +0100, Vincent Guittot wrote: > > > Joel reported long preempt and irq off sequence in newidle_balance because > > > of a large number of CPU cgroups in use and having to be updated. This > > > patchset moves the update outside newidle_imblance. This enables to early > > > abort during the updates in case of pending irq as an example. > > > > > > Instead of kicking a normal ILB that will wakes up CPU which is already > > > idle, patch 6 triggers the update of statistics in the idle thread of > > > the CPU before selecting and entering an idle state. > > > > I'm confused... update_blocked_averages(), which calls > > __update_blocked_fair(), which is the one doing the cgroup iteration > > thing, runs with rq->lock held, and thus will have IRQs disabled any > > which way around we turn this thing. > > > > Or is the problem that we called nohz_idle_balance(), which does > > update_nohz_stats() -> update_blocked_averages() for evey NOHZ cpu from > > newidle balance, such that we get NR_NOHZ_CPUS * NR_CGROUPS IRQ latency? > > Which is now reduced to just NR_CGROUPS ? > > Yes we can now abort between each cpu update OK, shall I add something like: This reduces the IRQ latency from O(nr_cgroups * nr_nohz_cpus) to O(nr_cgroups). To the changelog of patch #1 ?
Re: [PATCH 0/7 v4] move update blocked load outside newidle_balance
On Wed, 24 Feb 2021 at 16:54, Peter Zijlstra wrote: > > On Wed, Feb 24, 2021 at 02:30:00PM +0100, Vincent Guittot wrote: > > Joel reported long preempt and irq off sequence in newidle_balance because > > of a large number of CPU cgroups in use and having to be updated. This > > patchset moves the update outside newidle_imblance. This enables to early > > abort during the updates in case of pending irq as an example. > > > > Instead of kicking a normal ILB that will wakes up CPU which is already > > idle, patch 6 triggers the update of statistics in the idle thread of > > the CPU before selecting and entering an idle state. > > I'm confused... update_blocked_averages(), which calls > __update_blocked_fair(), which is the one doing the cgroup iteration > thing, runs with rq->lock held, and thus will have IRQs disabled any > which way around we turn this thing. > > Or is the problem that we called nohz_idle_balance(), which does > update_nohz_stats() -> update_blocked_averages() for evey NOHZ cpu from > newidle balance, such that we get NR_NOHZ_CPUS * NR_CGROUPS IRQ latency? > Which is now reduced to just NR_CGROUPS ? Yes we can now abort between each cpu update
Re: [PATCH 0/7 v4] move update blocked load outside newidle_balance
On Wed, Feb 24, 2021 at 02:30:00PM +0100, Vincent Guittot wrote: > Joel reported long preempt and irq off sequence in newidle_balance because > of a large number of CPU cgroups in use and having to be updated. This > patchset moves the update outside newidle_imblance. This enables to early > abort during the updates in case of pending irq as an example. > > Instead of kicking a normal ILB that will wakes up CPU which is already > idle, patch 6 triggers the update of statistics in the idle thread of > the CPU before selecting and entering an idle state. I'm confused... update_blocked_averages(), which calls __update_blocked_fair(), which is the one doing the cgroup iteration thing, runs with rq->lock held, and thus will have IRQs disabled any which way around we turn this thing. Or is the problem that we called nohz_idle_balance(), which does update_nohz_stats() -> update_blocked_averages() for evey NOHZ cpu from newidle balance, such that we get NR_NOHZ_CPUS * NR_CGROUPS IRQ latency? Which is now reduced to just NR_CGROUPS ?
[PATCH 0/7 v4] move update blocked load outside newidle_balance
Joel reported long preempt and irq off sequence in newidle_balance because of a large number of CPU cgroups in use and having to be updated. This patchset moves the update outside newidle_imblance. This enables to early abort during the updates in case of pending irq as an example. Instead of kicking a normal ILB that will wakes up CPU which is already idle, patch 6 triggers the update of statistics in the idle thread of the CPU before selecting and entering an idle state. Changes on v4: - Add a dedicated bit for updating blocked load when entering idle. This simplifies the management of concurrency with kick_ilb. Changes on v3: - Fixed a compilation error for !CONFIG_SMP && CONFIG_NO_HZ_COMMON reported by kernel test robot - Took advantage of this new version to add a short desciption for nohz_run_idle_balance Changes on v2: - Fixed some typos and updated some comments - Added more cleanup - Changed to way to trigger ILB in idle thread context to remove a possible race condition between the normal softirq ILB and this new mecanism. The cpu can already be set in idle_cpus_mask because even if the cpu is added later when entering idle, it might not have been removed yet from previous idle phase. Vincent Guittot (7): sched/fair: remove update of blocked load from newidle_balance sched/fair: remove unused return of _nohz_idle_balance sched/fair: remove unused parameter of update_nohz_stats sched/fair: merge for each idle cpu loop of ILB sched/fair: reorder newidle_balance pulled_task tests sched/fair: trigger the update of blocked load on newly idle cpu sched/fair: reduce the window for duplicated update kernel/sched/core.c | 2 +- kernel/sched/fair.c | 118 +-- kernel/sched/idle.c | 6 +++ kernel/sched/sched.h | 7 +++ 4 files changed, 60 insertions(+), 73 deletions(-) -- 2.17.1