Re: [PATCH] sched: fix env->src_cpu for active migration
On 13 February 2013 21:03, Paul Turner wrote: > On Tue, Feb 12, 2013 at 5:19 AM, Vincent Guittot > wrote: >> need_active_balance uses env->src_cpu which is set only if there is more >> than 1 task on the run queue. >> We must set the src_cpu field unconditionnally >> otherwise the test "env->src_cpu > env->dst_cpu" will always fail if there is >> only 1 task on the run queue >> >> Signed-off-by: Vincent Guittot >> --- >> kernel/sched/fair.c |6 -- >> 1 file changed, 4 insertions(+), 2 deletions(-) >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 81fa536..32938ea 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -5044,6 +5044,10 @@ redo: >> > > Aside: Oh dear, it looks like in all the re-factoring here > update_h_load() has escaped rq->lock? > > load_balance() > ... > update_h_load(env.src_cpu); > more_balance: > local_irq_save(flags); > double_rq_lock(env.dst_rq, busiest); > > > >> ld_moved = 0; >> lb_iterations = 1; >> + >> + env.src_cpu = busiest->cpu; >> + env.src_rq= busiest; >> + > > Hmm -- But shouldn't find_busiest_queue return NULL in this case? > > We're admittedly racing but we should have: > busiest->nr_running == 1 > wl = rq->load.weight > env->imbalance since imbalance < (wl - > this_rq->load.weight / 2) check_asym_packing overwrites the env->imbalance with (sds->max_load * sds->busiest->sgp->power / SCHED_POWER_SCALE) so fbq can return a queue > > This would seem specialized to the case when we either > A) race (fbq is openly racy) > B) have no capacity > > Admittedly when we do race current case this is probably a biased > coinflip depending on whatever was on the stack (src_cpu is > uninitialized); it's also super easy for this to later become a crash > if someone tries to dereference src_rq so we should do something. > > The case this seems most important for (and something we should add an > explicit comment regarding) is that we want this case specifically for > CPU_NEW_IDLE to move a single runnable-task to a lower numbered > idle-cpu index in the SD_ASYM case (although I suspect we need to push > this up to fbq also to actually find it...) The update of imbalance should be enough > > In the !SD_ASYM case we'd probably also want to re-check busiest > nr_running in need_active_balance (or probably better alternative > re-arrange the checks) since this is going to potentially now move a > single large task needlessly to an already loaded rq in balance-failed > case. yes, that could be an interesting add-on > > >> if (busiest->nr_running > 1) { >> /* >> * Attempt to move tasks. If find_busiest_group has found >> @@ -5052,8 +5056,6 @@ redo: >> * correctly treated as an imbalance. >> */ >> env.flags |= LBF_ALL_PINNED; >> - env.src_cpu = busiest->cpu; >> - env.src_rq= busiest; >> env.loop_max = min(sysctl_sched_nr_migrate, >> busiest->nr_running); >> >> update_h_load(env.src_cpu); >> -- >> 1.7.9.5 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sched: fix env->src_cpu for active migration
On 13 February 2013 21:02, Damien Wyart wrote: >> Bingo, that was the problem in my setup: as the patch was applied >> through a script with others, I had missed the error message about the >> conflict (I have also another conflict which can be safely ignored so >> the new one did not catch my eye)... So the patch was only >> half-applied, and the final code is broken. > >> How did you solve the conflict (I am not a scheduler expert)? I can >> retry running the patched kernel with your resolution, to check if >> everything is ok. > > After looking a bit more, the conflict resolution seemed straighforward, Yes the resolution is straightforward and your patch is ok. > so I gave it a go. The attached version booted fine, so the initial > problem was purely PEBCAK... Sorry for the noise! Now, i know that my patch on frederic's branch, boots on a x86_64, Core i7 920 :-) Vincent > > -- > Damien -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sched: fix env-src_cpu for active migration
On 13 February 2013 21:02, Damien Wyart damien.wy...@gmail.com wrote: Bingo, that was the problem in my setup: as the patch was applied through a script with others, I had missed the error message about the conflict (I have also another conflict which can be safely ignored so the new one did not catch my eye)... So the patch was only half-applied, and the final code is broken. How did you solve the conflict (I am not a scheduler expert)? I can retry running the patched kernel with your resolution, to check if everything is ok. After looking a bit more, the conflict resolution seemed straighforward, Yes the resolution is straightforward and your patch is ok. so I gave it a go. The attached version booted fine, so the initial problem was purely PEBCAK... Sorry for the noise! Now, i know that my patch on frederic's branch, boots on a x86_64, Core i7 920 :-) Vincent -- Damien -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sched: fix env-src_cpu for active migration
On 13 February 2013 21:03, Paul Turner p...@google.com wrote: On Tue, Feb 12, 2013 at 5:19 AM, Vincent Guittot vincent.guit...@linaro.org wrote: need_active_balance uses env-src_cpu which is set only if there is more than 1 task on the run queue. We must set the src_cpu field unconditionnally otherwise the test env-src_cpu env-dst_cpu will always fail if there is only 1 task on the run queue Signed-off-by: Vincent Guittot vincent.guit...@linaro.org --- kernel/sched/fair.c |6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 81fa536..32938ea 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5044,6 +5044,10 @@ redo: Aside: Oh dear, it looks like in all the re-factoring here update_h_load() has escaped rq-lock? load_balance() ... update_h_load(env.src_cpu); more_balance: local_irq_save(flags); double_rq_lock(env.dst_rq, busiest); ld_moved = 0; lb_iterations = 1; + + env.src_cpu = busiest-cpu; + env.src_rq= busiest; + Hmm -- But shouldn't find_busiest_queue return NULL in this case? We're admittedly racing but we should have: busiest-nr_running == 1 wl = rq-load.weight env-imbalance since imbalance (wl - this_rq-load.weight / 2) check_asym_packing overwrites the env-imbalance with (sds-max_load * sds-busiest-sgp-power / SCHED_POWER_SCALE) so fbq can return a queue This would seem specialized to the case when we either A) race (fbq is openly racy) B) have no capacity Admittedly when we do race current case this is probably a biased coinflip depending on whatever was on the stack (src_cpu is uninitialized); it's also super easy for this to later become a crash if someone tries to dereference src_rq so we should do something. The case this seems most important for (and something we should add an explicit comment regarding) is that we want this case specifically for CPU_NEW_IDLE to move a single runnable-task to a lower numbered idle-cpu index in the SD_ASYM case (although I suspect we need to push this up to fbq also to actually find it...) The update of imbalance should be enough In the !SD_ASYM case we'd probably also want to re-check busiest nr_running in need_active_balance (or probably better alternative re-arrange the checks) since this is going to potentially now move a single large task needlessly to an already loaded rq in balance-failed case. yes, that could be an interesting add-on if (busiest-nr_running 1) { /* * Attempt to move tasks. If find_busiest_group has found @@ -5052,8 +5056,6 @@ redo: * correctly treated as an imbalance. */ env.flags |= LBF_ALL_PINNED; - env.src_cpu = busiest-cpu; - env.src_rq= busiest; env.loop_max = min(sysctl_sched_nr_migrate, busiest-nr_running); update_h_load(env.src_cpu); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sched: fix env->src_cpu for active migration
On Tue, Feb 12, 2013 at 5:19 AM, Vincent Guittot wrote: > need_active_balance uses env->src_cpu which is set only if there is more > than 1 task on the run queue. > We must set the src_cpu field unconditionnally > otherwise the test "env->src_cpu > env->dst_cpu" will always fail if there is > only 1 task on the run queue > > Signed-off-by: Vincent Guittot > --- > kernel/sched/fair.c |6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 81fa536..32938ea 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -5044,6 +5044,10 @@ redo: > Aside: Oh dear, it looks like in all the re-factoring here update_h_load() has escaped rq->lock? load_balance() ... update_h_load(env.src_cpu); more_balance: local_irq_save(flags); double_rq_lock(env.dst_rq, busiest); > ld_moved = 0; > lb_iterations = 1; > + > + env.src_cpu = busiest->cpu; > + env.src_rq= busiest; > + Hmm -- But shouldn't find_busiest_queue return NULL in this case? We're admittedly racing but we should have: busiest->nr_running == 1 wl = rq->load.weight > env->imbalance since imbalance < (wl - this_rq->load.weight / 2) This would seem specialized to the case when we either A) race (fbq is openly racy) B) have no capacity Admittedly when we do race current case this is probably a biased coinflip depending on whatever was on the stack (src_cpu is uninitialized); it's also super easy for this to later become a crash if someone tries to dereference src_rq so we should do something. The case this seems most important for (and something we should add an explicit comment regarding) is that we want this case specifically for CPU_NEW_IDLE to move a single runnable-task to a lower numbered idle-cpu index in the SD_ASYM case (although I suspect we need to push this up to fbq also to actually find it...) In the !SD_ASYM case we'd probably also want to re-check busiest nr_running in need_active_balance (or probably better alternative re-arrange the checks) since this is going to potentially now move a single large task needlessly to an already loaded rq in balance-failed case. > if (busiest->nr_running > 1) { > /* > * Attempt to move tasks. If find_busiest_group has found > @@ -5052,8 +5056,6 @@ redo: > * correctly treated as an imbalance. > */ > env.flags |= LBF_ALL_PINNED; > - env.src_cpu = busiest->cpu; > - env.src_rq= busiest; > env.loop_max = min(sysctl_sched_nr_migrate, > busiest->nr_running); > > update_h_load(env.src_cpu); > -- > 1.7.9.5 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sched: fix env->src_cpu for active migration
> Bingo, that was the problem in my setup: as the patch was applied > through a script with others, I had missed the error message about the > conflict (I have also another conflict which can be safely ignored so > the new one did not catch my eye)... So the patch was only > half-applied, and the final code is broken. > How did you solve the conflict (I am not a scheduler expert)? I can > retry running the patched kernel with your resolution, to check if > everything is ok. After looking a bit more, the conflict resolution seemed straighforward, so I gave it a go. The attached version booted fine, so the initial problem was purely PEBCAK... Sorry for the noise! -- Damien --- a/kernel/sched/fair.c 2013-02-06 20:49:17.447613049 +0100 +++ b/kernel/sched/fair.c 2013-02-13 20:52:37.409507417 +0100 @@ -5060,6 +5060,10 @@ ld_moved = 0; lb_iterations = 1; clock_updated = 0; + + env.src_cpu = busiest->cpu; + env.src_rq= busiest; + if (busiest->nr_running > 1) { /* * Attempt to move tasks. If find_busiest_group has found @@ -5068,8 +5072,6 @@ * correctly treated as an imbalance. */ env.flags |= LBF_ALL_PINNED; - env.src_cpu = busiest->cpu; - env.src_rq= busiest; env.loop_max = min(sysctl_sched_nr_migrate, busiest->nr_running); update_h_load(env.src_cpu);
Re: [PATCH] sched: fix env->src_cpu for active migration
* Vincent Guittot [2013-02-13 18:49]: > I have look into Frederic's tree but i didn't find any reason that > could explain your problem. May be Frederic will have some ideas > I have also tested his branch with and without my patch and both > kernel are booting (on an ARM platform without using the full_nohz > feature ). I have faced a conflict when i have applied my patch on his > branch as we insert code at the same place. You should have faced the > same conflict. How have you solved it ? Bingo, that was the problem in my setup: as the patch was applied through a script with others, I had missed the error message about the conflict (I have also another conflict which can be safely ignored so the new one did not catch my eye)... So the patch was only half-applied, and the final code is broken. How did you solve the conflict (I am not a scheduler expert)? I can retry running the patched kernel with your resolution, to check if everything is ok. Thanks and sorry for this, -- Damien -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sched: fix env->src_cpu for active migration
On 13 February 2013 15:28, Vincent Guittot wrote: > On 13 February 2013 15:08, Damien Wyart wrote: >> * Vincent Guittot [2013-02-13 13:08]: >>> Damien, >>> Regarding your sched_domain config and especially the flags field, you >>> should not be impacted by my patch because >>> - need_active_balance is the only new place that use env->src_cpu in >>> the load_balance function >>> - and your machine will never test the condition: "env->src_cpu > >>> env->dst_cpu" in need_active_balance because SD_ASYM_PACKING is not >>> set in your config >> >>> Have you tested the patch with others that could have modified the >>> load_balance function ? >> >> Yes, sorry, I should have been more precise in my initial report: your >> patch was not applied on top of vanilla 3.8-rc7, but a few other patches >> were also present. Seems the ones impacting load_balance are from >> Frederic's nohz work >> (http://git.kernel.org/?p=linux/kernel/git/frederic/linux-dynticks.git;a=shortlog;h=refs/heads/3.8-rc6-nohz4) >> > > ok > thanks for the pointer. i'm going to have a look > > >> The patches from there modifying load_balance are: >> - "sched: Update nohz rq clock before searching busiest group on load >> balancing" >> - "sched: Update clock of nohz busiest rq before balancing" >> >> In this test, I did not use any kernel parameter related to this >> patchset (full_nohz, etc.). >> >> I am adding Frederic in Cc, not sure if the breakage is to be >> investigated on your side or his... > > probably both I have look into Frederic's tree but i didn't find any reason that could explain your problem. May be Frederic will have some ideas I have also tested his branch with and without my patch and both kernel are booting (on an ARM platform without using the full_nohz feature ). I have faced a conflict when i have applied my patch on his branch as we insert code at the same place. You should have faced the same conflict. How have you solved it ? Vincent > >> >> >> Thanks for your explanations, >> -- >> Damien -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sched: fix env->src_cpu for active migration
On 13 February 2013 15:08, Damien Wyart wrote: > * Vincent Guittot [2013-02-13 13:08]: >> Damien, >> Regarding your sched_domain config and especially the flags field, you >> should not be impacted by my patch because >> - need_active_balance is the only new place that use env->src_cpu in >> the load_balance function >> - and your machine will never test the condition: "env->src_cpu > >> env->dst_cpu" in need_active_balance because SD_ASYM_PACKING is not >> set in your config > >> Have you tested the patch with others that could have modified the >> load_balance function ? > > Yes, sorry, I should have been more precise in my initial report: your > patch was not applied on top of vanilla 3.8-rc7, but a few other patches > were also present. Seems the ones impacting load_balance are from > Frederic's nohz work > (http://git.kernel.org/?p=linux/kernel/git/frederic/linux-dynticks.git;a=shortlog;h=refs/heads/3.8-rc6-nohz4) > ok thanks for the pointer. i'm going to have a look > The patches from there modifying load_balance are: > - "sched: Update nohz rq clock before searching busiest group on load > balancing" > - "sched: Update clock of nohz busiest rq before balancing" > > In this test, I did not use any kernel parameter related to this > patchset (full_nohz, etc.). > > I am adding Frederic in Cc, not sure if the breakage is to be > investigated on your side or his... probably both > > > Thanks for your explanations, > -- > Damien -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sched: fix env->src_cpu for active migration
* Vincent Guittot [2013-02-13 13:08]: > Damien, > Regarding your sched_domain config and especially the flags field, you > should not be impacted by my patch because > - need_active_balance is the only new place that use env->src_cpu in > the load_balance function > - and your machine will never test the condition: "env->src_cpu > > env->dst_cpu" in need_active_balance because SD_ASYM_PACKING is not > set in your config > Have you tested the patch with others that could have modified the > load_balance function ? Yes, sorry, I should have been more precise in my initial report: your patch was not applied on top of vanilla 3.8-rc7, but a few other patches were also present. Seems the ones impacting load_balance are from Frederic's nohz work (http://git.kernel.org/?p=linux/kernel/git/frederic/linux-dynticks.git;a=shortlog;h=refs/heads/3.8-rc6-nohz4) The patches from there modifying load_balance are: - "sched: Update nohz rq clock before searching busiest group on load balancing" - "sched: Update clock of nohz busiest rq before balancing" In this test, I did not use any kernel parameter related to this patchset (full_nohz, etc.). I am adding Frederic in Cc, not sure if the breakage is to be investigated on your side or his... Thanks for your explanations, -- Damien -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sched: fix env->src_cpu for active migration
Damien, Regarding your sched_domain config and especially the flags field, you should not be impacted by my patch because - need_active_balance is the only new place that use env->src_cpu in the load_balance function - and your machine will never test the condition: "env->src_cpu > env->dst_cpu" in need_active_balance because SD_ASYM_PACKING is not set in your config Have you tested the patch with others that could have modified the load_balance function ? Vincent On 13 February 2013 10:21, Damien Wyart wrote: > Hi, > >> Thanks for the test and the feedback. >> Could you send me the sched_domain configuration of your machine with >> the kernel that boots on your machine ? >> It's available in /proc/sys/kernel/sched_domain/cpu*/ > > I attach to this mail the sched_domain info, the contents of > /proc/cpuinfo and my kernel config. > > -- > Damien -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sched: fix env-src_cpu for active migration
Damien, Regarding your sched_domain config and especially the flags field, you should not be impacted by my patch because - need_active_balance is the only new place that use env-src_cpu in the load_balance function - and your machine will never test the condition: env-src_cpu env-dst_cpu in need_active_balance because SD_ASYM_PACKING is not set in your config Have you tested the patch with others that could have modified the load_balance function ? Vincent On 13 February 2013 10:21, Damien Wyart damien.wy...@gmail.com wrote: Hi, Thanks for the test and the feedback. Could you send me the sched_domain configuration of your machine with the kernel that boots on your machine ? It's available in /proc/sys/kernel/sched_domain/cpu*/ I attach to this mail the sched_domain info, the contents of /proc/cpuinfo and my kernel config. -- Damien -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sched: fix env-src_cpu for active migration
* Vincent Guittot vincent.guit...@linaro.org [2013-02-13 13:08]: Damien, Regarding your sched_domain config and especially the flags field, you should not be impacted by my patch because - need_active_balance is the only new place that use env-src_cpu in the load_balance function - and your machine will never test the condition: env-src_cpu env-dst_cpu in need_active_balance because SD_ASYM_PACKING is not set in your config Have you tested the patch with others that could have modified the load_balance function ? Yes, sorry, I should have been more precise in my initial report: your patch was not applied on top of vanilla 3.8-rc7, but a few other patches were also present. Seems the ones impacting load_balance are from Frederic's nohz work (http://git.kernel.org/?p=linux/kernel/git/frederic/linux-dynticks.git;a=shortlog;h=refs/heads/3.8-rc6-nohz4) The patches from there modifying load_balance are: - sched: Update nohz rq clock before searching busiest group on load balancing - sched: Update clock of nohz busiest rq before balancing In this test, I did not use any kernel parameter related to this patchset (full_nohz, etc.). I am adding Frederic in Cc, not sure if the breakage is to be investigated on your side or his... Thanks for your explanations, -- Damien -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sched: fix env-src_cpu for active migration
On 13 February 2013 15:08, Damien Wyart damien.wy...@gmail.com wrote: * Vincent Guittot vincent.guit...@linaro.org [2013-02-13 13:08]: Damien, Regarding your sched_domain config and especially the flags field, you should not be impacted by my patch because - need_active_balance is the only new place that use env-src_cpu in the load_balance function - and your machine will never test the condition: env-src_cpu env-dst_cpu in need_active_balance because SD_ASYM_PACKING is not set in your config Have you tested the patch with others that could have modified the load_balance function ? Yes, sorry, I should have been more precise in my initial report: your patch was not applied on top of vanilla 3.8-rc7, but a few other patches were also present. Seems the ones impacting load_balance are from Frederic's nohz work (http://git.kernel.org/?p=linux/kernel/git/frederic/linux-dynticks.git;a=shortlog;h=refs/heads/3.8-rc6-nohz4) ok thanks for the pointer. i'm going to have a look The patches from there modifying load_balance are: - sched: Update nohz rq clock before searching busiest group on load balancing - sched: Update clock of nohz busiest rq before balancing In this test, I did not use any kernel parameter related to this patchset (full_nohz, etc.). I am adding Frederic in Cc, not sure if the breakage is to be investigated on your side or his... probably both Thanks for your explanations, -- Damien -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sched: fix env-src_cpu for active migration
On 13 February 2013 15:28, Vincent Guittot vincent.guit...@linaro.org wrote: On 13 February 2013 15:08, Damien Wyart damien.wy...@gmail.com wrote: * Vincent Guittot vincent.guit...@linaro.org [2013-02-13 13:08]: Damien, Regarding your sched_domain config and especially the flags field, you should not be impacted by my patch because - need_active_balance is the only new place that use env-src_cpu in the load_balance function - and your machine will never test the condition: env-src_cpu env-dst_cpu in need_active_balance because SD_ASYM_PACKING is not set in your config Have you tested the patch with others that could have modified the load_balance function ? Yes, sorry, I should have been more precise in my initial report: your patch was not applied on top of vanilla 3.8-rc7, but a few other patches were also present. Seems the ones impacting load_balance are from Frederic's nohz work (http://git.kernel.org/?p=linux/kernel/git/frederic/linux-dynticks.git;a=shortlog;h=refs/heads/3.8-rc6-nohz4) ok thanks for the pointer. i'm going to have a look The patches from there modifying load_balance are: - sched: Update nohz rq clock before searching busiest group on load balancing - sched: Update clock of nohz busiest rq before balancing In this test, I did not use any kernel parameter related to this patchset (full_nohz, etc.). I am adding Frederic in Cc, not sure if the breakage is to be investigated on your side or his... probably both I have look into Frederic's tree but i didn't find any reason that could explain your problem. May be Frederic will have some ideas I have also tested his branch with and without my patch and both kernel are booting (on an ARM platform without using the full_nohz feature ). I have faced a conflict when i have applied my patch on his branch as we insert code at the same place. You should have faced the same conflict. How have you solved it ? Vincent Thanks for your explanations, -- Damien -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sched: fix env-src_cpu for active migration
* Vincent Guittot vincent.guit...@linaro.org [2013-02-13 18:49]: I have look into Frederic's tree but i didn't find any reason that could explain your problem. May be Frederic will have some ideas I have also tested his branch with and without my patch and both kernel are booting (on an ARM platform without using the full_nohz feature ). I have faced a conflict when i have applied my patch on his branch as we insert code at the same place. You should have faced the same conflict. How have you solved it ? Bingo, that was the problem in my setup: as the patch was applied through a script with others, I had missed the error message about the conflict (I have also another conflict which can be safely ignored so the new one did not catch my eye)... So the patch was only half-applied, and the final code is broken. How did you solve the conflict (I am not a scheduler expert)? I can retry running the patched kernel with your resolution, to check if everything is ok. Thanks and sorry for this, -- Damien -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sched: fix env-src_cpu for active migration
Bingo, that was the problem in my setup: as the patch was applied through a script with others, I had missed the error message about the conflict (I have also another conflict which can be safely ignored so the new one did not catch my eye)... So the patch was only half-applied, and the final code is broken. How did you solve the conflict (I am not a scheduler expert)? I can retry running the patched kernel with your resolution, to check if everything is ok. After looking a bit more, the conflict resolution seemed straighforward, so I gave it a go. The attached version booted fine, so the initial problem was purely PEBCAK... Sorry for the noise! -- Damien --- a/kernel/sched/fair.c 2013-02-06 20:49:17.447613049 +0100 +++ b/kernel/sched/fair.c 2013-02-13 20:52:37.409507417 +0100 @@ -5060,6 +5060,10 @@ ld_moved = 0; lb_iterations = 1; clock_updated = 0; + + env.src_cpu = busiest-cpu; + env.src_rq= busiest; + if (busiest-nr_running 1) { /* * Attempt to move tasks. If find_busiest_group has found @@ -5068,8 +5072,6 @@ * correctly treated as an imbalance. */ env.flags |= LBF_ALL_PINNED; - env.src_cpu = busiest-cpu; - env.src_rq= busiest; env.loop_max = min(sysctl_sched_nr_migrate, busiest-nr_running); update_h_load(env.src_cpu);
Re: [PATCH] sched: fix env-src_cpu for active migration
On Tue, Feb 12, 2013 at 5:19 AM, Vincent Guittot vincent.guit...@linaro.org wrote: need_active_balance uses env-src_cpu which is set only if there is more than 1 task on the run queue. We must set the src_cpu field unconditionnally otherwise the test env-src_cpu env-dst_cpu will always fail if there is only 1 task on the run queue Signed-off-by: Vincent Guittot vincent.guit...@linaro.org --- kernel/sched/fair.c |6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 81fa536..32938ea 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5044,6 +5044,10 @@ redo: Aside: Oh dear, it looks like in all the re-factoring here update_h_load() has escaped rq-lock? load_balance() ... update_h_load(env.src_cpu); more_balance: local_irq_save(flags); double_rq_lock(env.dst_rq, busiest); ld_moved = 0; lb_iterations = 1; + + env.src_cpu = busiest-cpu; + env.src_rq= busiest; + Hmm -- But shouldn't find_busiest_queue return NULL in this case? We're admittedly racing but we should have: busiest-nr_running == 1 wl = rq-load.weight env-imbalance since imbalance (wl - this_rq-load.weight / 2) This would seem specialized to the case when we either A) race (fbq is openly racy) B) have no capacity Admittedly when we do race current case this is probably a biased coinflip depending on whatever was on the stack (src_cpu is uninitialized); it's also super easy for this to later become a crash if someone tries to dereference src_rq so we should do something. The case this seems most important for (and something we should add an explicit comment regarding) is that we want this case specifically for CPU_NEW_IDLE to move a single runnable-task to a lower numbered idle-cpu index in the SD_ASYM case (although I suspect we need to push this up to fbq also to actually find it...) In the !SD_ASYM case we'd probably also want to re-check busiest nr_running in need_active_balance (or probably better alternative re-arrange the checks) since this is going to potentially now move a single large task needlessly to an already loaded rq in balance-failed case. if (busiest-nr_running 1) { /* * Attempt to move tasks. If find_busiest_group has found @@ -5052,8 +5056,6 @@ redo: * correctly treated as an imbalance. */ env.flags |= LBF_ALL_PINNED; - env.src_cpu = busiest-cpu; - env.src_rq= busiest; env.loop_max = min(sysctl_sched_nr_migrate, busiest-nr_running); update_h_load(env.src_cpu); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sched: fix env->src_cpu for active migration
Hi Damien, Thanks for the test and the feedback. Could you send me the sched_domain configuration of your machine with the kernel that boots on your machine ? It's available in /proc/sys/kernel/sched_domain/cpu*/ This should not have any impact on your machine but it looks like it have one. Regards, Vincent On 13 February 2013 07:18, Damien Wyart wrote: > Hi, > > I tested this on top of 3.8-rc7 and this made the machine (x86_64, Core > i7 920) unable to boot (very early as nothing at all is displayed on > screen). Nothing in the kernel log (after booting with a working > kernel). > > Double-checked by just backing out only this patch and this made the > machine boot again. > > Damien > >> need_active_balance uses env->src_cpu which is set only if there is more >> than 1 task on the run queue. We must set the src_cpu field unconditionnally >> otherwise the test "env->src_cpu > env->dst_cpu" will always fail if there is >> only 1 task on the run queue >> >> Signed-off-by: Vincent Guittot >> --- >> kernel/sched/fair.c |6 -- >> 1 file changed, 4 insertions(+), 2 deletions(-) >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 81fa536..32938ea 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -5044,6 +5044,10 @@ redo: >> >> ld_moved = 0; >> lb_iterations = 1; >> + >> + env.src_cpu = busiest->cpu; >> + env.src_rq= busiest; >> + >> if (busiest->nr_running > 1) { >> /* >>* Attempt to move tasks. If find_busiest_group has found >> @@ -5052,8 +5056,6 @@ redo: >>* correctly treated as an imbalance. >>*/ >> env.flags |= LBF_ALL_PINNED; >> - env.src_cpu = busiest->cpu; >> - env.src_rq= busiest; >> env.loop_max = min(sysctl_sched_nr_migrate, >> busiest->nr_running); >> >> update_h_load(env.src_cpu); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sched: fix env->src_cpu for active migration
Hi, I tested this on top of 3.8-rc7 and this made the machine (x86_64, Core i7 920) unable to boot (very early as nothing at all is displayed on screen). Nothing in the kernel log (after booting with a working kernel). Double-checked by just backing out only this patch and this made the machine boot again. Damien > need_active_balance uses env->src_cpu which is set only if there is more > than 1 task on the run queue. We must set the src_cpu field unconditionnally > otherwise the test "env->src_cpu > env->dst_cpu" will always fail if there is > only 1 task on the run queue > > Signed-off-by: Vincent Guittot > --- > kernel/sched/fair.c |6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 81fa536..32938ea 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -5044,6 +5044,10 @@ redo: > > ld_moved = 0; > lb_iterations = 1; > + > + env.src_cpu = busiest->cpu; > + env.src_rq= busiest; > + > if (busiest->nr_running > 1) { > /* >* Attempt to move tasks. If find_busiest_group has found > @@ -5052,8 +5056,6 @@ redo: >* correctly treated as an imbalance. >*/ > env.flags |= LBF_ALL_PINNED; > - env.src_cpu = busiest->cpu; > - env.src_rq= busiest; > env.loop_max = min(sysctl_sched_nr_migrate, > busiest->nr_running); > > update_h_load(env.src_cpu); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] sched: fix env->src_cpu for active migration
need_active_balance uses env->src_cpu which is set only if there is more than 1 task on the run queue. We must set the src_cpu field unconditionnally otherwise the test "env->src_cpu > env->dst_cpu" will always fail if there is only 1 task on the run queue Signed-off-by: Vincent Guittot --- kernel/sched/fair.c |6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 81fa536..32938ea 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5044,6 +5044,10 @@ redo: ld_moved = 0; lb_iterations = 1; + + env.src_cpu = busiest->cpu; + env.src_rq= busiest; + if (busiest->nr_running > 1) { /* * Attempt to move tasks. If find_busiest_group has found @@ -5052,8 +5056,6 @@ redo: * correctly treated as an imbalance. */ env.flags |= LBF_ALL_PINNED; - env.src_cpu = busiest->cpu; - env.src_rq= busiest; env.loop_max = min(sysctl_sched_nr_migrate, busiest->nr_running); update_h_load(env.src_cpu); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] sched: fix env-src_cpu for active migration
need_active_balance uses env-src_cpu which is set only if there is more than 1 task on the run queue. We must set the src_cpu field unconditionnally otherwise the test env-src_cpu env-dst_cpu will always fail if there is only 1 task on the run queue Signed-off-by: Vincent Guittot vincent.guit...@linaro.org --- kernel/sched/fair.c |6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 81fa536..32938ea 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5044,6 +5044,10 @@ redo: ld_moved = 0; lb_iterations = 1; + + env.src_cpu = busiest-cpu; + env.src_rq= busiest; + if (busiest-nr_running 1) { /* * Attempt to move tasks. If find_busiest_group has found @@ -5052,8 +5056,6 @@ redo: * correctly treated as an imbalance. */ env.flags |= LBF_ALL_PINNED; - env.src_cpu = busiest-cpu; - env.src_rq= busiest; env.loop_max = min(sysctl_sched_nr_migrate, busiest-nr_running); update_h_load(env.src_cpu); -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sched: fix env-src_cpu for active migration
Hi, I tested this on top of 3.8-rc7 and this made the machine (x86_64, Core i7 920) unable to boot (very early as nothing at all is displayed on screen). Nothing in the kernel log (after booting with a working kernel). Double-checked by just backing out only this patch and this made the machine boot again. Damien need_active_balance uses env-src_cpu which is set only if there is more than 1 task on the run queue. We must set the src_cpu field unconditionnally otherwise the test env-src_cpu env-dst_cpu will always fail if there is only 1 task on the run queue Signed-off-by: Vincent Guittot vincent.guit...@linaro.org --- kernel/sched/fair.c |6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 81fa536..32938ea 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5044,6 +5044,10 @@ redo: ld_moved = 0; lb_iterations = 1; + + env.src_cpu = busiest-cpu; + env.src_rq= busiest; + if (busiest-nr_running 1) { /* * Attempt to move tasks. If find_busiest_group has found @@ -5052,8 +5056,6 @@ redo: * correctly treated as an imbalance. */ env.flags |= LBF_ALL_PINNED; - env.src_cpu = busiest-cpu; - env.src_rq= busiest; env.loop_max = min(sysctl_sched_nr_migrate, busiest-nr_running); update_h_load(env.src_cpu); -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sched: fix env-src_cpu for active migration
Hi Damien, Thanks for the test and the feedback. Could you send me the sched_domain configuration of your machine with the kernel that boots on your machine ? It's available in /proc/sys/kernel/sched_domain/cpu*/ This should not have any impact on your machine but it looks like it have one. Regards, Vincent On 13 February 2013 07:18, Damien Wyart damien.wy...@gmail.com wrote: Hi, I tested this on top of 3.8-rc7 and this made the machine (x86_64, Core i7 920) unable to boot (very early as nothing at all is displayed on screen). Nothing in the kernel log (after booting with a working kernel). Double-checked by just backing out only this patch and this made the machine boot again. Damien need_active_balance uses env-src_cpu which is set only if there is more than 1 task on the run queue. We must set the src_cpu field unconditionnally otherwise the test env-src_cpu env-dst_cpu will always fail if there is only 1 task on the run queue Signed-off-by: Vincent Guittot vincent.guit...@linaro.org --- kernel/sched/fair.c |6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 81fa536..32938ea 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5044,6 +5044,10 @@ redo: ld_moved = 0; lb_iterations = 1; + + env.src_cpu = busiest-cpu; + env.src_rq= busiest; + if (busiest-nr_running 1) { /* * Attempt to move tasks. If find_busiest_group has found @@ -5052,8 +5056,6 @@ redo: * correctly treated as an imbalance. */ env.flags |= LBF_ALL_PINNED; - env.src_cpu = busiest-cpu; - env.src_rq= busiest; env.loop_max = min(sysctl_sched_nr_migrate, busiest-nr_running); update_h_load(env.src_cpu); -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/