Re: [PATCH] sched: fix env->src_cpu for active migration

2013-02-14 Thread Vincent Guittot
On 13 February 2013 21:03, Paul Turner  wrote:
> On Tue, Feb 12, 2013 at 5:19 AM, Vincent Guittot
>  wrote:
>> need_active_balance uses env->src_cpu which is set only if there is more
>> than 1 task on the run queue.
>> We must set the src_cpu field unconditionnally
>> otherwise the test "env->src_cpu > env->dst_cpu" will always fail if there is
>> only 1 task on the run queue
>>
>> Signed-off-by: Vincent Guittot 
>> ---
>>  kernel/sched/fair.c |6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 81fa536..32938ea 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5044,6 +5044,10 @@ redo:
>>
>
> Aside:  Oh dear, it looks like in all the re-factoring here
> update_h_load() has escaped rq->lock?
>
> load_balance()
> ...
> update_h_load(env.src_cpu);
> more_balance:
> local_irq_save(flags);
> double_rq_lock(env.dst_rq, busiest);
>
>
>
>> ld_moved = 0;
>> lb_iterations = 1;
>> +
>> +   env.src_cpu   = busiest->cpu;
>> +   env.src_rq= busiest;
>> +
>
> Hmm -- But shouldn't find_busiest_queue return NULL in this case?
>
> We're admittedly racing but we should have:
>   busiest->nr_running == 1
>   wl = rq->load.weight > env->imbalance since imbalance < (wl -
> this_rq->load.weight / 2)

check_asym_packing overwrites the env->imbalance with (sds->max_load *
sds->busiest->sgp->power / SCHED_POWER_SCALE) so fbq can return a
queue

>
> This would seem specialized to the case when we either
>   A) race (fbq is openly racy)
>   B) have no capacity
>
> Admittedly when we do race current case this is probably a biased
> coinflip depending on whatever was on the stack (src_cpu is
> uninitialized); it's also super easy for this to later become a crash
> if someone tries to dereference src_rq so we should do something.
>
> The case this seems most important for (and something we should add an
> explicit comment regarding) is that we want this case specifically for
> CPU_NEW_IDLE to move a single runnable-task to a lower numbered
> idle-cpu index in the SD_ASYM case (although I suspect we need to push
> this up to fbq also to actually find it...)

The update of imbalance should be enough

>
> In the !SD_ASYM case we'd probably also want to  re-check busiest
> nr_running in need_active_balance (or probably better alternative
> re-arrange the checks) since this is going to potentially now move a
> single large task needlessly to an already loaded rq in balance-failed
> case.

yes, that could be an interesting add-on

>
>
>> if (busiest->nr_running > 1) {
>> /*
>>  * Attempt to move tasks. If find_busiest_group has found
>> @@ -5052,8 +5056,6 @@ redo:
>>  * correctly treated as an imbalance.
>>  */
>> env.flags |= LBF_ALL_PINNED;
>> -   env.src_cpu   = busiest->cpu;
>> -   env.src_rq= busiest;
>> env.loop_max  = min(sysctl_sched_nr_migrate, 
>> busiest->nr_running);
>>
>> update_h_load(env.src_cpu);
>> --
>> 1.7.9.5
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: fix env->src_cpu for active migration

2013-02-14 Thread Vincent Guittot
On 13 February 2013 21:02, Damien Wyart  wrote:
>> Bingo, that was the problem in my setup: as the patch was applied
>> through a script with others, I had missed the error message about the
>> conflict (I have also another conflict which can be safely ignored so
>> the new one did not catch my eye)... So the patch was only
>> half-applied, and the final code is broken.
>
>> How did you solve the conflict (I am not a scheduler expert)? I can
>> retry running the patched kernel with your resolution, to check if
>> everything is ok.
>
> After looking a bit more, the conflict resolution seemed straighforward,

Yes the resolution is straightforward and your patch is ok.

> so I gave it a go. The attached version booted fine, so the initial
> problem was purely PEBCAK... Sorry for the noise!

Now, i know that my patch on frederic's branch, boots on a x86_64,
Core i7 920 :-)

Vincent

>
> --
> Damien
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: fix env-src_cpu for active migration

2013-02-14 Thread Vincent Guittot
On 13 February 2013 21:02, Damien Wyart damien.wy...@gmail.com wrote:
 Bingo, that was the problem in my setup: as the patch was applied
 through a script with others, I had missed the error message about the
 conflict (I have also another conflict which can be safely ignored so
 the new one did not catch my eye)... So the patch was only
 half-applied, and the final code is broken.

 How did you solve the conflict (I am not a scheduler expert)? I can
 retry running the patched kernel with your resolution, to check if
 everything is ok.

 After looking a bit more, the conflict resolution seemed straighforward,

Yes the resolution is straightforward and your patch is ok.

 so I gave it a go. The attached version booted fine, so the initial
 problem was purely PEBCAK... Sorry for the noise!

Now, i know that my patch on frederic's branch, boots on a x86_64,
Core i7 920 :-)

Vincent


 --
 Damien
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: fix env-src_cpu for active migration

2013-02-14 Thread Vincent Guittot
On 13 February 2013 21:03, Paul Turner p...@google.com wrote:
 On Tue, Feb 12, 2013 at 5:19 AM, Vincent Guittot
 vincent.guit...@linaro.org wrote:
 need_active_balance uses env-src_cpu which is set only if there is more
 than 1 task on the run queue.
 We must set the src_cpu field unconditionnally
 otherwise the test env-src_cpu  env-dst_cpu will always fail if there is
 only 1 task on the run queue

 Signed-off-by: Vincent Guittot vincent.guit...@linaro.org
 ---
  kernel/sched/fair.c |6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
 index 81fa536..32938ea 100644
 --- a/kernel/sched/fair.c
 +++ b/kernel/sched/fair.c
 @@ -5044,6 +5044,10 @@ redo:


 Aside:  Oh dear, it looks like in all the re-factoring here
 update_h_load() has escaped rq-lock?

 load_balance()
 ...
 update_h_load(env.src_cpu);
 more_balance:
 local_irq_save(flags);
 double_rq_lock(env.dst_rq, busiest);



 ld_moved = 0;
 lb_iterations = 1;
 +
 +   env.src_cpu   = busiest-cpu;
 +   env.src_rq= busiest;
 +

 Hmm -- But shouldn't find_busiest_queue return NULL in this case?

 We're admittedly racing but we should have:
   busiest-nr_running == 1
   wl = rq-load.weight  env-imbalance since imbalance  (wl -
 this_rq-load.weight / 2)

check_asym_packing overwrites the env-imbalance with (sds-max_load *
sds-busiest-sgp-power / SCHED_POWER_SCALE) so fbq can return a
queue


 This would seem specialized to the case when we either
   A) race (fbq is openly racy)
   B) have no capacity

 Admittedly when we do race current case this is probably a biased
 coinflip depending on whatever was on the stack (src_cpu is
 uninitialized); it's also super easy for this to later become a crash
 if someone tries to dereference src_rq so we should do something.

 The case this seems most important for (and something we should add an
 explicit comment regarding) is that we want this case specifically for
 CPU_NEW_IDLE to move a single runnable-task to a lower numbered
 idle-cpu index in the SD_ASYM case (although I suspect we need to push
 this up to fbq also to actually find it...)

The update of imbalance should be enough


 In the !SD_ASYM case we'd probably also want to  re-check busiest
 nr_running in need_active_balance (or probably better alternative
 re-arrange the checks) since this is going to potentially now move a
 single large task needlessly to an already loaded rq in balance-failed
 case.

yes, that could be an interesting add-on



 if (busiest-nr_running  1) {
 /*
  * Attempt to move tasks. If find_busiest_group has found
 @@ -5052,8 +5056,6 @@ redo:
  * correctly treated as an imbalance.
  */
 env.flags |= LBF_ALL_PINNED;
 -   env.src_cpu   = busiest-cpu;
 -   env.src_rq= busiest;
 env.loop_max  = min(sysctl_sched_nr_migrate, 
 busiest-nr_running);

 update_h_load(env.src_cpu);
 --
 1.7.9.5

 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: fix env->src_cpu for active migration

2013-02-13 Thread Paul Turner
On Tue, Feb 12, 2013 at 5:19 AM, Vincent Guittot
 wrote:
> need_active_balance uses env->src_cpu which is set only if there is more
> than 1 task on the run queue.
> We must set the src_cpu field unconditionnally
> otherwise the test "env->src_cpu > env->dst_cpu" will always fail if there is
> only 1 task on the run queue
>
> Signed-off-by: Vincent Guittot 
> ---
>  kernel/sched/fair.c |6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 81fa536..32938ea 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5044,6 +5044,10 @@ redo:
>

Aside:  Oh dear, it looks like in all the re-factoring here
update_h_load() has escaped rq->lock?

load_balance()
...
update_h_load(env.src_cpu);
more_balance:
local_irq_save(flags);
double_rq_lock(env.dst_rq, busiest);



> ld_moved = 0;
> lb_iterations = 1;
> +
> +   env.src_cpu   = busiest->cpu;
> +   env.src_rq= busiest;
> +

Hmm -- But shouldn't find_busiest_queue return NULL in this case?

We're admittedly racing but we should have:
  busiest->nr_running == 1
  wl = rq->load.weight > env->imbalance since imbalance < (wl -
this_rq->load.weight / 2)

This would seem specialized to the case when we either
  A) race (fbq is openly racy)
  B) have no capacity

Admittedly when we do race current case this is probably a biased
coinflip depending on whatever was on the stack (src_cpu is
uninitialized); it's also super easy for this to later become a crash
if someone tries to dereference src_rq so we should do something.

The case this seems most important for (and something we should add an
explicit comment regarding) is that we want this case specifically for
CPU_NEW_IDLE to move a single runnable-task to a lower numbered
idle-cpu index in the SD_ASYM case (although I suspect we need to push
this up to fbq also to actually find it...)

In the !SD_ASYM case we'd probably also want to  re-check busiest
nr_running in need_active_balance (or probably better alternative
re-arrange the checks) since this is going to potentially now move a
single large task needlessly to an already loaded rq in balance-failed
case.


> if (busiest->nr_running > 1) {
> /*
>  * Attempt to move tasks. If find_busiest_group has found
> @@ -5052,8 +5056,6 @@ redo:
>  * correctly treated as an imbalance.
>  */
> env.flags |= LBF_ALL_PINNED;
> -   env.src_cpu   = busiest->cpu;
> -   env.src_rq= busiest;
> env.loop_max  = min(sysctl_sched_nr_migrate, 
> busiest->nr_running);
>
> update_h_load(env.src_cpu);
> --
> 1.7.9.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: fix env->src_cpu for active migration

2013-02-13 Thread Damien Wyart
> Bingo, that was the problem in my setup: as the patch was applied
> through a script with others, I had missed the error message about the
> conflict (I have also another conflict which can be safely ignored so
> the new one did not catch my eye)... So the patch was only
> half-applied, and the final code is broken.

> How did you solve the conflict (I am not a scheduler expert)? I can
> retry running the patched kernel with your resolution, to check if
> everything is ok.

After looking a bit more, the conflict resolution seemed straighforward,
so I gave it a go. The attached version booted fine, so the initial
problem was purely PEBCAK... Sorry for the noise!

-- 
Damien
--- a/kernel/sched/fair.c	2013-02-06 20:49:17.447613049 +0100
+++ b/kernel/sched/fair.c	2013-02-13 20:52:37.409507417 +0100
@@ -5060,6 +5060,10 @@
 	ld_moved = 0;
 	lb_iterations = 1;
 	clock_updated = 0;
+
+	env.src_cpu   = busiest->cpu;
+	env.src_rq= busiest;
+	
 	if (busiest->nr_running > 1) {
 		/*
 		 * Attempt to move tasks. If find_busiest_group has found
@@ -5068,8 +5072,6 @@
 		 * correctly treated as an imbalance.
 		 */
 		env.flags |= LBF_ALL_PINNED;
-		env.src_cpu   = busiest->cpu;
-		env.src_rq= busiest;
 		env.loop_max  = min(sysctl_sched_nr_migrate, busiest->nr_running);
 
 		update_h_load(env.src_cpu);


Re: [PATCH] sched: fix env->src_cpu for active migration

2013-02-13 Thread Damien Wyart
* Vincent Guittot  [2013-02-13 18:49]:
> I have look into Frederic's tree but i didn't find any reason that
> could explain your problem. May be Frederic will have some ideas
> I have also tested his branch with and without my patch and both
> kernel are booting (on an ARM platform without using the full_nohz
> feature ). I have faced a conflict when i have applied my patch on his
> branch as we insert code at the same place. You should have faced the
> same conflict. How have you solved it ?

Bingo, that was the problem in my setup: as the patch was applied
through a script with others, I had missed the error message about the
conflict (I have also another conflict which can be safely ignored so
the new one did not catch my eye)... So the patch was only half-applied,
and the final code is broken.

How did you solve the conflict (I am not a scheduler expert)? I can
retry running the patched kernel with your resolution, to check if
everything is ok.


Thanks and sorry for this,
-- 
Damien
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: fix env->src_cpu for active migration

2013-02-13 Thread Vincent Guittot
On 13 February 2013 15:28, Vincent Guittot  wrote:
> On 13 February 2013 15:08, Damien Wyart  wrote:
>> * Vincent Guittot  [2013-02-13 13:08]:
>>> Damien,
>>> Regarding your sched_domain config and especially the flags field, you
>>> should not be impacted by my patch because
>>> - need_active_balance is the only new place that use env->src_cpu in
>>> the load_balance function
>>> - and your machine will never test the condition: "env->src_cpu >
>>> env->dst_cpu" in need_active_balance because  SD_ASYM_PACKING is not
>>> set in your config
>>
>>> Have you tested the patch with others that could have modified the
>>> load_balance function ?
>>
>> Yes, sorry, I should have been more precise in my initial report: your
>> patch was not applied on top of vanilla 3.8-rc7, but a few other patches
>> were also present. Seems the ones impacting load_balance are from
>> Frederic's nohz work
>> (http://git.kernel.org/?p=linux/kernel/git/frederic/linux-dynticks.git;a=shortlog;h=refs/heads/3.8-rc6-nohz4)
>>
>
> ok
> thanks for the pointer. i'm going to have a look
>
>
>> The patches from there modifying load_balance are:
>> - "sched: Update nohz rq clock before searching busiest group on load 
>> balancing"
>> - "sched: Update clock of nohz busiest rq before balancing"
>>
>> In this test, I did not use any kernel parameter related to this
>> patchset (full_nohz, etc.).
>>
>> I am adding Frederic in Cc, not sure if the breakage is to be
>> investigated on your side or his...
>
> probably both

I have look into Frederic's tree but i didn't find any reason that
could explain your problem. May be Frederic will have some ideas
I have also tested his branch with and without my patch and both
kernel are booting (on an ARM platform without using the full_nohz
feature ).
I have faced a conflict when i have applied my patch on his branch as
we insert code at the same place. You should have faced the same
conflict. How have you solved it ?

Vincent
>
>>
>>
>> Thanks for your explanations,
>> --
>> Damien
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: fix env->src_cpu for active migration

2013-02-13 Thread Vincent Guittot
On 13 February 2013 15:08, Damien Wyart  wrote:
> * Vincent Guittot  [2013-02-13 13:08]:
>> Damien,
>> Regarding your sched_domain config and especially the flags field, you
>> should not be impacted by my patch because
>> - need_active_balance is the only new place that use env->src_cpu in
>> the load_balance function
>> - and your machine will never test the condition: "env->src_cpu >
>> env->dst_cpu" in need_active_balance because  SD_ASYM_PACKING is not
>> set in your config
>
>> Have you tested the patch with others that could have modified the
>> load_balance function ?
>
> Yes, sorry, I should have been more precise in my initial report: your
> patch was not applied on top of vanilla 3.8-rc7, but a few other patches
> were also present. Seems the ones impacting load_balance are from
> Frederic's nohz work
> (http://git.kernel.org/?p=linux/kernel/git/frederic/linux-dynticks.git;a=shortlog;h=refs/heads/3.8-rc6-nohz4)
>

ok
thanks for the pointer. i'm going to have a look


> The patches from there modifying load_balance are:
> - "sched: Update nohz rq clock before searching busiest group on load 
> balancing"
> - "sched: Update clock of nohz busiest rq before balancing"
>
> In this test, I did not use any kernel parameter related to this
> patchset (full_nohz, etc.).
>
> I am adding Frederic in Cc, not sure if the breakage is to be
> investigated on your side or his...

probably both

>
>
> Thanks for your explanations,
> --
> Damien
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: fix env->src_cpu for active migration

2013-02-13 Thread Damien Wyart
* Vincent Guittot  [2013-02-13 13:08]:
> Damien,
> Regarding your sched_domain config and especially the flags field, you
> should not be impacted by my patch because
> - need_active_balance is the only new place that use env->src_cpu in
> the load_balance function
> - and your machine will never test the condition: "env->src_cpu >
> env->dst_cpu" in need_active_balance because  SD_ASYM_PACKING is not
> set in your config

> Have you tested the patch with others that could have modified the
> load_balance function ?

Yes, sorry, I should have been more precise in my initial report: your
patch was not applied on top of vanilla 3.8-rc7, but a few other patches
were also present. Seems the ones impacting load_balance are from
Frederic's nohz work
(http://git.kernel.org/?p=linux/kernel/git/frederic/linux-dynticks.git;a=shortlog;h=refs/heads/3.8-rc6-nohz4)

The patches from there modifying load_balance are:
- "sched: Update nohz rq clock before searching busiest group on load balancing"
- "sched: Update clock of nohz busiest rq before balancing"

In this test, I did not use any kernel parameter related to this
patchset (full_nohz, etc.).

I am adding Frederic in Cc, not sure if the breakage is to be
investigated on your side or his...


Thanks for your explanations,
-- 
Damien
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: fix env->src_cpu for active migration

2013-02-13 Thread Vincent Guittot
Damien,

Regarding your sched_domain config and especially the flags field, you
should not be impacted by my patch because
- need_active_balance is the only new place that use env->src_cpu in
the load_balance function
- and your machine will never test the condition: "env->src_cpu >
env->dst_cpu" in need_active_balance because  SD_ASYM_PACKING is not
set in your config

Have you tested the patch with others that could have modified the
load_balance function ?

Vincent

On 13 February 2013 10:21, Damien Wyart  wrote:
> Hi,
>
>> Thanks for the test and the feedback.
>> Could you send me the sched_domain configuration of your machine with
>> the kernel that boots on your machine ?
>> It's available in /proc/sys/kernel/sched_domain/cpu*/
>
> I attach to this mail the sched_domain info, the contents of
> /proc/cpuinfo and my kernel config.
>
> --
> Damien
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: fix env-src_cpu for active migration

2013-02-13 Thread Vincent Guittot
Damien,

Regarding your sched_domain config and especially the flags field, you
should not be impacted by my patch because
- need_active_balance is the only new place that use env-src_cpu in
the load_balance function
- and your machine will never test the condition: env-src_cpu 
env-dst_cpu in need_active_balance because  SD_ASYM_PACKING is not
set in your config

Have you tested the patch with others that could have modified the
load_balance function ?

Vincent

On 13 February 2013 10:21, Damien Wyart damien.wy...@gmail.com wrote:
 Hi,

 Thanks for the test and the feedback.
 Could you send me the sched_domain configuration of your machine with
 the kernel that boots on your machine ?
 It's available in /proc/sys/kernel/sched_domain/cpu*/

 I attach to this mail the sched_domain info, the contents of
 /proc/cpuinfo and my kernel config.

 --
 Damien
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: fix env-src_cpu for active migration

2013-02-13 Thread Damien Wyart
* Vincent Guittot vincent.guit...@linaro.org [2013-02-13 13:08]:
 Damien,
 Regarding your sched_domain config and especially the flags field, you
 should not be impacted by my patch because
 - need_active_balance is the only new place that use env-src_cpu in
 the load_balance function
 - and your machine will never test the condition: env-src_cpu 
 env-dst_cpu in need_active_balance because  SD_ASYM_PACKING is not
 set in your config

 Have you tested the patch with others that could have modified the
 load_balance function ?

Yes, sorry, I should have been more precise in my initial report: your
patch was not applied on top of vanilla 3.8-rc7, but a few other patches
were also present. Seems the ones impacting load_balance are from
Frederic's nohz work
(http://git.kernel.org/?p=linux/kernel/git/frederic/linux-dynticks.git;a=shortlog;h=refs/heads/3.8-rc6-nohz4)

The patches from there modifying load_balance are:
- sched: Update nohz rq clock before searching busiest group on load balancing
- sched: Update clock of nohz busiest rq before balancing

In this test, I did not use any kernel parameter related to this
patchset (full_nohz, etc.).

I am adding Frederic in Cc, not sure if the breakage is to be
investigated on your side or his...


Thanks for your explanations,
-- 
Damien
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: fix env-src_cpu for active migration

2013-02-13 Thread Vincent Guittot
On 13 February 2013 15:08, Damien Wyart damien.wy...@gmail.com wrote:
 * Vincent Guittot vincent.guit...@linaro.org [2013-02-13 13:08]:
 Damien,
 Regarding your sched_domain config and especially the flags field, you
 should not be impacted by my patch because
 - need_active_balance is the only new place that use env-src_cpu in
 the load_balance function
 - and your machine will never test the condition: env-src_cpu 
 env-dst_cpu in need_active_balance because  SD_ASYM_PACKING is not
 set in your config

 Have you tested the patch with others that could have modified the
 load_balance function ?

 Yes, sorry, I should have been more precise in my initial report: your
 patch was not applied on top of vanilla 3.8-rc7, but a few other patches
 were also present. Seems the ones impacting load_balance are from
 Frederic's nohz work
 (http://git.kernel.org/?p=linux/kernel/git/frederic/linux-dynticks.git;a=shortlog;h=refs/heads/3.8-rc6-nohz4)


ok
thanks for the pointer. i'm going to have a look


 The patches from there modifying load_balance are:
 - sched: Update nohz rq clock before searching busiest group on load 
 balancing
 - sched: Update clock of nohz busiest rq before balancing

 In this test, I did not use any kernel parameter related to this
 patchset (full_nohz, etc.).

 I am adding Frederic in Cc, not sure if the breakage is to be
 investigated on your side or his...

probably both



 Thanks for your explanations,
 --
 Damien
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: fix env-src_cpu for active migration

2013-02-13 Thread Vincent Guittot
On 13 February 2013 15:28, Vincent Guittot vincent.guit...@linaro.org wrote:
 On 13 February 2013 15:08, Damien Wyart damien.wy...@gmail.com wrote:
 * Vincent Guittot vincent.guit...@linaro.org [2013-02-13 13:08]:
 Damien,
 Regarding your sched_domain config and especially the flags field, you
 should not be impacted by my patch because
 - need_active_balance is the only new place that use env-src_cpu in
 the load_balance function
 - and your machine will never test the condition: env-src_cpu 
 env-dst_cpu in need_active_balance because  SD_ASYM_PACKING is not
 set in your config

 Have you tested the patch with others that could have modified the
 load_balance function ?

 Yes, sorry, I should have been more precise in my initial report: your
 patch was not applied on top of vanilla 3.8-rc7, but a few other patches
 were also present. Seems the ones impacting load_balance are from
 Frederic's nohz work
 (http://git.kernel.org/?p=linux/kernel/git/frederic/linux-dynticks.git;a=shortlog;h=refs/heads/3.8-rc6-nohz4)


 ok
 thanks for the pointer. i'm going to have a look


 The patches from there modifying load_balance are:
 - sched: Update nohz rq clock before searching busiest group on load 
 balancing
 - sched: Update clock of nohz busiest rq before balancing

 In this test, I did not use any kernel parameter related to this
 patchset (full_nohz, etc.).

 I am adding Frederic in Cc, not sure if the breakage is to be
 investigated on your side or his...

 probably both

I have look into Frederic's tree but i didn't find any reason that
could explain your problem. May be Frederic will have some ideas
I have also tested his branch with and without my patch and both
kernel are booting (on an ARM platform without using the full_nohz
feature ).
I have faced a conflict when i have applied my patch on his branch as
we insert code at the same place. You should have faced the same
conflict. How have you solved it ?

Vincent



 Thanks for your explanations,
 --
 Damien
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: fix env-src_cpu for active migration

2013-02-13 Thread Damien Wyart
* Vincent Guittot vincent.guit...@linaro.org [2013-02-13 18:49]:
 I have look into Frederic's tree but i didn't find any reason that
 could explain your problem. May be Frederic will have some ideas
 I have also tested his branch with and without my patch and both
 kernel are booting (on an ARM platform without using the full_nohz
 feature ). I have faced a conflict when i have applied my patch on his
 branch as we insert code at the same place. You should have faced the
 same conflict. How have you solved it ?

Bingo, that was the problem in my setup: as the patch was applied
through a script with others, I had missed the error message about the
conflict (I have also another conflict which can be safely ignored so
the new one did not catch my eye)... So the patch was only half-applied,
and the final code is broken.

How did you solve the conflict (I am not a scheduler expert)? I can
retry running the patched kernel with your resolution, to check if
everything is ok.


Thanks and sorry for this,
-- 
Damien
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: fix env-src_cpu for active migration

2013-02-13 Thread Damien Wyart
 Bingo, that was the problem in my setup: as the patch was applied
 through a script with others, I had missed the error message about the
 conflict (I have also another conflict which can be safely ignored so
 the new one did not catch my eye)... So the patch was only
 half-applied, and the final code is broken.

 How did you solve the conflict (I am not a scheduler expert)? I can
 retry running the patched kernel with your resolution, to check if
 everything is ok.

After looking a bit more, the conflict resolution seemed straighforward,
so I gave it a go. The attached version booted fine, so the initial
problem was purely PEBCAK... Sorry for the noise!

-- 
Damien
--- a/kernel/sched/fair.c	2013-02-06 20:49:17.447613049 +0100
+++ b/kernel/sched/fair.c	2013-02-13 20:52:37.409507417 +0100
@@ -5060,6 +5060,10 @@
 	ld_moved = 0;
 	lb_iterations = 1;
 	clock_updated = 0;
+
+	env.src_cpu   = busiest-cpu;
+	env.src_rq= busiest;
+	
 	if (busiest-nr_running  1) {
 		/*
 		 * Attempt to move tasks. If find_busiest_group has found
@@ -5068,8 +5072,6 @@
 		 * correctly treated as an imbalance.
 		 */
 		env.flags |= LBF_ALL_PINNED;
-		env.src_cpu   = busiest-cpu;
-		env.src_rq= busiest;
 		env.loop_max  = min(sysctl_sched_nr_migrate, busiest-nr_running);
 
 		update_h_load(env.src_cpu);


Re: [PATCH] sched: fix env-src_cpu for active migration

2013-02-13 Thread Paul Turner
On Tue, Feb 12, 2013 at 5:19 AM, Vincent Guittot
vincent.guit...@linaro.org wrote:
 need_active_balance uses env-src_cpu which is set only if there is more
 than 1 task on the run queue.
 We must set the src_cpu field unconditionnally
 otherwise the test env-src_cpu  env-dst_cpu will always fail if there is
 only 1 task on the run queue

 Signed-off-by: Vincent Guittot vincent.guit...@linaro.org
 ---
  kernel/sched/fair.c |6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
 index 81fa536..32938ea 100644
 --- a/kernel/sched/fair.c
 +++ b/kernel/sched/fair.c
 @@ -5044,6 +5044,10 @@ redo:


Aside:  Oh dear, it looks like in all the re-factoring here
update_h_load() has escaped rq-lock?

load_balance()
...
update_h_load(env.src_cpu);
more_balance:
local_irq_save(flags);
double_rq_lock(env.dst_rq, busiest);



 ld_moved = 0;
 lb_iterations = 1;
 +
 +   env.src_cpu   = busiest-cpu;
 +   env.src_rq= busiest;
 +

Hmm -- But shouldn't find_busiest_queue return NULL in this case?

We're admittedly racing but we should have:
  busiest-nr_running == 1
  wl = rq-load.weight  env-imbalance since imbalance  (wl -
this_rq-load.weight / 2)

This would seem specialized to the case when we either
  A) race (fbq is openly racy)
  B) have no capacity

Admittedly when we do race current case this is probably a biased
coinflip depending on whatever was on the stack (src_cpu is
uninitialized); it's also super easy for this to later become a crash
if someone tries to dereference src_rq so we should do something.

The case this seems most important for (and something we should add an
explicit comment regarding) is that we want this case specifically for
CPU_NEW_IDLE to move a single runnable-task to a lower numbered
idle-cpu index in the SD_ASYM case (although I suspect we need to push
this up to fbq also to actually find it...)

In the !SD_ASYM case we'd probably also want to  re-check busiest
nr_running in need_active_balance (or probably better alternative
re-arrange the checks) since this is going to potentially now move a
single large task needlessly to an already loaded rq in balance-failed
case.


 if (busiest-nr_running  1) {
 /*
  * Attempt to move tasks. If find_busiest_group has found
 @@ -5052,8 +5056,6 @@ redo:
  * correctly treated as an imbalance.
  */
 env.flags |= LBF_ALL_PINNED;
 -   env.src_cpu   = busiest-cpu;
 -   env.src_rq= busiest;
 env.loop_max  = min(sysctl_sched_nr_migrate, 
 busiest-nr_running);

 update_h_load(env.src_cpu);
 --
 1.7.9.5

 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: fix env->src_cpu for active migration

2013-02-12 Thread Vincent Guittot
Hi Damien,

Thanks for the test and the feedback.
Could you send me the sched_domain configuration of your machine with
the kernel that boots on your machine ?
It's available in /proc/sys/kernel/sched_domain/cpu*/

This should not have any impact on your machine but it looks like it have one.

Regards,
Vincent


On 13 February 2013 07:18, Damien Wyart  wrote:
> Hi,
>
> I tested this on top of 3.8-rc7 and this made the machine (x86_64, Core
> i7 920) unable to boot (very early as nothing at all is displayed on
> screen). Nothing in the kernel log (after booting with a working
> kernel).
>
> Double-checked by just backing out only this patch and this made the
> machine boot again.
>
> Damien
>
>> need_active_balance uses env->src_cpu which is set only if there is more
>> than 1 task on the run queue. We must set the src_cpu field unconditionnally
>> otherwise the test "env->src_cpu > env->dst_cpu" will always fail if there is
>> only 1 task on the run queue
>>
>> Signed-off-by: Vincent Guittot 
>> ---
>>  kernel/sched/fair.c |6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 81fa536..32938ea 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5044,6 +5044,10 @@ redo:
>>
>>   ld_moved = 0;
>>   lb_iterations = 1;
>> +
>> + env.src_cpu   = busiest->cpu;
>> + env.src_rq= busiest;
>> +
>>   if (busiest->nr_running > 1) {
>>   /*
>>* Attempt to move tasks. If find_busiest_group has found
>> @@ -5052,8 +5056,6 @@ redo:
>>* correctly treated as an imbalance.
>>*/
>>   env.flags |= LBF_ALL_PINNED;
>> - env.src_cpu   = busiest->cpu;
>> - env.src_rq= busiest;
>>   env.loop_max  = min(sysctl_sched_nr_migrate, 
>> busiest->nr_running);
>>
>>   update_h_load(env.src_cpu);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: fix env->src_cpu for active migration

2013-02-12 Thread Damien Wyart
Hi,

I tested this on top of 3.8-rc7 and this made the machine (x86_64, Core
i7 920) unable to boot (very early as nothing at all is displayed on
screen). Nothing in the kernel log (after booting with a working
kernel).

Double-checked by just backing out only this patch and this made the
machine boot again.

Damien

> need_active_balance uses env->src_cpu which is set only if there is more
> than 1 task on the run queue. We must set the src_cpu field unconditionnally
> otherwise the test "env->src_cpu > env->dst_cpu" will always fail if there is
> only 1 task on the run queue
> 
> Signed-off-by: Vincent Guittot 
> ---
>  kernel/sched/fair.c |6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 81fa536..32938ea 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5044,6 +5044,10 @@ redo:
>  
>   ld_moved = 0;
>   lb_iterations = 1;
> +
> + env.src_cpu   = busiest->cpu;
> + env.src_rq= busiest;
> +
>   if (busiest->nr_running > 1) {
>   /*
>* Attempt to move tasks. If find_busiest_group has found
> @@ -5052,8 +5056,6 @@ redo:
>* correctly treated as an imbalance.
>*/
>   env.flags |= LBF_ALL_PINNED;
> - env.src_cpu   = busiest->cpu;
> - env.src_rq= busiest;
>   env.loop_max  = min(sysctl_sched_nr_migrate, 
> busiest->nr_running);
>  
>   update_h_load(env.src_cpu);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] sched: fix env->src_cpu for active migration

2013-02-12 Thread Vincent Guittot
need_active_balance uses env->src_cpu which is set only if there is more
than 1 task on the run queue. We must set the src_cpu field unconditionnally
otherwise the test "env->src_cpu > env->dst_cpu" will always fail if there is
only 1 task on the run queue

Signed-off-by: Vincent Guittot 
---
 kernel/sched/fair.c |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 81fa536..32938ea 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5044,6 +5044,10 @@ redo:
 
ld_moved = 0;
lb_iterations = 1;
+
+   env.src_cpu   = busiest->cpu;
+   env.src_rq= busiest;
+
if (busiest->nr_running > 1) {
/*
 * Attempt to move tasks. If find_busiest_group has found
@@ -5052,8 +5056,6 @@ redo:
 * correctly treated as an imbalance.
 */
env.flags |= LBF_ALL_PINNED;
-   env.src_cpu   = busiest->cpu;
-   env.src_rq= busiest;
env.loop_max  = min(sysctl_sched_nr_migrate, 
busiest->nr_running);
 
update_h_load(env.src_cpu);
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] sched: fix env-src_cpu for active migration

2013-02-12 Thread Vincent Guittot
need_active_balance uses env-src_cpu which is set only if there is more
than 1 task on the run queue. We must set the src_cpu field unconditionnally
otherwise the test env-src_cpu  env-dst_cpu will always fail if there is
only 1 task on the run queue

Signed-off-by: Vincent Guittot vincent.guit...@linaro.org
---
 kernel/sched/fair.c |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 81fa536..32938ea 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5044,6 +5044,10 @@ redo:
 
ld_moved = 0;
lb_iterations = 1;
+
+   env.src_cpu   = busiest-cpu;
+   env.src_rq= busiest;
+
if (busiest-nr_running  1) {
/*
 * Attempt to move tasks. If find_busiest_group has found
@@ -5052,8 +5056,6 @@ redo:
 * correctly treated as an imbalance.
 */
env.flags |= LBF_ALL_PINNED;
-   env.src_cpu   = busiest-cpu;
-   env.src_rq= busiest;
env.loop_max  = min(sysctl_sched_nr_migrate, 
busiest-nr_running);
 
update_h_load(env.src_cpu);
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: fix env-src_cpu for active migration

2013-02-12 Thread Damien Wyart
Hi,

I tested this on top of 3.8-rc7 and this made the machine (x86_64, Core
i7 920) unable to boot (very early as nothing at all is displayed on
screen). Nothing in the kernel log (after booting with a working
kernel).

Double-checked by just backing out only this patch and this made the
machine boot again.

Damien

 need_active_balance uses env-src_cpu which is set only if there is more
 than 1 task on the run queue. We must set the src_cpu field unconditionnally
 otherwise the test env-src_cpu  env-dst_cpu will always fail if there is
 only 1 task on the run queue
 
 Signed-off-by: Vincent Guittot vincent.guit...@linaro.org
 ---
  kernel/sched/fair.c |6 --
  1 file changed, 4 insertions(+), 2 deletions(-)
 
 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
 index 81fa536..32938ea 100644
 --- a/kernel/sched/fair.c
 +++ b/kernel/sched/fair.c
 @@ -5044,6 +5044,10 @@ redo:
  
   ld_moved = 0;
   lb_iterations = 1;
 +
 + env.src_cpu   = busiest-cpu;
 + env.src_rq= busiest;
 +
   if (busiest-nr_running  1) {
   /*
* Attempt to move tasks. If find_busiest_group has found
 @@ -5052,8 +5056,6 @@ redo:
* correctly treated as an imbalance.
*/
   env.flags |= LBF_ALL_PINNED;
 - env.src_cpu   = busiest-cpu;
 - env.src_rq= busiest;
   env.loop_max  = min(sysctl_sched_nr_migrate, 
 busiest-nr_running);
  
   update_h_load(env.src_cpu);
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: fix env-src_cpu for active migration

2013-02-12 Thread Vincent Guittot
Hi Damien,

Thanks for the test and the feedback.
Could you send me the sched_domain configuration of your machine with
the kernel that boots on your machine ?
It's available in /proc/sys/kernel/sched_domain/cpu*/

This should not have any impact on your machine but it looks like it have one.

Regards,
Vincent


On 13 February 2013 07:18, Damien Wyart damien.wy...@gmail.com wrote:
 Hi,

 I tested this on top of 3.8-rc7 and this made the machine (x86_64, Core
 i7 920) unable to boot (very early as nothing at all is displayed on
 screen). Nothing in the kernel log (after booting with a working
 kernel).

 Double-checked by just backing out only this patch and this made the
 machine boot again.

 Damien

 need_active_balance uses env-src_cpu which is set only if there is more
 than 1 task on the run queue. We must set the src_cpu field unconditionnally
 otherwise the test env-src_cpu  env-dst_cpu will always fail if there is
 only 1 task on the run queue

 Signed-off-by: Vincent Guittot vincent.guit...@linaro.org
 ---
  kernel/sched/fair.c |6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
 index 81fa536..32938ea 100644
 --- a/kernel/sched/fair.c
 +++ b/kernel/sched/fair.c
 @@ -5044,6 +5044,10 @@ redo:

   ld_moved = 0;
   lb_iterations = 1;
 +
 + env.src_cpu   = busiest-cpu;
 + env.src_rq= busiest;
 +
   if (busiest-nr_running  1) {
   /*
* Attempt to move tasks. If find_busiest_group has found
 @@ -5052,8 +5056,6 @@ redo:
* correctly treated as an imbalance.
*/
   env.flags |= LBF_ALL_PINNED;
 - env.src_cpu   = busiest-cpu;
 - env.src_rq= busiest;
   env.loop_max  = min(sysctl_sched_nr_migrate, 
 busiest-nr_running);

   update_h_load(env.src_cpu);
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/