Re: [PATCH v2 00/11] sched: consolidation of cpu_power

2014-05-26 Thread Preeti U Murthy
On 05/26/2014 09:24 PM, Vincent Guittot wrote:
> Hi Preeti,
> 
> I have done ebizzy tests on my platforms but don't have results similar to
> yours (my results below). It seems to be linked to SMT. I'm going to look
> at that part more deeply and try to find more suitable HW for tests.

You are right, Vincent. I tested this in SMT-off mode and the regression
was not seen. But the regression was of the order of 27% with a higher number
of threads in SMT-on mode. What is interesting is that the regression
increases in the range N=1 to N=24 and then dips to 0 at N=48 on a 6-core,
SMT-8 machine. Let me dig into this further.

Regards
Preeti U Murthy
> 
> ebizzy -t N -S 20
> Quad cores
>  N  tip                  +patchset
>  1  100.00% (+/- 0.30%)  97.00% (+/- 0.42%)
>  2  100.00% (+/- 0.80%) 100.48% (+/- 0.88%)
>  4  100.00% (+/- 1.18%)  99.32% (+/- 1.05%)
>  6  100.00% (+/- 8.54%)  98.84% (+/- 1.39%)
>  8  100.00% (+/- 0.45%)  98.89% (+/- 0.91%)
> 10  100.00% (+/- 0.32%)  99.25% (+/- 0.31%)
> 12  100.00% (+/- 0.15%)  99.20% (+/- 0.86%)
> 14  100.00% (+/- 0.58%)  99.44% (+/- 0.55%)
> 
> Dual cores
>  N  tip                  +patchset
>  1  100.00% (+/- 1.70%)  99.35% (+/- 2.82%)
>  2  100.00% (+/- 2.75%) 100.48% (+/- 1.51%)
>  4  100.00% (+/- 2.37%) 102.63% (+/- 2.35%)
>  6  100.00% (+/- 3.11%)  97.65% (+/- 1.02%)
>  8  100.00% (+/- 0.26%) 103.68% (+/- 5.90%)
> 10  100.00% (+/- 0.30%) 106.71% (+/- 10.85%)
> 12  100.00% (+/- 1.18%)  98.95% (+/- 0.75%)
> 14  100.00% (+/- 1.82%) 102.89% (+/- 2.32%)
> 
> Regards,
> Vincent
> 
> On 26 May 2014 12:04, Vincent Guittot  wrote:
>> On 26 May 2014 11:44, Preeti U Murthy  wrote:
>>> Hi Vincent,
>>>
>>> I conducted test runs of ebizzy on a Power8 box which has 48 CPUs:
>>> 6 cores with SMT-8, to be precise. It's a single-socket box. The results
>>> are as below.
>>>
>>> On 05/23/2014 09:22 PM, Vincent Guittot wrote:
 Part of this patchset was previously part of the larger tasks packing
 patchset [1]. I have split the latter into (at least) 3 different patchsets
 to make things easier:
 -configuration of sched_domain topology [2]
 -update and consolidation of cpu_power (this patchset)
 -tasks packing algorithm

 SMT systems are no longer the only systems that can have CPUs with an
 original capacity that differs from the default value. We need to extend the
 use of cpu_power_orig to all kinds of platforms so the scheduler will have
 both the maximum capacity (cpu_power_orig/power_orig) and the current
 capacity (cpu_power/power) of CPUs and sched_groups. A new function,
 arch_scale_cpu_power, has been created and replaces arch_scale_smt_power,
 which is SMT specific, in the computation of the capacity of a CPU.

 During load balance, the scheduler evaluates the number of tasks that a
 group of CPUs can handle. The current method assumes that tasks have a fixed
 load of SCHED_LOAD_SCALE and CPUs have a default capacity of
 SCHED_POWER_SCALE. This assumption generates wrong decisions by creating
 ghost cores and by removing real ones when the original capacity of CPUs is
 different from the default SCHED_POWER_SCALE.

 Now that we have the original capacity of a CPU and its
 activity/utilization, we can evaluate more accurately the capacity of a
 group of CPUs.

 This patchset mainly replaces the old capacity method with a new one and
 keeps the policy almost unchanged, although we can certainly take advantage
 of this new statistic in several other places of the load balance.

 TODO:
  - align variable and field names with the renaming [3]

 Test results:
 I have put below the results of 2 tests:
 - hackbench -l 500 -s 4096
 - scp of a 100MB file on the platform

 on a dual cortex-A7
                   hackbench        scp
 tip/master        25.75s(+/-0.25)   5.16MB/s(+/-1.49)
 + patches 1,2     25.89s(+/-0.31)   5.18MB/s(+/-1.45)
 + patches 3-10    25.68s(+/-0.22)   7.00MB/s(+/-1.88)
 + irq accounting  25.80s(+/-0.25)   8.06MB/s(+/-0.05)

 on a quad cortex-A15
                   hackbench        scp
 tip/master        15.69s(+/-0.16)   9.70MB/s(+/-0.04)
 + patches 1,2     15.53s(+/-0.13)   9.72MB/s(+/-0.05)
 + patches 3-10    15.56s(+/-0.22)   9.88MB/s(+/-0.05)
 + irq accounting  15.99s(+/-0.08)  10.37MB/s(+/-0.03)

 The improvement in scp bandwidth happens when tasks and irqs are using
 different CPUs, which is a bit random without the irq accounting config.
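For reference, the "+ irq accounting" rows above presumably correspond to building with the kernel's irq time accounting option:

```
CONFIG_IRQ_TIME_ACCOUNTING=y
```

With it enabled, time spent in hard/soft irq context becomes visible to the scheduler, so cpu_power can be reduced on CPUs that service interrupts and tasks tend to be placed elsewhere, which would match the bandwidth improvement reported here.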
>>>
>>> N -> Number of threads of ebizzy
>>>
>>> Each 'N' run lasted 30 seconds, with multiple iterations averaged.
>>>
>>> N   %change in number of records
>>>     read after patching
>>> --------------------------------
>>> 1   + 0.0038
>>> 4   -17.6429
>>> 8   -26.3989
>>> 12  -29.5070
>>> 16  -38.4842
>>> 20  -44.5747
>>> 24  -51.9792
>>> 28  -34.1863
>>> 32  -38.4029
>>> 38  -22.2490
>>> 42   -7.4843
>>> 47  -0.69676
>>>
>>> Let me profile it and check where the cause of this degradation is.

Re: [PATCH v2 00/11] sched: consolidation of cpu_power

2014-05-26 Thread Vincent Guittot
Hi Preeti,

I have done ebizzy tests on my platforms but don't have results similar to
yours (my results below). It seems to be linked to SMT. I'm going to look at
that part more deeply and try to find more suitable HW for tests.

ebizzy -t N -S 20
Quad cores
 N  tip                  +patchset
 1  100.00% (+/- 0.30%)  97.00% (+/- 0.42%)
 2  100.00% (+/- 0.80%) 100.48% (+/- 0.88%)
 4  100.00% (+/- 1.18%)  99.32% (+/- 1.05%)
 6  100.00% (+/- 8.54%)  98.84% (+/- 1.39%)
 8  100.00% (+/- 0.45%)  98.89% (+/- 0.91%)
10  100.00% (+/- 0.32%)  99.25% (+/- 0.31%)
12  100.00% (+/- 0.15%)  99.20% (+/- 0.86%)
14  100.00% (+/- 0.58%)  99.44% (+/- 0.55%)

Dual cores
 N  tip                  +patchset
 1  100.00% (+/- 1.70%)  99.35% (+/- 2.82%)
 2  100.00% (+/- 2.75%) 100.48% (+/- 1.51%)
 4  100.00% (+/- 2.37%) 102.63% (+/- 2.35%)
 6  100.00% (+/- 3.11%)  97.65% (+/- 1.02%)
 8  100.00% (+/- 0.26%) 103.68% (+/- 5.90%)
10  100.00% (+/- 0.30%) 106.71% (+/- 10.85%)
12  100.00% (+/- 1.18%)  98.95% (+/- 0.75%)
14  100.00% (+/- 1.82%) 102.89% (+/- 2.32%)

Regards,
Vincent

On 26 May 2014 12:04, Vincent Guittot  wrote:
> On 26 May 2014 11:44, Preeti U Murthy  wrote:
>> Hi Vincent,
>>
>> I conducted test runs of ebizzy on a Power8 box which has 48 CPUs:
>> 6 cores with SMT-8, to be precise. It's a single-socket box. The results
>> are as below.
>>
>> On 05/23/2014 09:22 PM, Vincent Guittot wrote:
>>> Part of this patchset was previously part of the larger tasks packing
>>> patchset [1]. I have split the latter into (at least) 3 different
>>> patchsets to make things easier:
>>> -configuration of sched_domain topology [2]
>>> -update and consolidation of cpu_power (this patchset)
>>> -tasks packing algorithm
>>>
>>> SMT systems are no longer the only systems that can have CPUs with an
>>> original capacity that differs from the default value. We need to extend
>>> the use of cpu_power_orig to all kinds of platforms so the scheduler will
>>> have both the maximum capacity (cpu_power_orig/power_orig) and the current
>>> capacity (cpu_power/power) of CPUs and sched_groups. A new function,
>>> arch_scale_cpu_power, has been created and replaces arch_scale_smt_power,
>>> which is SMT specific, in the computation of the capacity of a CPU.
>>>
>>> During load balance, the scheduler evaluates the number of tasks that a
>>> group of CPUs can handle. The current method assumes that tasks have a
>>> fixed load of SCHED_LOAD_SCALE and CPUs have a default capacity of
>>> SCHED_POWER_SCALE. This assumption generates wrong decisions by creating
>>> ghost cores and by removing real ones when the original capacity of CPUs
>>> is different from the default SCHED_POWER_SCALE.
>>>
>>> Now that we have the original capacity of a CPU and its
>>> activity/utilization, we can evaluate more accurately the capacity of a
>>> group of CPUs.
>>>
>>> This patchset mainly replaces the old capacity method with a new one and
>>> keeps the policy almost unchanged, although we can certainly take
>>> advantage of this new statistic in several other places of the load
>>> balance.
>>>
>>> TODO:
>>>  - align variable and field names with the renaming [3]
>>>
>>> Test results:
>>> I have put below the results of 2 tests:
>>> - hackbench -l 500 -s 4096
>>> - scp of a 100MB file on the platform
>>>
>>> on a dual cortex-A7
>>>                   hackbench        scp
>>> tip/master        25.75s(+/-0.25)   5.16MB/s(+/-1.49)
>>> + patches 1,2     25.89s(+/-0.31)   5.18MB/s(+/-1.45)
>>> + patches 3-10    25.68s(+/-0.22)   7.00MB/s(+/-1.88)
>>> + irq accounting  25.80s(+/-0.25)   8.06MB/s(+/-0.05)
>>>
>>> on a quad cortex-A15
>>>                   hackbench        scp
>>> tip/master        15.69s(+/-0.16)   9.70MB/s(+/-0.04)
>>> + patches 1,2     15.53s(+/-0.13)   9.72MB/s(+/-0.05)
>>> + patches 3-10    15.56s(+/-0.22)   9.88MB/s(+/-0.05)
>>> + irq accounting  15.99s(+/-0.08)  10.37MB/s(+/-0.03)
>>>
>>> The improvement in scp bandwidth happens when tasks and irqs are using
>>> different CPUs, which is a bit random without the irq accounting config.
>>
>> N -> Number of threads of ebizzy
>>
>> Each 'N' run lasted 30 seconds, with multiple iterations averaged.
>>
>> N   %change in number of records
>>     read after patching
>> --------------------------------
>> 1  + 0.0038
>> 4  -17.6429
>> 8  -26.3989
>> 12 -29.5070
>> 16 -38.4842
>> 20 -44.5747
>> 24 -51.9792
>> 28 -34.1863
>> 32 -38.4029
>> 38 -22.2490
>> 42  -7.4843
>> 47 -0.69676
>>
>> Let me profile it and check where the cause of this degradation is.
>
> Hi Preeti,
>
> Thanks for the test and the help in finding the root cause of the
> degradation. I'm going to run the test on my platforms too and see if I
> have similar results.
>
> Regards
> Vincent
>>
>>
>> Regards
>> Preeti U Murthy
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 00/11] sched: consolidation of cpu_power

2014-05-26 Thread Vincent Guittot
On 26 May 2014 11:44, Preeti U Murthy  wrote:
> Hi Vincent,
>
> I conducted test runs of ebizzy on a Power8 box which has 48 CPUs:
> 6 cores with SMT-8, to be precise. It's a single-socket box. The results
> are as below.
>
> On 05/23/2014 09:22 PM, Vincent Guittot wrote:
>> Part of this patchset was previously part of the larger tasks packing
>> patchset [1]. I have split the latter into (at least) 3 different patchsets
>> to make things easier:
>> -configuration of sched_domain topology [2]
>> -update and consolidation of cpu_power (this patchset)
>> -tasks packing algorithm
>>
>> SMT systems are no longer the only systems that can have CPUs with an
>> original capacity that differs from the default value. We need to extend
>> the use of cpu_power_orig to all kinds of platforms so the scheduler will
>> have both the maximum capacity (cpu_power_orig/power_orig) and the current
>> capacity (cpu_power/power) of CPUs and sched_groups. A new function,
>> arch_scale_cpu_power, has been created and replaces arch_scale_smt_power,
>> which is SMT specific, in the computation of the capacity of a CPU.
>>
>> During load balance, the scheduler evaluates the number of tasks that a
>> group of CPUs can handle. The current method assumes that tasks have a
>> fixed load of SCHED_LOAD_SCALE and CPUs have a default capacity of
>> SCHED_POWER_SCALE. This assumption generates wrong decisions by creating
>> ghost cores and by removing real ones when the original capacity of CPUs
>> is different from the default SCHED_POWER_SCALE.
>>
>> Now that we have the original capacity of a CPU and its
>> activity/utilization, we can evaluate more accurately the capacity of a
>> group of CPUs.
>>
>> This patchset mainly replaces the old capacity method with a new one and
>> keeps the policy almost unchanged, although we can certainly take advantage
>> of this new statistic in several other places of the load balance.
>>
>> TODO:
>>  - align variable and field names with the renaming [3]
>>
>> Test results:
>> I have put below the results of 2 tests:
>> - hackbench -l 500 -s 4096
>> - scp of a 100MB file on the platform
>>
>> on a dual cortex-A7
>>                   hackbench        scp
>> tip/master        25.75s(+/-0.25)   5.16MB/s(+/-1.49)
>> + patches 1,2     25.89s(+/-0.31)   5.18MB/s(+/-1.45)
>> + patches 3-10    25.68s(+/-0.22)   7.00MB/s(+/-1.88)
>> + irq accounting  25.80s(+/-0.25)   8.06MB/s(+/-0.05)
>>
>> on a quad cortex-A15
>>                   hackbench        scp
>> tip/master        15.69s(+/-0.16)   9.70MB/s(+/-0.04)
>> + patches 1,2     15.53s(+/-0.13)   9.72MB/s(+/-0.05)
>> + patches 3-10    15.56s(+/-0.22)   9.88MB/s(+/-0.05)
>> + irq accounting  15.99s(+/-0.08)  10.37MB/s(+/-0.03)
>>
>> The improvement in scp bandwidth happens when tasks and irqs are using
>> different CPUs, which is a bit random without the irq accounting config.
>
> N -> Number of threads of ebizzy
>
> Each 'N' run lasted 30 seconds, with multiple iterations averaged.
>
> N   %change in number of records
>     read after patching
> --------------------------------
> 1  + 0.0038
> 4  -17.6429
> 8  -26.3989
> 12 -29.5070
> 16 -38.4842
> 20 -44.5747
> 24 -51.9792
> 28 -34.1863
> 32 -38.4029
> 38 -22.2490
> 42  -7.4843
> 47 -0.69676
>
> Let me profile it and check where the cause of this degradation is.

Hi Preeti,

Thanks for the test and the help in finding the root cause of the
degradation. I'm going to run the test on my platforms too and see if I
have similar results.

Regards
Vincent
>
>
> Regards
> Preeti U Murthy
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 00/11] sched: consolidation of cpu_power

2014-05-26 Thread Preeti U Murthy
Hi Vincent,

I conducted test runs of ebizzy on a Power8 box which has 48 CPUs:
6 cores with SMT-8, to be precise. It's a single-socket box. The results
are as below.

On 05/23/2014 09:22 PM, Vincent Guittot wrote:
> Part of this patchset was previously part of the larger tasks packing
> patchset [1]. I have split the latter into (at least) 3 different patchsets
> to make things easier:
> -configuration of sched_domain topology [2]
> -update and consolidation of cpu_power (this patchset)
> -tasks packing algorithm
> 
> SMT systems are no longer the only systems that can have CPUs with an
> original capacity that differs from the default value. We need to extend
> the use of cpu_power_orig to all kinds of platforms so the scheduler will
> have both the maximum capacity (cpu_power_orig/power_orig) and the current
> capacity (cpu_power/power) of CPUs and sched_groups. A new function,
> arch_scale_cpu_power, has been created and replaces arch_scale_smt_power,
> which is SMT specific, in the computation of the capacity of a CPU.
> 
> During load balance, the scheduler evaluates the number of tasks that a
> group of CPUs can handle. The current method assumes that tasks have a
> fixed load of SCHED_LOAD_SCALE and CPUs have a default capacity of
> SCHED_POWER_SCALE. This assumption generates wrong decisions by creating
> ghost cores and by removing real ones when the original capacity of CPUs
> is different from the default SCHED_POWER_SCALE.
> 
> Now that we have the original capacity of a CPU and its
> activity/utilization, we can evaluate more accurately the capacity of a
> group of CPUs.
> 
> This patchset mainly replaces the old capacity method with a new one and
> keeps the policy almost unchanged, although we can certainly take advantage
> of this new statistic in several other places of the load balance.
> 
> TODO:
>  - align variable and field names with the renaming [3]
> 
> Test results:
> I have put below the results of 2 tests:
> - hackbench -l 500 -s 4096
> - scp of a 100MB file on the platform
> 
> on a dual cortex-A7
>                   hackbench        scp
> tip/master        25.75s(+/-0.25)   5.16MB/s(+/-1.49)
> + patches 1,2     25.89s(+/-0.31)   5.18MB/s(+/-1.45)
> + patches 3-10    25.68s(+/-0.22)   7.00MB/s(+/-1.88)
> + irq accounting  25.80s(+/-0.25)   8.06MB/s(+/-0.05)
> 
> on a quad cortex-A15
>                   hackbench        scp
> tip/master        15.69s(+/-0.16)   9.70MB/s(+/-0.04)
> + patches 1,2     15.53s(+/-0.13)   9.72MB/s(+/-0.05)
> + patches 3-10    15.56s(+/-0.22)   9.88MB/s(+/-0.05)
> + irq accounting  15.99s(+/-0.08)  10.37MB/s(+/-0.03)
> 
> The improvement in scp bandwidth happens when tasks and irqs are using
> different CPUs, which is a bit random without the irq accounting config.

N -> Number of threads of ebizzy

Each 'N' run lasted 30 seconds, with multiple iterations averaged.

N   %change in number of records
    read after patching
--------------------------------
1  + 0.0038
4  -17.6429
8  -26.3989
12 -29.5070
16 -38.4842
20 -44.5747
24 -51.9792
28 -34.1863
32 -38.4029
38 -22.2490
42  -7.4843
47 -0.69676

Let me profile it and check where the cause of this degradation is.


Regards
Preeti U Murthy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 00/11] sched: consolidation of cpu_power

2014-05-26 Thread Preeti U Murthy
Hi Vincent,

I conducted test runs of ebizzy on a Power8 box which had 48 cpus.
6 cores with SMT-8 to be precise. Its a single socket box. The results
are as below.

On 05/23/2014 09:22 PM, Vincent Guittot wrote:
 Part of this patchset was previously part of the larger tasks packing patchset
 [1]. I have splitted the latter in 3 different patchsets (at least) to make 
 the
 thing easier.
 -configuration of sched_domain topology [2]
 -update and consolidation of cpu_power (this patchset)
 -tasks packing algorithm
 
 SMT system is no more the only system that can have a CPUs with an original
 capacity that is different from the default value. We need to extend the use 
 of
 cpu_power_orig to all kind of platform so the scheduler will have both the
 maximum capacity (cpu_power_orig/power_orig) and the current capacity
 (cpu_power/power) of CPUs and sched_groups. A new function 
 arch_scale_cpu_power
 has been created and replace arch_scale_smt_power, which is SMT specifc in the
 computation of the capapcity of a CPU.
 
 During load balance, the scheduler evaluates the number of tasks that a group
 of CPUs can handle. The current method assumes that tasks have a fix load of 
 SCHED_LOAD_SCALE and CPUs have a default capacity of SCHED_POWER_SCALE.
 This assumption generates wrong decision by creating ghost cores and by
 removing real ones when the original capacity of CPUs is different from the
 default SCHED_POWER_SCALE.
 
 Now that we have the original capacity of a CPUS and its activity/utilization,
 we can evaluate more accuratly the capacity of a group of CPUs.
 
 This patchset mainly replaces the old capacity method by a new one and has 
 kept
 the policy almost unchanged whereas we can certainly take advantage of this 
 new
 statistic in several other places of the load balance.
 
 TODO:
  - align variable's and field's name with the renaming [3]
 
 Tests results:
 I have put below results of 2 tests:
 - hackbench -l 500 -s 4096
 - scp of 100MB file on the platform
 
 on a dual cortex-A7 
   hackbenchscp
 tip/master25.75s(+/-0.25)  5.16MB/s(+/-1.49)
 + patches 1,2 25.89s(+/-0.31)  5.18MB/s(+/-1.45)
 + patches 3-1025.68s(+/-0.22)  7.00MB/s(+/-1.88)
 + irq accounting  25.80s(+/-0.25)  8.06MB/s(+/-0.05)
 
 on a quad cortex-A15 
   hackbenchscp
 tip/master15.69s(+/-0.16)  9.70MB/s(+/-0.04)
 + patches 1,2 15.53s(+/-0.13)  9.72MB/s(+/-0.05)
 + patches 3-1015.56s(+/-0.22)  9.88MB/s(+/-0.05)
 + irq accounting  15.99s(+/-0.08) 10.37MB/s(+/-0.03)
 
 The improvement of scp bandwidth happens when tasks and irq are using
 different CPU which is a bit random without irq accounting config

N - Number of threads of ebizzy

Each 'N' run was for 30 seconds with multiple iterations and averaging them.

N  %change in number of records
   read after patching
--
1  + 0.0038
4  -17.6429
8  -26.3989
12 -29.5070
16 -38.4842
20 -44.5747
24 -51.9792
28 -34.1863
32 -38.4029
38 -22.2490
42  -7.4843
47 -0.69676

Let me profile it and check where the cause of this degradation is.


Regards
Preeti U Murthy

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 00/11] sched: consolidation of cpu_power

2014-05-26 Thread Vincent Guittot
On 26 May 2014 11:44, Preeti U Murthy pre...@linux.vnet.ibm.com wrote:
 Hi Vincent,

 I conducted test runs of ebizzy on a Power8 box which had 48 cpus.
 6 cores with SMT-8 to be precise. Its a single socket box. The results
 are as below.

 On 05/23/2014 09:22 PM, Vincent Guittot wrote:
 Part of this patchset was previously part of the larger tasks packing 
 patchset
 [1]. I have splitted the latter in 3 different patchsets (at least) to make 
 the
 thing easier.
 -configuration of sched_domain topology [2]
 -update and consolidation of cpu_power (this patchset)
 -tasks packing algorithm

 SMT system is no more the only system that can have a CPUs with an original
 capacity that is different from the default value. We need to extend the use 
 of
 cpu_power_orig to all kind of platform so the scheduler will have both the
 maximum capacity (cpu_power_orig/power_orig) and the current capacity
 (cpu_power/power) of CPUs and sched_groups. A new function 
 arch_scale_cpu_power
 has been created and replace arch_scale_smt_power, which is SMT specifc in 
 the
 computation of the capapcity of a CPU.

 During load balance, the scheduler evaluates the number of tasks that a group
 of CPUs can handle. The current method assumes that tasks have a fix load of
 SCHED_LOAD_SCALE and CPUs have a default capacity of SCHED_POWER_SCALE.
 This assumption generates wrong decision by creating ghost cores and by
 removing real ones when the original capacity of CPUs is different from the
 default SCHED_POWER_SCALE.

 Now that we have the original capacity of a CPUS and its 
 activity/utilization,
 we can evaluate more accuratly the capacity of a group of CPUs.

 This patchset mainly replaces the old capacity method by a new one and has 
 kept
 the policy almost unchanged whereas we can certainly take advantage of this 
 new
 statistic in several other places of the load balance.

 TODO:
  - align variable's and field's name with the renaming [3]

 Tests results:
 I have put below results of 2 tests:
 - hackbench -l 500 -s 4096
 - scp of 100MB file on the platform

 on a dual cortex-A7
   hackbenchscp
 tip/master25.75s(+/-0.25)  5.16MB/s(+/-1.49)
 + patches 1,2 25.89s(+/-0.31)  5.18MB/s(+/-1.45)
 + patches 3-1025.68s(+/-0.22)  7.00MB/s(+/-1.88)
 + irq accounting  25.80s(+/-0.25)  8.06MB/s(+/-0.05)

 on a quad cortex-A15
   hackbenchscp
 tip/master15.69s(+/-0.16)  9.70MB/s(+/-0.04)
 + patches 1,2 15.53s(+/-0.13)  9.72MB/s(+/-0.05)
 + patches 3-1015.56s(+/-0.22)  9.88MB/s(+/-0.05)
 + irq accounting  15.99s(+/-0.08) 10.37MB/s(+/-0.03)

 The improvement of scp bandwidth happens when tasks and irq are using
 different CPU which is a bit random without irq accounting config

 N - Number of threads of ebizzy

 Each 'N' run was for 30 seconds with multiple iterations and averaging them.

 N  %change in number of records
read after patching
 --
 1  + 0.0038
 4  -17.6429
 8  -26.3989
 12 -29.5070
 16 -38.4842
 20 -44.5747
 24 -51.9792
 28 -34.1863
 32 -38.4029
 38 -22.2490
 42  -7.4843
 47 -0.69676

 Let me profile it and check where the cause of this degradation is.

Hi Preeti,

Thanks for the test and the help to find the root cause of the
degration. I'm going to run the test on my platforms too and see if i
have similar results with my platforms

Regards
Vincent


 Regards
 Preeti U Murthy

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 00/11] sched: consolidation of cpu_power

2014-05-26 Thread Vincent Guittot
Hi Preeti,

I have done ebizzy tests on my platforms but doesn't have similar
results than you (my results below). It seems to be linked to SMT. I'm
going to look at that part more deeply and try to find a more suitable
HW for tests.

ebizzy -t N -S 20
Quad cores
 N  tip +patchset
 1  100.00% (+/- 0.30%)  97.00% (+/- 0.42%)
 2  100.00% (+/- 0.80%) 100.48% (+/- 0.88%)
 4  100.00% (+/- 1.18%)  99.32% (+/- 1.05%)
 6  100.00% (+/- 8.54%)  98.84% (+/- 1.39%)
 8  100.00% (+/- 0.45%)  98.89% (+/- 0.91%)
10  100.00% (+/- 0.32%)  99.25% (+/- 0.31%)
12  100.00% (+/- 0.15%)  99.20% (+/- 0.86%)
14  100.00% (+/- 0.58%)  99.44% (+/- 0.55%)

Dual cores
 N  tip +patchset
 1  100.00% (+/- 1.70%)  99.35% (+/- 2.82%)
 2  100.00% (+/- 2.75%) 100.48% (+/- 1.51%)
 4  100.00% (+/- 2.37%) 102.63% (+/- 2.35%)
 6  100.00% (+/- 3.11%)  97.65% (+/- 1.02%)
 8  100.00% (+/- 0.26%) 103.68% (+/- 5.90%)
10  100.00% (+/- 0.30%) 106.71% (+/- 10.85%)
12  100.00% (+/- 1.18%)  98.95% (+/- 0.75%)
14  100.00% (+/- 1.82%) 102.89% (+/- 2.32%)

Regards,
Vincent

On 26 May 2014 12:04, Vincent Guittot vincent.guit...@linaro.org wrote:
 On 26 May 2014 11:44, Preeti U Murthy pre...@linux.vnet.ibm.com wrote:
 Hi Vincent,

 I conducted test runs of ebizzy on a Power8 box which had 48 cpus.
 6 cores with SMT-8 to be precise. Its a single socket box. The results
 are as below.

 On 05/23/2014 09:22 PM, Vincent Guittot wrote:
 Part of this patchset was previously part of the larger tasks packing 
 patchset
 [1]. I have splitted the latter in 3 different patchsets (at least) to make 
 the
 thing easier.
 -configuration of sched_domain topology [2]
 -update and consolidation of cpu_power (this patchset)
 -tasks packing algorithm

 SMT system is no more the only system that can have a CPUs with an original
 capacity that is different from the default value. We need to extend the 
 use of
 cpu_power_orig to all kind of platform so the scheduler will have both the
 maximum capacity (cpu_power_orig/power_orig) and the current capacity
 (cpu_power/power) of CPUs and sched_groups. A new function 
 arch_scale_cpu_power
 has been created and replace arch_scale_smt_power, which is SMT specifc in 
 the
 computation of the capapcity of a CPU.

 During load balance, the scheduler evaluates the number of tasks that a 
 group
 of CPUs can handle. The current method assumes that tasks have a fix load of
 SCHED_LOAD_SCALE and CPUs have a default capacity of SCHED_POWER_SCALE.
 This assumption generates wrong decision by creating ghost cores and by
 removing real ones when the original capacity of CPUs is different from the
 default SCHED_POWER_SCALE.

 Now that we have the original capacity of a CPUS and its 
 activity/utilization,
 we can evaluate more accuratly the capacity of a group of CPUs.

 This patchset mainly replaces the old capacity method by a new one and has 
 kept
 the policy almost unchanged whereas we can certainly take advantage of this 
 new
 statistic in several other places of the load balance.

 TODO:
  - align variable's and field's name with the renaming [3]

 Tests results:
 I have put below results of 2 tests:
 - hackbench -l 500 -s 4096
 - scp of 100MB file on the platform

 on a dual cortex-A7
   hackbenchscp
 tip/master25.75s(+/-0.25)  5.16MB/s(+/-1.49)
 + patches 1,2 25.89s(+/-0.31)  5.18MB/s(+/-1.45)
 + patches 3-1025.68s(+/-0.22)  7.00MB/s(+/-1.88)
 + irq accounting  25.80s(+/-0.25)  8.06MB/s(+/-0.05)

 on a quad cortex-A15
   hackbenchscp
 tip/master15.69s(+/-0.16)  9.70MB/s(+/-0.04)
 + patches 1,2 15.53s(+/-0.13)  9.72MB/s(+/-0.05)
 + patches 3-1015.56s(+/-0.22)  9.88MB/s(+/-0.05)
 + irq accounting  15.99s(+/-0.08) 10.37MB/s(+/-0.03)

 The improvement of scp bandwidth happens when tasks and irq are using
 different CPU which is a bit random without irq accounting config

 N - Number of threads of ebizzy

 Each 'N' run was for 30 seconds with multiple iterations and averaging them.

 N  %change in number of records
read after patching
 --
 1  + 0.0038
 4  -17.6429
 8  -26.3989
 12 -29.5070
 16 -38.4842
 20 -44.5747
 24 -51.9792
 28 -34.1863
 32 -38.4029
 38 -22.2490
 42  -7.4843
 47 -0.69676

 Let me profile it and check where the cause of this degradation is.

 Hi Preeti,

 Thanks for the test and the help to find the root cause of the
 degration. I'm going to run the test on my platforms too and see if i
 have similar results with my platforms

 Regards
 Vincent


 Regards
 Preeti U Murthy

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 00/11] sched: consolidation of cpu_power

2014-05-26 Thread Preeti U Murthy
On 05/26/2014 09:24 PM, Vincent Guittot wrote:
 Hi Preeti,
 
 I have done ebizzy tests on my platforms but doesn't have similar
 results than you (my results below). It seems to be linked to SMT. I'm
 going to look at that part more deeply and try to find a more suitable
 HW for tests.

You are right Vincent. I tested this in smt-off mode and the regression
was not seen. But the regression was of the order 27% with higher number
of threads in smt-on mode. What is interesting is that the regression
increases in the range N=1 to N=24 and then it dips to 0 at N=48 on a 6
core, SMT 8 machine. Let me dig this further.

Let me dig further.

Regards
Preeti U Murthy
 
 ebizzy -t N -S 20
 Quad cores
  N  tip +patchset
  1  100.00% (+/- 0.30%)  97.00% (+/- 0.42%)
  2  100.00% (+/- 0.80%) 100.48% (+/- 0.88%)
  4  100.00% (+/- 1.18%)  99.32% (+/- 1.05%)
  6  100.00% (+/- 8.54%)  98.84% (+/- 1.39%)
  8  100.00% (+/- 0.45%)  98.89% (+/- 0.91%)
 10  100.00% (+/- 0.32%)  99.25% (+/- 0.31%)
 12  100.00% (+/- 0.15%)  99.20% (+/- 0.86%)
 14  100.00% (+/- 0.58%)  99.44% (+/- 0.55%)
 
 Dual cores
  N  tip +patchset
  1  100.00% (+/- 1.70%)  99.35% (+/- 2.82%)
  2  100.00% (+/- 2.75%) 100.48% (+/- 1.51%)
  4  100.00% (+/- 2.37%) 102.63% (+/- 2.35%)
  6  100.00% (+/- 3.11%)  97.65% (+/- 1.02%)
  8  100.00% (+/- 0.26%) 103.68% (+/- 5.90%)
 10  100.00% (+/- 0.30%) 106.71% (+/- 10.85%)
 12  100.00% (+/- 1.18%)  98.95% (+/- 0.75%)
 14  100.00% (+/- 1.82%) 102.89% (+/- 2.32%)
 
 Regards,
 Vincent
 
 On 26 May 2014 12:04, Vincent Guittot vincent.guit...@linaro.org wrote:
 On 26 May 2014 11:44, Preeti U Murthy pre...@linux.vnet.ibm.com wrote:
 Hi Vincent,

 I conducted test runs of ebizzy on a Power8 box which had 48 cpus.
 6 cores with SMT-8 to be precise. Its a single socket box. The results
 are as below.

 On 05/23/2014 09:22 PM, Vincent Guittot wrote:
 Part of this patchset was previously part of the larger tasks packing 
 patchset
 [1]. I have splitted the latter in 3 different patchsets (at least) to 
 make the
 thing easier.
 -configuration of sched_domain topology [2]
 -update and consolidation of cpu_power (this patchset)
 -tasks packing algorithm

 SMT system is no more the only system that can have a CPUs with an original
 capacity that is different from the default value. We need to extend the 
 use of
 cpu_power_orig to all kind of platform so the scheduler will have both the
 maximum capacity (cpu_power_orig/power_orig) and the current capacity
 (cpu_power/power) of CPUs and sched_groups. A new function 
 arch_scale_cpu_power
 has been created and replace arch_scale_smt_power, which is SMT specifc in 
 the
 computation of the capapcity of a CPU.

 During load balance, the scheduler evaluates the number of tasks that a 
 group
 of CPUs can handle. The current method assumes that tasks have a fix load 
 of
 SCHED_LOAD_SCALE and CPUs have a default capacity of SCHED_POWER_SCALE.
 This assumption generates wrong decision by creating ghost cores and by
 removing real ones when the original capacity of CPUs is different from the
 default SCHED_POWER_SCALE.

 Now that we have the original capacity of a CPUS and its 
 activity/utilization,
 we can evaluate more accuratly the capacity of a group of CPUs.

 This patchset mainly replaces the old capacity method by a new one and has 
 kept
 the policy almost unchanged whereas we can certainly take advantage of 
 this new
 statistic in several other places of the load balance.

 TODO:
  - align variable's and field's name with the renaming [3]

 Tests results:
 I have put below results of 2 tests:
 - hackbench -l 500 -s 4096
 - scp of 100MB file on the platform

 on a dual cortex-A7
                  hackbench        scp
tip/master        25.75s(+/-0.25)  5.16MB/s(+/-1.49)
+ patches 1,2     25.89s(+/-0.31)  5.18MB/s(+/-1.45)
+ patches 3-10    25.68s(+/-0.22)  7.00MB/s(+/-1.88)
+ irq accounting  25.80s(+/-0.25)  8.06MB/s(+/-0.05)

 on a quad cortex-A15
                  hackbench        scp
tip/master        15.69s(+/-0.16)  9.70MB/s(+/-0.04)
+ patches 1,2     15.53s(+/-0.13)  9.72MB/s(+/-0.05)
+ patches 3-10    15.56s(+/-0.22)  9.88MB/s(+/-0.05)
+ irq accounting  15.99s(+/-0.08) 10.37MB/s(+/-0.03)

The improvement in scp bandwidth happens when tasks and irqs are using
different CPUs, which is somewhat random without the irq accounting config.

 N - Number of threads of ebizzy

Each run at a given N was 30 seconds long; multiple iterations were run and their results averaged.

 N   %change in number of records
     read after patching
---------------------------------
 1    + 0.0038
 4   -17.6429
 8   -26.3989
12   -29.5070
16   -38.4842
20   -44.5747
24   -51.9792
28   -34.1863
32   -38.4029
38   -22.2490
42    -7.4843
47    -0.69676
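The %change figures above can be derived as sketched below, with made-up record counts (not the raw data from these runs): average the ebizzy "records" count over the iterations of each run, then compare the patched average against the tip baseline.

```python
def pct_change(baseline_records, patched_records):
    """Percentage change of the patched average record count
    relative to the baseline average."""
    base = sum(baseline_records) / len(baseline_records)
    new = sum(patched_records) / len(patched_records)
    return (new - base) / base * 100.0

# e.g. three 30-second iterations at a given thread count N:
print(pct_change([1000, 1020, 980], [820, 830, 810]))  # -> -18.0
```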

 Let me profile it and check where the cause of this degradation is.

 Hi Preeti,

Thanks for the test and the help to find the root cause.

[PATCH v2 00/11] sched: consolidation of cpu_power

2014-05-23 Thread Vincent Guittot
Part of this patchset was previously part of the larger tasks packing patchset
[1]. I have split the latter into 3 different patchsets (at least) to make
things easier.
-configuration of sched_domain topology [2]
-update and consolidation of cpu_power (this patchset)
-tasks packing algorithm

SMT systems are no longer the only systems that can have CPUs with an original
capacity that differs from the default value. We need to extend the use of
cpu_power_orig to all kinds of platforms so the scheduler will have both the
maximum capacity (cpu_power_orig/power_orig) and the current capacity
(cpu_power/power) of CPUs and sched_groups. A new function,
arch_scale_cpu_power, has been created to replace arch_scale_smt_power, which
is SMT-specific, in the computation of a CPU's capacity.

During load balance, the scheduler evaluates the number of tasks that a group
of CPUs can handle. The current method assumes that tasks have a fixed load of
SCHED_LOAD_SCALE and that CPUs have a default capacity of SCHED_POWER_SCALE.
This assumption generates wrong decisions by creating ghost cores and by
removing real ones when the original capacity of CPUs differs from the default
SCHED_POWER_SCALE.
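To make the ghost-core problem concrete, here is a toy Python model of the old rounding-based capacity estimate (illustration only, not the kernel's code; the cpu_power values are hypothetical):

```python
# Toy model of the old capacity-factor computation (not kernel code).
# SCHED_POWER_SCALE is the default per-CPU capacity unit.
SCHED_POWER_SCALE = 1024

def capacity_factor(cpu_powers):
    """Old-style estimate of how many tasks a group can handle:
    round the group's summed cpu_power to the nearest whole
    SCHED_POWER_SCALE unit."""
    total = sum(cpu_powers)
    return (total + SCHED_POWER_SCALE // 2) // SCHED_POWER_SCALE

# Hypothetical powers: two CPUs rated above the default scale
# produce a ghost third core ...
print(capacity_factor([1441, 1441]))  # -> 3: one ghost core created
# ... while two CPUs rated below it lose a real core.
print(capacity_factor([606, 606]))    # -> 1: one real core removed
print(capacity_factor([1024, 1024]))  # -> 2: correct only at the default
```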

Now that we have the original capacity of a CPU and its activity/utilization,
we can evaluate the capacity of a group of CPUs more accurately.
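The idea can be sketched as follows (illustration only, with made-up numbers and a hypothetical helper name, not the patchset's actual code): compare spare capacity directly instead of rounding to a whole number of default-sized cores.

```python
# Sketch of the activity-based idea: a group can take more work if
# any CPU's measured activity is below its original capacity. Both
# values are in SCHED_POWER_SCALE units.
def group_has_free_capacity(power_orig, activity):
    """power_orig and activity are per-CPU lists; return True if the
    group has unused capacity left."""
    spare = sum(max(p - a, 0) for p, a in zip(power_orig, activity))
    return spare > 0

# Two SMT-like threads, mostly busy: no rounding artifact, just a
# direct spare-capacity answer.
print(group_has_free_capacity([589, 589], [589, 400]))  # -> True
print(group_has_free_capacity([589, 589], [589, 589]))  # -> False
```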

This patchset mainly replaces the old capacity method with a new one and keeps
the policy almost unchanged, although we could certainly take advantage of this
new statistic in several other places in the load balancer.

TODO:
 - align variable's and field's name with the renaming [3]

Tests results:
I have put below results of 2 tests:
- hackbench -l 500 -s 4096
- scp of 100MB file on the platform

on a dual cortex-A7 
                  hackbench        scp
tip/master        25.75s(+/-0.25)  5.16MB/s(+/-1.49)
+ patches 1,2     25.89s(+/-0.31)  5.18MB/s(+/-1.45)
+ patches 3-10    25.68s(+/-0.22)  7.00MB/s(+/-1.88)
+ irq accounting  25.80s(+/-0.25)  8.06MB/s(+/-0.05)

on a quad cortex-A15 
                  hackbench        scp
tip/master        15.69s(+/-0.16)  9.70MB/s(+/-0.04)
+ patches 1,2     15.53s(+/-0.13)  9.72MB/s(+/-0.05)
+ patches 3-10    15.56s(+/-0.22)  9.88MB/s(+/-0.05)
+ irq accounting  15.99s(+/-0.08) 10.37MB/s(+/-0.03)

The improvement in scp bandwidth happens when tasks and irqs are using
different CPUs, which is somewhat random without the irq accounting config.
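For context, this is roughly how rt/irq time feeds back into cpu_power — a simplified sketch modelled on the scale_rt_power() logic in kernel/sched/fair.c at the time, with hypothetical numbers, not the actual kernel code:

```python
# Simplified sketch: time consumed by rt/irq activity reduces the
# capacity a CPU advertises to the cfs load balancer. With irq time
# accounting enabled, interrupt time is included in the stolen time,
# so a CPU handling many irqs attracts fewer cfs tasks.
SCHED_POWER_SCALE = 1024

def scaled_cpu_power(power_orig, rt_irq_time, period):
    """Scale power_orig by the fraction of the period left over
    after rt/irq time (all times in the same arbitrary unit)."""
    available = max(period - rt_irq_time, 0)
    return power_orig * available // period

print(scaled_cpu_power(SCHED_POWER_SCALE, 0, 1000))    # -> 1024: no irq time
print(scaled_cpu_power(SCHED_POWER_SCALE, 250, 1000))  # -> 768: 25% in irq
```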

Changes since V1:
 - add 3 fixes
 - correct some commit messages
 - replace capacity computation by activity
 - take into account current cpu capacity

[1] https://lkml.org/lkml/2013/10/18/121
[2] https://lkml.org/lkml/2014/3/19/377
[3] https://lkml.org/lkml/2014/5/14/622

Vincent Guittot (11):
  sched: fix imbalance flag reset
  sched: remove a wake_affine condition
  sched: fix avg_load computation
  sched: Allow all archs to set the power_orig
  ARM: topology: use new cpu_power interface
  sched: add per rq cpu_power_orig
  Revert "sched: Put rq's sched_avg under CONFIG_FAIR_GROUP_SCHED"
  sched: get CPU's activity statistic
  sched: test the cpu's capacity in wake affine
  sched: move cfs task on a CPU with higher capacity
  sched: replace capacity by activity

 arch/arm/kernel/topology.c |   4 +-
 kernel/sched/core.c        |   2 +-
 kernel/sched/fair.c        | 229 ++---
 kernel/sched/sched.h       |   5 +-
 4 files changed, 118 insertions(+), 122 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

