Re: [PATCH V3 RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task

2012-12-18 Thread Raghavendra K T

[I forgot to add Ingo to the To: list last time]

Ingo,
 Could you please take this into the x86 tree?
This is
Acked-by: Andrew Jones drjo...@redhat.com
Tested-by: Chegu Vinod chegu_vi...@hp.com

Marcelo, do you want to add your Acked-by/Reviewed-by?

On 12/14/2012 09:10 PM, Raghavendra K T wrote:

Hi Ingo,

Could you please take this into the x86 tree?

Thanks,
On 12/14/2012 05:59 AM, Marcelo Tosatti wrote:

Raghavendra,

Please get this integrated through the x86 tree (Ingo CC'ed).

On Mon, Nov 26, 2012 at 05:37:54PM +0530, Raghavendra K T wrote:

[patch quoted in full; snipped - see the original posting of 2012-11-26]


Re: [PATCH V3 RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task

2012-12-14 Thread Marcelo Tosatti
Raghavendra,

Please get this integrated through the x86 tree (Ingo CC'ed).

On Mon, Nov 26, 2012 at 05:37:54PM +0530, Raghavendra K T wrote:
[patch quoted in full; snipped - see the original posting of 2012-11-26]


Re: [PATCH V3 RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task

2012-11-28 Thread Andrew Theurer
On Wed, 2012-11-28 at 14:26 -0800, Chegu Vinod wrote:
 On 11/27/2012 6:23 AM, Chegu Vinod wrote:
 
  On 11/27/2012 2:30 AM, Raghavendra K T wrote: 
   On 11/26/2012 07:05 PM, Andrew Jones wrote: 
On Mon, Nov 26, 2012 at 05:37:54PM +0530, Raghavendra K T wrote:
[patch quoted in full; snipped - see the original posting of 2012-11-26]

Acked-by: Andrew Jones drjo...@redhat.com 

   
   Thank you Drew. 
   
Marcelo, Gleb: please let me know if you have comments / concerns on the patches.

Andrew, Vinod: IMO the patch set looks good for undercommit scenarios, especially for large guests, where we do have the overhead of vcpu iteration in the PLE handler.
   
  Thanks Raghu. Will try to get this latest patch set evaluated and
  get back to you. 
  
  
Hi Raghu,

Here is some preliminary data with your latest set of PLE patches (also with Andrew's throttled yield_to() change).

Ran a single guest on an 80-core Westmere platform. [Note: host and guest had the latest kernel from kvm.git and also used the latest qemu from qemu.git as of yesterday morning.]

The guest was running an AIM7 high_systime workload. (Note: high_systime is a kernel-intensive micro-benchmark, but in this case it was run just as a workload in the guest to trigger spinlock etc. contention in the guest OS, and hence PLE; i.e. this is not a real benchmark run.) I have run this workload with a constant number of users (2000) with 100 jobs per user. The numbers below are jobs per minute (JPM); higher is better.

                               40VCPU   60VCPU   80VCPU
a) 3.7.0-rc6+ w/ ple_gap=0     ~102K    ~88K     ~81K
b) 3.7.0-rc6+                  ~53K     ~25K     ~18-20K
c) 3.7.0-rc6+ w/ PLE patches   ~100K    ~81K     ~48K-69K  (lot of variation from run to run)
 

Re: [PATCH V3 RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task

2012-11-28 Thread Chegu Vinod

On 11/28/2012 5:09 PM, Chegu Vinod wrote:

On 11/27/2012 6:23 AM, Chegu Vinod wrote:

On 11/27/2012 2:30 AM, Raghavendra K T wrote:

On 11/26/2012 07:05 PM, Andrew Jones wrote:

On Mon, Nov 26, 2012 at 05:37:54PM +0530, Raghavendra K T wrote:
[patch quoted in full; snipped - see the original posting of 2012-11-26]


Acked-by: Andrew Jones drjo...@redhat.com



Thank you Drew.

Marcelo, Gleb: please let me know if you have comments / concerns on the patches.

Andrew, Vinod: IMO the patch set looks good for undercommit scenarios, especially for large guests, where we do have the overhead of vcpu iteration in the PLE handler.

Thanks Raghu. Will try to get this latest patch set evaluated and get 
back to you.


Vinod





Resending, as the previous email to the kvm and lkml aliases bounced twice... Apologies for any repeats!



Hi Raghu,

Here is some preliminary data with your latest set of PLE patches (also with Andrew's throttled yield_to() change).

Ran a single guest on an 80-core Westmere platform. [Note: host and guest had the latest kernel from kvm.git and also used the latest qemu from qemu.git as of yesterday morning.]

The guest was running an AIM7 high_systime workload. (Note: high_systime is a kernel-intensive micro-benchmark, but in this case it was run just as a workload in the guest to trigger spinlock etc. contention in the guest OS, and hence PLE; i.e. this is not a real benchmark run.) I have run this workload with a constant number of users (2000) with 100 jobs per user. The numbers below are jobs per minute (JPM); higher is better.

                                 40VCPU   60VCPU   80VCPU
a) 3.7.0-rc6+ w/ ple_gap=0       ~102K    ~88K     ~81K
b) 3.7.0-rc6+                    ~53K     ~25K     ~18-20K
c) 3.7.0-rc6+ w/ PLE patches     ~100K    ~81K     ~48K-69K  (lot of variation from run to run)
d) 3.7.0-rc6+ w/ throttled
   yield_to() change             ~101K    ~87K     ~78K

---

The PLE patches case (c) does show improvements in this non-overcommit large-guest case when compared to case (b). However, at 80-way we started to observe quite a bit of variation from run to run, and the JPM was lower than with the throttled yield_to() change, case (d).

For this 80-way run, in case (c) we also noticed that the average time spent in the PLE exit (as reported by small samplings from perf kvm stat) varied quite a bit and was at times much greater than in the throttled yield_to() case (d). More details are

Re: [PATCH V3 RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task

2012-11-27 Thread Raghavendra K T

On 11/26/2012 07:05 PM, Andrew Jones wrote:

On Mon, Nov 26, 2012 at 05:37:54PM +0530, Raghavendra K T wrote:

[patch quoted in full; snipped - see the original posting of 2012-11-26]



Acked-by: Andrew Jones drjo...@redhat.com



Thank you Drew.

Marcelo, Gleb: please let me know if you have comments / concerns on the patches.

Andrew, Vinod: IMO the patch set looks good for undercommit scenarios, especially for large guests, where we do have the overhead of vcpu iteration in the PLE handler.



Re: [PATCH V3 RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task

2012-11-27 Thread Andrew Theurer
On Tue, 2012-11-27 at 16:00 +0530, Raghavendra K T wrote:
 On 11/26/2012 07:05 PM, Andrew Jones wrote:
  On Mon, Nov 26, 2012 at 05:37:54PM +0530, Raghavendra K T wrote:
[patch quoted in full; snipped - see the original posting of 2012-11-26]
 
 
  Acked-by: Andrew Jones drjo...@redhat.com
 
 
 Thank you Drew.
 
Marcelo, Gleb: please let me know if you have comments / concerns on the patches.

Andrew, Vinod: IMO the patch set looks good for undercommit scenarios, especially for large guests, where we do have the overhead of vcpu iteration in the PLE handler.

I agree, it looks fine for undercommit scenarios. I do wonder what happens with 1.5x overcommit, where we might see half the host cpus with a runqueue length of 2 and half with a runqueue length of 1. Even with this change that scenario still might be fine, but it would be nice to see a comparison.

-Andrew




Re: [PATCH V3 RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task

2012-11-27 Thread Chegu Vinod

On 11/27/2012 2:30 AM, Raghavendra K T wrote:

On 11/26/2012 07:05 PM, Andrew Jones wrote:

On Mon, Nov 26, 2012 at 05:37:54PM +0530, Raghavendra K T wrote:

[patch quoted in full; snipped - see the original posting of 2012-11-26]



Acked-by: Andrew Jones drjo...@redhat.com



Thank you Drew.

Marcelo, Gleb: please let me know if you have comments / concerns on the patches.

Andrew, Vinod: IMO the patch set looks good for undercommit scenarios, especially for large guests, where we do have the overhead of vcpu iteration in the PLE handler.

Thanks Raghu. Will try to get this latest patch set evaluated and get 
back to you.


Vinod



Re: [PATCH V3 RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task

2012-11-27 Thread Raghavendra K T

On 11/27/2012 07:34 PM, Andrew Theurer wrote:

On Tue, 2012-11-27 at 16:00 +0530, Raghavendra K T wrote:

On 11/26/2012 07:05 PM, Andrew Jones wrote:

On Mon, Nov 26, 2012 at 05:37:54PM +0530, Raghavendra K T wrote:

[patch quoted in full; snipped - see the original posting of 2012-11-26]



Acked-by: Andrew Jones drjo...@redhat.com



Thank you Drew.

Marcelo, Gleb: please let me know if you have comments / concerns on the patches.

Andrew, Vinod: IMO the patch set looks good for undercommit scenarios, especially for large guests, where we do have the overhead of vcpu iteration in the PLE handler.


I agree, it looks fine for undercommit scenarios. I do wonder what happens with 1.5x overcommit, where we might see half the host cpus with a runqueue length of 2 and half with a runqueue length of 1. Even with this change that scenario still might be fine, but it would be nice to see a comparison.



Hi Andrew, yes, thanks for pointing out the 1.5x case, which should be the theoretical worst case. I tried with two 24-vcpu guests on the same 32-core machine. Here is the result:

Ebizzy (rec/sec, higher is better)
x base
+ patched
    N        Avg      Stddev
x  10     2688.6   347.55917
+  10     2707.6   260.93728

Improvement: 0.706%

dbench (throughput in MB/sec, higher is better)
x base
+ patched
    N        Avg      Stddev
x  10   3164.712   140.24468
+  10   3244.021   185.92434

Improvement: 2.5%

So there is no significant improvement / degradation seen in the 1.5x case.



[PATCH V3 RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task

2012-11-26 Thread Raghavendra K T
From: Peter Zijlstra pet...@infradead.org

In undercommitted scenarios, especially in large guests, yield_to overhead is significantly high. When the run queue length of both source and target is one, take the opportunity to bail out and return -ESRCH. This return condition can be further exploited to quickly come out of the PLE handler.

(History: Raghavendra initially worked on breaking out of the kvm PLE handler upon seeing source runqueue length = 1, but that required exporting the rq length. Peter came up with the elegant idea of returning -ESRCH in the scheduler core.)

Signed-off-by: Peter Zijlstra pet...@infradead.org
[Raghavendra: added the check on the target vcpu's rq length (thanks Avi)]
Reviewed-by: Srikar Dronamraju sri...@linux.vnet.ibm.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---

 kernel/sched/core.c |   25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d8927f..fc219a5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4289,7 +4289,10 @@ EXPORT_SYMBOL(yield);
  * It's the caller's job to ensure that the target task struct
  * can't go away on us before we can do any checks.
  *
- * Returns true if we indeed boosted the target task.
+ * Returns:
+ *     true (>0) if we indeed boosted the target task.
+ *     false (0) if we failed to boost the target.
+ *     -ESRCH if there's no task to yield to.
  */
 bool __sched yield_to(struct task_struct *p, bool preempt)
 {
@@ -4303,6 +4306,15 @@ bool __sched yield_to(struct task_struct *p, bool preempt)
 
 again:
        p_rq = task_rq(p);
+       /*
+        * If we're the only runnable task on the rq and target rq also
+        * has only one task, there's absolutely no point in yielding.
+        */
+       if (rq->nr_running == 1 && p_rq->nr_running == 1) {
+               yielded = -ESRCH;
+               goto out_irq;
+       }
+
        double_rq_lock(rq, p_rq);
        while (task_rq(p) != p_rq) {
                double_rq_unlock(rq, p_rq);
@@ -4310,13 +4322,13 @@ again:
        }
 
        if (!curr->sched_class->yield_to_task)
-               goto out;
+               goto out_unlock;
 
        if (curr->sched_class != p->sched_class)
-               goto out;
+               goto out_unlock;
 
        if (task_running(p_rq, p) || p->state)
-               goto out;
+               goto out_unlock;
 
        yielded = curr->sched_class->yield_to_task(rq, p, preempt);
        if (yielded) {
@@ -4329,11 +4341,12 @@ again:
                resched_task(p_rq->curr);
        }
 
-out:
+out_unlock:
        double_rq_unlock(rq, p_rq);
+out_irq:
        local_irq_restore(flags);
 
-       if (yielded)
+       if (yielded > 0)
                schedule();
 
        return yielded;
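
For context, patch 2/2 of this series has KVM's PLE handler exploit this three-way return. Below is a minimal, self-contained userspace sketch of that calling pattern; struct vcpu, ple_handler_sketch(), the stub yield_to() and the retry budget are hypothetical stand-ins for illustration, not the actual kvm_vcpu_on_spin() change:

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

struct vcpu { int id; };        /* hypothetical stand-in for a kvm vcpu */

/* Stand-in for the patched yield_to(): >0 boosted, 0 boost failed,
 * -ESRCH when there is no point in yielding (both rqs have one task). */
static int yield_to(struct vcpu *p, bool preempt)
{
        (void)preempt;
        return (p->id == 2) ? 1 : -ESRCH;       /* arbitrary demo behaviour */
}

static void ple_handler_sketch(struct vcpu *candidates, int n)
{
        int tries = 3;  /* assumed retry budget, for illustration only */
        int i;

        for (i = 0; i < n; i++) {
                int ret = yield_to(&candidates[i], true);

                if (ret > 0) {
                        printf("boosted vcpu %d\n", candidates[i].id);
                        return;         /* boosted someone: done */
                }
                if (ret == -ESRCH && --tries == 0)
                        return;         /* undercommitted: bail out quickly */
                /* ret == 0: boost failed, try the next candidate */
        }
}

int main(void)
{
        struct vcpu vcpus[] = { {0}, {1}, {2}, {3} };

        ple_handler_sketch(vcpus, 4);
        return 0;
}

The point of the -ESRCH case is that a failed boost (0) only means "try the next candidate", while -ESRCH means the source and target runqueues each hold a single task, so the handler can stop iterating over candidate vcpus early.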



Re: [PATCH V3 RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task

2012-11-26 Thread Andrew Jones
On Mon, Nov 26, 2012 at 05:37:54PM +0530, Raghavendra K T wrote:
[patch quoted in full; snipped - see the original posting of 2012-11-26]


Acked-by: Andrew Jones drjo...@redhat.com