Re: [PATCH V3 RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task
[I forgot to add Ingo in To: last time]

Ingo, could you please take this into the x86 tree?

This is:
Acked-by: Andrew Jones <drjo...@redhat.com>
Tested-by: Chegu Vinod <chegu_vi...@hp.com>

Marcelo, do you want to add your Acked-by/Reviewed-by?

On 12/14/2012 09:10 PM, Raghavendra K T wrote:

Hi Ingo,

Could you please take this into the x86 tree?

Thanks,

On 12/14/2012 05:59 AM, Marcelo Tosatti wrote:

Raghavendra,

Please get this integrated through the x86 tree (Ingo CC'ed).

On Mon, Nov 26, 2012 at 05:37:54PM +0530, Raghavendra K T wrote:

From: Peter Zijlstra <pet...@infradead.org>

In case of undercommitted scenarios, especially in large guests,
yield_to overhead is significantly high. When the run queue length of
both source and target is one, take the opportunity to bail out and
return -ESRCH. This return condition can be further exploited to
quickly come out of the PLE handler.

(History: Raghavendra initially worked on breaking out of the kvm PLE
handler upon seeing source runqueue length == 1, but that approach had
to export the rq length. Peter came up with the elegant idea of
returning -ESRCH from the scheduler core.)

Signed-off-by: Peter Zijlstra <pet...@infradead.org>
[Raghavendra: checking of the rq length of target vcpu condition added (thanks Avi)]
Reviewed-by: Srikar Dronamraju <sri...@linux.vnet.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
---
 kernel/sched/core.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d8927f..fc219a5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4289,7 +4289,10 @@ EXPORT_SYMBOL(yield);
  * It's the caller's job to ensure that the target task struct
  * can't go away on us before we can do any checks.
  *
- * Returns true if we indeed boosted the target task.
+ * Returns:
+ *	true (>0) if we indeed boosted the target task.
+ *	false (0) if we failed to boost the target.
+ *	-ESRCH if there's no task to yield to.
  */
 bool __sched yield_to(struct task_struct *p, bool preempt)
 {
@@ -4303,6 +4306,15 @@ bool __sched yield_to(struct task_struct *p, bool preempt)
 again:
 	p_rq = task_rq(p);
 
+	/*
+	 * If we're the only runnable task on the rq and target rq also
+	 * has only one task, there's absolutely no point in yielding.
+	 */
+	if (rq->nr_running == 1 && p_rq->nr_running == 1) {
+		yielded = -ESRCH;
+		goto out_irq;
+	}
+
 	double_rq_lock(rq, p_rq);
 	while (task_rq(p) != p_rq) {
 		double_rq_unlock(rq, p_rq);
@@ -4310,13 +4322,13 @@ again:
 	}
 
 	if (!curr->sched_class->yield_to_task)
-		goto out;
+		goto out_unlock;
 
 	if (curr->sched_class != p->sched_class)
-		goto out;
+		goto out_unlock;
 
 	if (task_running(p_rq, p) || p->state)
-		goto out;
+		goto out_unlock;
 
 	yielded = curr->sched_class->yield_to_task(rq, p, preempt);
 	if (yielded) {
@@ -4329,11 +4341,12 @@ again:
 		resched_task(p_rq->curr);
 	}
 
-out:
+out_unlock:
 	double_rq_unlock(rq, p_rq);
+out_irq:
 	local_irq_restore(flags);
 
-	if (yielded)
+	if (yielded > 0)
 		schedule();
 
 	return yielded;

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V3 RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task
Raghavendra,

Please get this integrated through the x86 tree (Ingo CC'ed).

On Mon, Nov 26, 2012 at 05:37:54PM +0530, Raghavendra K T wrote:

From: Peter Zijlstra <pet...@infradead.org>

[patch snipped]
Re: [PATCH V3 RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task
On Wed, 2012-11-28 at 14:26 -0800, Chegu Vinod wrote:

On 11/27/2012 6:23 AM, Chegu Vinod wrote:

On 11/27/2012 2:30 AM, Raghavendra K T wrote:

On 11/26/2012 07:05 PM, Andrew Jones wrote:

On Mon, Nov 26, 2012 at 05:37:54PM +0530, Raghavendra K T wrote:

From: Peter Zijlstra <pet...@infradead.org>

[patch snipped]

Acked-by: Andrew Jones <drjo...@redhat.com>

Thank you Drew.

Marcelo, Gleb.. Please let me know if you have comments / concerns on the patches..

Andrew, Vinod, IMO the patch set looks good for undercommit scenarios, especially for large guests where we do have the overhead of vcpu iteration in the PLE handler..

Thanks Raghu. Will try to get this latest patch set evaluated and get back to you.

Hi Raghu,

Here is some preliminary data with your latest set of PLE patches (also with Andrew's throttled yield_to() change).

Ran a single guest on an 80-core Westmere platform. [Note: host and guest had the latest kernel from kvm.git, and also the latest qemu from qemu.git, as of yesterday morning.]

The guest was running an AIM7 high_systime workload. (Note: high_systime is a kernel-intensive micro-benchmark, but in this case it was run just as a workload in the guest to trigger spinlock etc. contention in the guest OS and hence PLE; i.e. this is not a real benchmark run.) I have run this workload with a constant # (i.e. 2000) of users with 100 jobs per user. The numbers below represent the # of jobs per minute (JPM); higher is better.

                                 40VCPU    60VCPU    80VCPU
a) 3.7.0-rc6+ w/ ple_gap=0       ~102K     ~88K      ~81K
b) 3.7.0-rc6+                    ~53K      ~25K      ~18-20K
c) 3.7.0-rc6+ w/ PLE patches     ~100K     ~81K      ~48K-69K  <- lot of variation from run to run
Re: [PATCH V3 RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task
On 11/28/2012 5:09 PM, Chegu Vinod wrote:

On 11/27/2012 6:23 AM, Chegu Vinod wrote:

On 11/27/2012 2:30 AM, Raghavendra K T wrote:

On 11/26/2012 07:05 PM, Andrew Jones wrote:

On Mon, Nov 26, 2012 at 05:37:54PM +0530, Raghavendra K T wrote:

From: Peter Zijlstra <pet...@infradead.org>

[patch snipped]

Acked-by: Andrew Jones <drjo...@redhat.com>

Thank you Drew.

Marcelo, Gleb.. Please let me know if you have comments / concerns on the patches..

Andrew, Vinod, IMO the patch set looks good for undercommit scenarios, especially for large guests where we do have the overhead of vcpu iteration in the PLE handler..

Thanks Raghu. Will try to get this latest patch set evaluated and get back to you.

Vinod

Resending as prev. email to the kvm and lkml email aliases bounced twice... Apologies for any repeats!

Hi Raghu,

Here is some preliminary data with your latest set of PLE patches (also with Andrew's throttled yield_to() change).

Ran a single guest on an 80-core Westmere platform. [Note: host and guest had the latest kernel from kvm.git, and also the latest qemu from qemu.git, as of yesterday morning.]

The guest was running an AIM7 high_systime workload. (Note: high_systime is a kernel-intensive micro-benchmark, but in this case it was run just as a workload in the guest to trigger spinlock etc. contention in the guest OS and hence PLE; i.e. this is not a real benchmark run.) I have run this workload with a constant # (i.e. 2000) of users with 100 jobs per user. The numbers below represent the # of jobs per minute (JPM); higher is better.

                                               40VCPU    60VCPU    80VCPU
a) 3.7.0-rc6+ w/ ple_gap=0                     ~102K     ~88K      ~81K
b) 3.7.0-rc6+                                  ~53K      ~25K      ~18-20K
c) 3.7.0-rc6+ w/ PLE patches                   ~100K     ~81K      ~48K-69K  <- lot of variation from run to run
d) 3.7.0-rc6+ w/ throttled yield_to() change   ~101K     ~87K      ~78K

The PLE patches case (c) does show improvements in this non-overcommit large-guest case when compared to case (b). However, at 80-way I started to observe quite a bit of variation from run to run, and the JPM was lower when compared with the throttled yield_to() change case (d).

For this 80-way, in case (c) I also noticed that the average time spent in the PLE exit (as reported by small samplings from perf kvm stat) was varying quite a bit, and was at times much greater when compared with the throttled yield_to() change case (d). More details are
Re: [PATCH V3 RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task
On 11/26/2012 07:05 PM, Andrew Jones wrote:

On Mon, Nov 26, 2012 at 05:37:54PM +0530, Raghavendra K T wrote:

From: Peter Zijlstra <pet...@infradead.org>

[patch snipped]

Acked-by: Andrew Jones <drjo...@redhat.com>

Thank you Drew.

Marcelo, Gleb.. Please let me know if you have comments / concerns on the patches..

Andrew, Vinod, IMO the patch set looks good for undercommit scenarios, especially for large guests where we do have the overhead of vcpu iteration in the PLE handler..
Re: [PATCH V3 RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task
On Tue, 2012-11-27 at 16:00 +0530, Raghavendra K T wrote:

On 11/26/2012 07:05 PM, Andrew Jones wrote:

On Mon, Nov 26, 2012 at 05:37:54PM +0530, Raghavendra K T wrote:

From: Peter Zijlstra <pet...@infradead.org>

[patch snipped]

Acked-by: Andrew Jones <drjo...@redhat.com>

Thank you Drew.

Marcelo, Gleb.. Please let me know if you have comments / concerns on the patches..

Andrew, Vinod, IMO the patch set looks good for undercommit scenarios, especially for large guests where we do have the overhead of vcpu iteration in the PLE handler..

I agree, looks fine for undercommit scenarios. I do wonder what happens with 1.5x overcommit, where we might see half of the host cpus with a runqueue of 2 and half of the host cpus with a runqueue of 1. Even with this change that scenario still might be fine, but it would be nice to see a comparison.

-Andrew
Re: [PATCH V3 RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task
On 11/27/2012 2:30 AM, Raghavendra K T wrote:

On 11/26/2012 07:05 PM, Andrew Jones wrote:

On Mon, Nov 26, 2012 at 05:37:54PM +0530, Raghavendra K T wrote:

From: Peter Zijlstra <pet...@infradead.org>

[patch snipped]

Acked-by: Andrew Jones <drjo...@redhat.com>

Thank you Drew.

Marcelo, Gleb.. Please let me know if you have comments / concerns on the patches..

Andrew, Vinod, IMO the patch set looks good for undercommit scenarios, especially for large guests where we do have the overhead of vcpu iteration in the PLE handler..

Thanks Raghu. Will try to get this latest patch set evaluated and get back to you.

Vinod
Re: [PATCH V3 RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task
On 11/27/2012 07:34 PM, Andrew Theurer wrote:

On Tue, 2012-11-27 at 16:00 +0530, Raghavendra K T wrote:

[earlier discussion and patch snipped]

Andrew, Vinod, IMO the patch set looks good for undercommit scenarios, especially for large guests where we do have the overhead of vcpu iteration in the PLE handler..

I agree, looks fine for undercommit scenarios. I do wonder what happens with 1.5x overcommit, where we might see half of the host cpus with a runqueue of 2 and half of the host cpus with a runqueue of 1. Even with this change that scenario still might be fine, but it would be nice to see a comparison.

Hi Andrew, yes, thanks for pointing out the 1.5x case, which should have the theoretical worst case. I tried with two 24-vcpu guests on the same 32 core machine. Here is the result:

Ebizzy (rec/sec, higher is better)
x = base, + = patched
     N         Avg       Stddev
x   10      2688.6    347.55917
+   10      2707.6    260.93728
Improvement: 0.706%

dbench (Throughput MB/sec, higher is better)
x = base, + = patched
     N         Avg       Stddev
x   10    3164.712    140.24468
+   10    3244.021    185.92434
Improvement: 2.5%

So there is no significant improvement / degradation seen in 1.5x.
[PATCH V3 RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task
From: Peter Zijlstra <pet...@infradead.org>

In case of undercommitted scenarios, especially in large guests,
yield_to overhead is significantly high. When the run queue length of
both source and target is one, take the opportunity to bail out and
return -ESRCH. This return condition can be further exploited to
quickly come out of the PLE handler.

(History: Raghavendra initially worked on breaking out of the kvm PLE
handler upon seeing source runqueue length == 1, but that approach had
to export the rq length. Peter came up with the elegant idea of
returning -ESRCH from the scheduler core.)

Signed-off-by: Peter Zijlstra <pet...@infradead.org>
[Raghavendra: checking of the rq length of target vcpu condition added (thanks Avi)]
Reviewed-by: Srikar Dronamraju <sri...@linux.vnet.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
---
 kernel/sched/core.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d8927f..fc219a5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4289,7 +4289,10 @@ EXPORT_SYMBOL(yield);
  * It's the caller's job to ensure that the target task struct
  * can't go away on us before we can do any checks.
  *
- * Returns true if we indeed boosted the target task.
+ * Returns:
+ *	true (>0) if we indeed boosted the target task.
+ *	false (0) if we failed to boost the target.
+ *	-ESRCH if there's no task to yield to.
  */
 bool __sched yield_to(struct task_struct *p, bool preempt)
 {
@@ -4303,6 +4306,15 @@ bool __sched yield_to(struct task_struct *p, bool preempt)
 again:
 	p_rq = task_rq(p);
 
+	/*
+	 * If we're the only runnable task on the rq and target rq also
+	 * has only one task, there's absolutely no point in yielding.
+	 */
+	if (rq->nr_running == 1 && p_rq->nr_running == 1) {
+		yielded = -ESRCH;
+		goto out_irq;
+	}
+
 	double_rq_lock(rq, p_rq);
 	while (task_rq(p) != p_rq) {
 		double_rq_unlock(rq, p_rq);
@@ -4310,13 +4322,13 @@ again:
 	}
 
 	if (!curr->sched_class->yield_to_task)
-		goto out;
+		goto out_unlock;
 
 	if (curr->sched_class != p->sched_class)
-		goto out;
+		goto out_unlock;
 
 	if (task_running(p_rq, p) || p->state)
-		goto out;
+		goto out_unlock;
 
 	yielded = curr->sched_class->yield_to_task(rq, p, preempt);
 	if (yielded) {
@@ -4329,11 +4341,12 @@ again:
 		resched_task(p_rq->curr);
 	}
 
-out:
+out_unlock:
 	double_rq_unlock(rq, p_rq);
+out_irq:
 	local_irq_restore(flags);
 
-	if (yielded)
+	if (yielded > 0)
 		schedule();
 
 	return yielded;
Re: [PATCH V3 RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task
On Mon, Nov 26, 2012 at 05:37:54PM +0530, Raghavendra K T wrote:

From: Peter Zijlstra <pet...@infradead.org>

[patch snipped]

Acked-by: Andrew Jones <drjo...@redhat.com>