Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Sat, Feb 15, 2014 at 7:24 AM, Kevin Hilman wrote:
> From 902a2b58d61a51415457ea6768d687cdb7532eff Mon Sep 17 00:00:00 2001
> From: Kevin Hilman
> Date: Fri, 14 Feb 2014 15:10:58 -0800
> Subject: [PATCH] workqueue: for NO_HZ_FULL, set default cpumask to
>  !tick_nohz_full_mask
>
> To help in keeping NO_HZ_FULL CPUs isolated, keep unbound workqueues
> from running on full dynticks CPUs.  To do this, set the default
> workqueue cpumask to be the set of "housekeeping" CPUs instead of all
> possible CPUs.
>
> This is just the starting/default cpumask, and can be overridden in
> all the normal ways (NUMA settings, apply_workqueue_attrs and via
> sysfs for workqueues with the WQ_SYSFS attribute.)
>
> Cc: Tejun Heo
> Cc: Paul McKenney
> Cc: Frederic Weisbecker
> Signed-off-by: Kevin Hilman
> ---
>  kernel/workqueue.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 987293d03ebc..9a9d9b0eaf6d 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -48,6 +48,7 @@
>  #include <linux/nodemask.h>
>  #include <linux/moduleparam.h>
>  #include <linux/uaccess.h>
> +#include <linux/tick.h>
>
>  #include "workqueue_internal.h"
>
> @@ -3436,7 +3437,11 @@ struct workqueue_attrs *alloc_workqueue_attrs(gfp_t gfp_mask)
>  	if (!alloc_cpumask_var(&attrs->cpumask, gfp_mask))
>  		goto fail;
>
> +#ifdef CONFIG_NO_HZ_FULL
> +	cpumask_complement(attrs->cpumask, tick_nohz_full_mask);
> +#else
>  	cpumask_copy(attrs->cpumask, cpu_possible_mask);
> +#endif
>  	return attrs;
>  fail:
>  	free_workqueue_attrs(attrs);

Can we play with this mask at runtime? I thought a better idea would be
to keep this mask as the mask of all CPUs initially, and once any CPU
enters NO_HZ_FULL mode we can remove it from the mask. And once it
leaves that mode we can add it back again.

I am looking to use a similar concept for un-pinned timers with my work
on the cpuset.quiesce option.
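The runtime idea suggested above can be modeled in a few lines of shell, purely as an illustration of the mask arithmetic (the function names are made up for this sketch, not kernel interfaces): start with every CPU in the unbound-workqueue mask, clear a CPU's bit when it enters full-dynticks mode, and set it again when it leaves.

```shell
# Toy model of the proposal: all CPUs are "housekeeping" at boot;
# entering nohz_full clears a CPU's bit, leaving sets it again.
# (enter_nohz_full/leave_nohz_full are illustrative names only.)
nr_cpus=4
mask=$(( (1 << nr_cpus) - 1 ))          # 0xf: CPUs 0-3 all available

enter_nohz_full() { mask=$(( mask & ~(1 << $1) )); }
leave_nohz_full() { mask=$(( mask | (1 << $1) )); }

enter_nohz_full 2
printf 'after CPU2 enters nohz_full: 0x%x\n' "$mask"   # 0xb
leave_nohz_full 2
printf 'after CPU2 leaves nohz_full: 0x%x\n' "$mask"   # 0xf
```

Note that entering twice is idempotent, which matters if hotplug and nohz transitions can race.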
-- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Thu, 2014-02-27 at 15:43 +0100, Frederic Weisbecker wrote:
> On Mon, Feb 17, 2014 at 06:26:41AM +0100, Mike Galbraith wrote:
> > On Wed, 2014-02-12 at 19:23 +0100, Frederic Weisbecker wrote:
> > > On Mon, Feb 10, 2014 at 10:47:29AM -0800, Paul E. McKenney wrote:
> > > > On Mon, Feb 10, 2014 at 06:08:31PM +0800, Lai Jiangshan wrote:
> > > > > Acked-by: Lai Jiangshan
> > > >
> > > > Thank you all, queued for 3.15.
> > > >
> > > > We should also have some facility for moving the SRCU workqueues to
> > > > housekeeping/timekeeping kthreads in the NO_HZ_FULL case.  Or does
> > > > this patch already have that effect?
> > >
> > > Kevin Hilman and I plan to try to bring in a new Kconfig option that
> > > could let us control the unbound workqueues affinity through sysfs.
> >
> > Handing control to the user seemed like a fine thing, so I started
> > making a boot option to enable it.  Forcing WQ_SYSFS on at the sysfs
> > decision spot doesn't go well, init order matters :) Post-init frobbing
> > is required if you want to see/frob all unbound workqueues.
>
> I'm curious about the details. Is that because some workqueues are
> registered before sysfs is even initialized?

Yeah. I put in a test, and told it if not ready, go get ready and flag
yourself as having BTDT (one registration being plenty), but that only
made a different explosion. A post-init rescan is definitely a better
plan, it can't be worse :)

-Mike
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Fri, Feb 14, 2014 at 03:24:35PM -0800, Kevin Hilman wrote:
> Tejun Heo writes:
>
> > Hello,
> >
> > On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote:
> >> +2.	Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files
> >> +	to force the WQ_SYSFS workqueues to run on the specified set
> >> +	of CPUs.  The set of WQ_SYSFS workqueues can be displayed using
> >> +	"ls sys/devices/virtual/workqueue".
> >
> > One thing to be careful about is that once published, it becomes part
> > of userland visible interface.  Maybe adding some words warning
> > against sprinkling WQ_SYSFS willy-nilly is a good idea?
>
> In the NO_HZ_FULL case, it seems to me we'd always want all unbound
> workqueues to have their affinity set to the housekeeping CPUs.
>
> Is there any reason not to enable WQ_SYSFS whenever WQ_UNBOUND is set so
> the affinity can be controlled?  I guess the main reason would be that
> all of these workqueue names would become permanent ABI.

Right. It's a legitimate worry, but couldn't we consider workqueue names
just like kernel thread names? Ie: something that can be renamed or can
disappear anytime from one kernel version to another? Or does sysfs have
really strict rules about that and I'm just daydreaming?

I've been thinking we could also have a pseudo-workqueue directory in
/sys/devices/virtual/workqueue/unbounds with only cpumask as a file.
Writing to it would set the affinity of all unbound workqueues, at least
those that don't have WQ_SYSFS. This would solve the ABI issue and keep
a single consistent interface for workqueue affinity.

> At least for NO_HZ_FULL, maybe this should be automatic.  The cpumask of
> unbound workqueues should default to !tick_nohz_full_mask?  Any WQ_SYSFS
> workqueues could still be overridden from userspace, but at least the
> default would be sane, and help keep full dynticks CPUs isolated.
>
> Example patch below, only boot tested on 4-CPU ARM system with
> CONFIG_NO_HZ_FULL_ALL=y and verified that 'cat
> /sys/devices/virtual/workqueue/writeback/cpumask' looked sane.  If this
> looks OK, I can maybe clean it up a bit and make it a runtime check
> instead of a compile time check.

It can work too, yeah. Maybe I prefer the idea of keeping a sysfs
interface for all workqueues (whether we use a pseudo "unbounds" dir or
not), because then the workqueue core stays unaware of dynticks details
and doesn't end up fiddling with timer core internals like the full
dynticks cpumask, which would mean more interdependency and possible
init-ordering headaches between the timer and workqueue code.

And moreover, people may forget to change the WQ_SYSFS workqueues if all
other UNBOUND workqueues are known to be automatically handled. Or do we
handle WQ_SYSFS as well along the way? But still, a WQ_SYSFS cpumask may
be modified by other user programs, so it's still a round of settings
that must be done before doing any isolation work.

So I have mixed feelings between code complexity, simplicity for users,
etc... What do you guys think?
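As a concrete illustration of the sysfs interface being discussed, the paths below come from the documentation quoted above; whether a particular workqueue (such as writeback, the one Kevin checked) appears there depends on it having WQ_SYSFS set. This is a usage sketch, not something to run blindly:

```shell
# List the workqueues that currently expose a sysfs interface (WQ_SYSFS):
ls /sys/devices/virtual/workqueue/

# Read the current cpumask of the writeback workqueue:
cat /sys/devices/virtual/workqueue/writeback/cpumask

# Restrict it to the housekeeping CPU(s), e.g. CPU0 on a 4-CPU
# nohz_full box (mask is a hex cpumask; 1 == CPU0 only):
echo 1 > /sys/devices/virtual/workqueue/writeback/cpumask
```

The proposed /sys/devices/virtual/workqueue/unbounds/cpumask pseudo-directory would make the last step a single write covering all unbound workqueues.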
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Mon, Feb 17, 2014 at 06:26:41AM +0100, Mike Galbraith wrote:
> On Wed, 2014-02-12 at 19:23 +0100, Frederic Weisbecker wrote:
> > On Mon, Feb 10, 2014 at 10:47:29AM -0800, Paul E. McKenney wrote:
> > > On Mon, Feb 10, 2014 at 06:08:31PM +0800, Lai Jiangshan wrote:
> > > > Acked-by: Lai Jiangshan
> > >
> > > Thank you all, queued for 3.15.
> > >
> > > We should also have some facility for moving the SRCU workqueues to
> > > housekeeping/timekeeping kthreads in the NO_HZ_FULL case.  Or does
> > > this patch already have that effect?
> >
> > Kevin Hilman and I plan to try to bring in a new Kconfig option that
> > could let us control the unbound workqueues affinity through sysfs.
>
> Handing control to the user seemed like a fine thing, so I started
> making a boot option to enable it.  Forcing WQ_SYSFS on at the sysfs
> decision spot doesn't go well, init order matters :) Post-init frobbing
> is required if you want to see/frob all unbound workqueues.

I'm curious about the details. Is that because some workqueues are
registered before sysfs is even initialized?
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Mon, 2014-02-24 at 09:55 -0800, Kevin Hilman wrote:
> Yeah, my patch only addresses the nohz_full case, but since there
> doesn't seem to be any general agreement about the generic case, it
> seems that exposing all unbound workqueues via WQ_SYSFS is the way to
> go.
>
> Mike, looks like you may have started on that.  Did it get any further?

No, work gets in the way of tinker time. Not sure which way I want to
explore anyway. On the one hand, an auto-quiesce thingy tied to domain
construction/destruction sounds good, but on the other, just handing
control to the user is highly attractive.. likely lots simpler too.

-Mike
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
Mike Galbraith writes:

> On Sun, 2014-02-16 at 08:41 -0800, Paul E. McKenney wrote:
>
>> So if there is NO_HZ_FULL, you have no objection to binding workqueues to
>> the timekeeping CPUs, but that you would also like some form of automatic
>> binding in the !NO_HZ_FULL case.  Of course, if a common mechanism could
>> serve both cases, that would be good.  And yes, cpusets are frowned upon
>> for some workloads.
>
> I'm not _objecting_, I'm not driving, Frederic's doing that ;-)
>
> That said, isolation seems to be turning into a property of nohz mode,
> but as I see it, nohz_full is an extension to generic isolation.
>
>> So maybe start with Kevin's patch, but augment with something else for
>> the !NO_HZ_FULL case?
>
> Sure (hm, does it work without workqueue.disable_numa ?).

[ /me returns from vacation ]

Yes, since it happens for every alloc_workqueue_attrs().

> It just seems to me that tying it to sched domain construction would be
> a better fit.  That way, it doesn't matter what your isolation-requiring
> load is: whether you run a gaggle of realtime tasks or one HPC task is
> your business, the generic requirement is isolation, not tick mode.  For
> one HPC task per core you want no tick; if you're running all SCHED_FIFO,
> maybe you want that too, depending on the impact of nohz_full mode.  All
> sensitive loads want the isolation, but they may not like the price.
>
> I personally like the cpuset way.  Being able to partition boxen on the
> fly makes them very flexible.  In a perfect world, you'd be able to
> quiesce and configure offloading and nohz_full on the fly too, and not
> end up with some hodgepodge like: this needs boot option foo, that
> happens invisibly because of config option bar, the other thing you have
> to do manually.. and you get to eat 937 kthreads and tons of overhead on
> all CPUs if you want the ability to _maybe_ run a critical task or two.

Yeah, my patch only addresses the nohz_full case, but since there
doesn't seem to be any general agreement about the generic case, it
seems that exposing all unbound workqueues via WQ_SYSFS is the way to
go.

Mike, looks like you may have started on that.  Did it get any further?

Kevin
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Mon, 2014-02-17 at 05:50 +0100, Mike Galbraith wrote:
> On Sun, 2014-02-16 at 08:41 -0800, Paul E. McKenney wrote:
> > So maybe start with Kevin's patch, but augment with something else for
> > the !NO_HZ_FULL case?
>
> Sure (hm, does it work without workqueue.disable_numa ?).

I took the patch out for a spin on a 40-core box +SMT, with CPUs 4-79
isolated via an exclusive cpuset with load balancing off.  The worker
bees ignored the patch either way.

-Mike

Perturbation measurement hog pinned to CPU4.  With patch:

#            TASK-PID  CPU#  ||  TIMESTAMP  FUNCTION
#               | |      |   ||      |         |
        pert-9949 [004] 113     405.120164: workqueue_queue_work: work struct=880a5c4ecc08 function=flush_to_ldisc workqueue=88046f40ba00 req_cpu=256 cpu=4
        pert-9949 [004] 113     405.120166: workqueue_activate_work: work struct 880a5c4ecc08
        pert-9949 [004] d.L.313 405.120169: sched_wakeup: comm=kworker/4:2 pid=2119 prio=120 success=1 target_cpu=004
        pert-9949 [004] d.Lh213 405.120170: tick_stop: success=no msg=more than 1 task in runqueue
        pert-9949 [004] d.L.213 405.120172: tick_stop: success=no msg=more than 1 task in runqueue
        pert-9949 [004] d...3.. 405.120173: sched_switch: prev_comm=pert prev_pid=9949 prev_prio=120 prev_state=R+ ==> next_comm=kworker/4:2 next_pid=2119 next_prio=120
 kworker/4:2-2119 [004] 1..     405.120174: workqueue_execute_start: work struct 880a5c4ecc08: function flush_to_ldisc
 kworker/4:2-2119 [004] d...311 405.120176: sched_wakeup: comm=sshd pid=6620 prio=120 success=1 target_cpu=000
 kworker/4:2-2119 [004] 1..     405.120176: workqueue_execute_end: work struct 880a5c4ecc08
 kworker/4:2-2119 [004] d...3.. 405.120177: sched_switch: prev_comm=kworker/4:2 prev_pid=2119 prev_prio=120 prev_state=S ==> next_comm=pert next_pid=9949 next_prio=120
        pert-9949 [004] 113     405.120178: workqueue_queue_work: work struct=880a5c4ecc08 function=flush_to_ldisc workqueue=88046f40ba00 req_cpu=256 cpu=4
        pert-9949 [004] 113     405.120179: workqueue_activate_work: work struct 880a5c4ecc08
        pert-9949 [004] d.L.313 405.120179: sched_wakeup: comm=kworker/4:2 pid=2119 prio=120 success=1 target_cpu=004
        pert-9949 [004] d.L.213 405.120181: tick_stop: success=no msg=more than 1 task in runqueue
        pert-9949 [004] d...3.. 405.120181: sched_switch: prev_comm=pert prev_pid=9949 prev_prio=120 prev_state=R+ ==> next_comm=kworker/4:2 next_pid=2119 next_prio=120
 kworker/4:2-2119 [004] 1..     405.120182: workqueue_execute_start: work struct 880a5c4ecc08: function flush_to_ldisc
 kworker/4:2-2119 [004] 1..     405.120183: workqueue_execute_end: work struct 880a5c4ecc08
 kworker/4:2-2119 [004] d...3.. 405.120183: sched_switch: prev_comm=kworker/4:2 prev_pid=2119 prev_prio=120 prev_state=S ==> next_comm=pert next_pid=9949 next_prio=120
        pert-9949 [004] d...1.. 405.120736: tick_stop: success=yes msg=
        pert-9949 [004] 113     410.121082: workqueue_queue_work: work struct=880a5c4ecc08 function=flush_to_ldisc workqueue=88046f40ba00 req_cpu=256 cpu=4
        pert-9949 [004] 113     410.121082: workqueue_activate_work: work struct 880a5c4ecc08
        pert-9949 [004] d.L.313 410.121084: sched_wakeup: comm=kworker/4:2 pid=2119 prio=120 success=1 target_cpu=004
        pert-9949 [004] d.Lh213 410.121085: tick_stop: success=no msg=more than 1 task in runqueue
        pert-9949 [004] d.L.213 410.121087: tick_stop: success=no msg=more than 1 task in runqueue

...and so on until tick time.

(extra cheezy) hack kinda sorta works iff workqueue.disable_numa:

---
 kernel/workqueue.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1328,8 +1328,11 @@ static void __queue_work(int cpu, struct
 	rcu_read_lock();
 retry:
-	if (req_cpu == WORK_CPU_UNBOUND)
+	if (req_cpu == WORK_CPU_UNBOUND) {
 		cpu = raw_smp_processor_id();
+		if (runqueue_is_isolated(cpu))
+			cpu = 0;
+	}
 
 	/* pwq which will be used unless @work is executing elsewhere */
 	if (!(wq->flags & WQ_UNBOUND))

#            TASK-PID  CPU#  ||  TIMESTAMP  FUNCTION
#               | |      |   ||      |         |
       <...>-33824 [004] 113     .889694: workqueue_queue_work: work struct=880a596eb008 function=flush_to_ldisc workqueue=88046f40ba00 req_cpu=256 cpu=0
       <...>-33824 [004] 113     .889695: workqueue_activate_work: work struct 880a596eb008
       <...>-33824 [004] d...313 .889697: sched_wakeup: comm=kworker/0:2 pid=2105 prio=120 success=1 target_cpu=000
       <...>-33824 [004] 113     5560.890594: workqueue_queue_work:
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Wed, 2014-02-12 at 19:23 +0100, Frederic Weisbecker wrote: > On Mon, Feb 10, 2014 at 10:47:29AM -0800, Paul E. McKenney wrote: > > On Mon, Feb 10, 2014 at 06:08:31PM +0800, Lai Jiangshan wrote: > > > Acked-by: Lai Jiangshan > > > > Thank you all, queued for 3.15. > > > > We should also have some facility for moving the SRCU workqueues to > > housekeeping/timekeeping kthreads in the NO_HZ_FULL case. Or does > > this patch already have that effect? > > Kevin Hilman and me plan to try to bring a new Kconfig option that could let > us control the unbound workqueues affinity through sysfs. Handing control to the user seemed like a fine thing, so I started making a boot option to enable it. Forcing WQ_SYSFS on at sysfs decision spot doesn't go well, init order matters :) Post init frobbing required if you want to see/frob all unbound. -Mike
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Sun, 2014-02-16 at 08:41 -0800, Paul E. McKenney wrote: > So if there is NO_HZ_FULL, you have no objection to binding workqueues to > the timekeeping CPUs, but that you would also like some form of automatic > binding in the !NO_HZ_FULL case. Of course, if a common mechanism could > serve both cases, that would be good. And yes, cpusets are frowned upon > for some workloads. I'm not _objecting_, I'm not driving, Frederic's doing that ;-) That said, isolation seems to be turning into a property of nohz mode, but as I see it, nohz_full is an extension to generic isolation. > So maybe start with Kevin's patch, but augment with something else for > the !NO_HZ_FULL case? Sure (hm, does it work without workqueue.disable_numa ?). It just seems to me that tying it to sched domain construction would be a better fit. That way, it doesn't matter what your isolation requiring load is, whether you run a gaggle of realtime tasks or one HPC task is your business, the generic requirement is isolation, not tick mode. For one HPC task per core, you want no tick, if you're running all SCHED_FIFO, maybe you want that too, depends on the impact of nohz_full mode. All sensitive loads want the isolation, but they may not like the price. I personally like the cpuset way. Being able to partition boxen on the fly makes them very flexible. In a perfect world, you'd be able to quiesce and configure offloading and nohz_full on the fly too, and not end up with some hodgepodge like this needs boot option foo, that happens invisibly because of config option bar, the other thing you have to do manually.. and you get to eat 937 kthreads and tons of overhead on all CPUs if you want the ability to _maybe_ run a critical task or two. -Mike
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Sat, Feb 15, 2014 at 08:36:44AM +0100, Mike Galbraith wrote: > On Fri, 2014-02-14 at 15:24 -0800, Kevin Hilman wrote: > > Tejun Heo writes: > > > > > Hello, > > > > > > On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote: > > >> +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files > > >> +to force the WQ_SYSFS workqueues to run on the specified set > > >> +of CPUs. The set of WQ_SYSFS workqueues can be displayed using > > >> +"ls sys/devices/virtual/workqueue". > > > > > > One thing to be careful about is that once published, it becomes part > > > of userland visible interface. Maybe adding some words warning > > > against sprinkling WQ_SYSFS willy-nilly is a good idea? > > > > In the NO_HZ_FULL case, it seems to me we'd always want all unbound > > workqueues to have their affinity set to the housekeeping CPUs. > > > > Is there any reason not to enable WQ_SYSFS whenever WQ_UNBOUND is set so > > the affinity can be controlled? I guess the main reason would be that > > all of these workqueue names would become permanent ABI. > > > > At least for NO_HZ_FULL, maybe this should be automatic. The cpumask of > > unbound workqueues should default to !tick_nohz_full_mask? Any WQ_SYSFS > > workqueues could still be overridden from userspace, but at least the > > default would be sane, and help keep full dyntics CPUs isolated. > > What I'm thinking is that it should be automatic, but not necessarily > based upon the nohz full mask, rather maybe based upon whether sched > domains exist, or perhaps a generic exclusive cpuset property, though > some really don't want anything to do with cpusets. > > Why? Because there are jitter intolerant loads where nohz full isn't all > that useful, because you'll be constantly restarting and stopping the > tick, and eating the increased accounting overhead to no gain because > there are frequently multiple realtime tasks running. 
For these loads > (I have a user with such a fairly hefty 80 core rt load), dynamically > turning the tick _on_ is currently a better choice than nohz_full. > Point being, control of where unbound workqueues are allowed to run > isn't only desirable for single task HPC loads, other loads exist. > > For my particular fairly critical 80 core load, workqueues aren't a real > big hairy deal, because its jitter tolerance isn't _all_ that tight (30 > us max is easy enough to meet with room to spare). The load can slice > through workers well enough to meet requirements, but it would certainly > be a win to be able to keep them at bay. (gonna measure it, less jitter > is better even if it's only a little bit better.. eventually somebody > will demand what's currently impossible to deliver) So if there is NO_HZ_FULL, you have no objection to binding workqueues to the timekeeping CPUs, but that you would also like some form of automatic binding in the !NO_HZ_FULL case. Of course, if a common mechanism could serve both cases, that would be good. And yes, cpusets are frowned upon for some workloads. So maybe start with Kevin's patch, but augment with something else for the !NO_HZ_FULL case? Thanx, Paul
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Fri, 2014-02-14 at 15:24 -0800, Kevin Hilman wrote: > Tejun Heo writes: > > > Hello, > > > > On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote: > >> +2.Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files > >> + to force the WQ_SYSFS workqueues to run on the specified set > >> + of CPUs. The set of WQ_SYSFS workqueues can be displayed using > >> + "ls sys/devices/virtual/workqueue". > > > > One thing to be careful about is that once published, it becomes part > > of userland visible interface. Maybe adding some words warning > > against sprinkling WQ_SYSFS willy-nilly is a good idea? > > In the NO_HZ_FULL case, it seems to me we'd always want all unbound > workqueues to have their affinity set to the housekeeping CPUs. > > Is there any reason not to enable WQ_SYSFS whenever WQ_UNBOUND is set so > the affinity can be controlled? I guess the main reason would be that > all of these workqueue names would become permanent ABI. > > At least for NO_HZ_FULL, maybe this should be automatic. The cpumask of > unbound workqueues should default to !tick_nohz_full_mask? Any WQ_SYSFS > workqueues could still be overridden from userspace, but at least the > default would be sane, and help keep full dyntics CPUs isolated. What I'm thinking is that it should be automatic, but not necessarily based upon the nohz full mask, rather maybe based upon whether sched domains exist, or perhaps a generic exclusive cpuset property, though some really don't want anything to do with cpusets. Why? Because there are jitter intolerant loads where nohz full isn't all that useful, because you'll be constantly restarting and stopping the tick, and eating the increased accounting overhead to no gain because there are frequently multiple realtime tasks running. For these loads (I have a user with such a fairly hefty 80 core rt load), dynamically turning the tick _on_ is currently a better choice than nohz_full. 
Point being, control of where unbound workqueues are allowed to run isn't only desirable for single task HPC loads, other loads exist. For my particular fairly critical 80 core load, workqueues aren't a real big hairy deal, because its jitter tolerance isn't _all_ that tight (30 us max is easy enough to meet with room to spare). The load can slice through workers well enough to meet requirements, but it would certainly be a win to be able to keep them at bay. (gonna measure it, less jitter is better even if it's only a little bit better.. eventually somebody will demand what's currently impossible to deliver) -Mike
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
Tejun Heo writes: > Hello, > > On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote: >> +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files >> +to force the WQ_SYSFS workqueues to run on the specified set >> +of CPUs. The set of WQ_SYSFS workqueues can be displayed using >> +"ls sys/devices/virtual/workqueue". > > One thing to be careful about is that once published, it becomes part > of userland visible interface. Maybe adding some words warning > against sprinkling WQ_SYSFS willy-nilly is a good idea? In the NO_HZ_FULL case, it seems to me we'd always want all unbound workqueues to have their affinity set to the housekeeping CPUs. Is there any reason not to enable WQ_SYSFS whenever WQ_UNBOUND is set so the affinity can be controlled? I guess the main reason would be that all of these workqueue names would become permanent ABI. At least for NO_HZ_FULL, maybe this should be automatic. The cpumask of unbound workqueues should default to !tick_nohz_full_mask? Any WQ_SYSFS workqueues could still be overridden from userspace, but at least the default would be sane, and help keep full dyntics CPUs isolated. Example patch below, only boot tested on 4-CPU ARM system with CONFIG_NO_HZ_FULL_ALL=y and verified that 'cat /sys/devices/virtual/workqueue/writeback/cpumask' looked sane. If this looks OK, I can maybe clean it up a bit and make it runtime check instead of a compile time check. Kevin >From 902a2b58d61a51415457ea6768d687cdb7532eff Mon Sep 17 00:00:00 2001 From: Kevin Hilman Date: Fri, 14 Feb 2014 15:10:58 -0800 Subject: [PATCH] workqueue: for NO_HZ_FULL, set default cpumask to !tick_nohz_full_mask To help in keeping NO_HZ_FULL CPUs isolated, keep unbound workqueues from running on full dynticks CPUs. To do this, set the default workqueue cpumask to be the set of "housekeeping" CPUs instead of all possible CPUs. 
This is just the starting/default cpumask, and can be overridden with all the normal ways (NUMA settings, apply_workqueue_attrs and via sysfs for workqueues with the WQ_SYSFS attribute.)

Cc: Tejun Heo
Cc: Paul McKenney
Cc: Frederic Weisbecker
Signed-off-by: Kevin Hilman
---
 kernel/workqueue.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 987293d03ebc..9a9d9b0eaf6d 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -48,6 +48,7 @@
 #include <linux/nodemask.h>
 #include <linux/moduleparam.h>
 #include <linux/uaccess.h>
+#include <linux/tick.h>

 #include "workqueue_internal.h"

@@ -3436,7 +3437,11 @@ struct workqueue_attrs *alloc_workqueue_attrs(gfp_t gfp_mask)
 	if (!alloc_cpumask_var(&attrs->cpumask, gfp_mask))
 		goto fail;

+#ifdef CONFIG_NO_HZ_FULL
+	cpumask_complement(attrs->cpumask, tick_nohz_full_mask);
+#else
 	cpumask_copy(attrs->cpumask, cpu_possible_mask);
+#endif
 	return attrs;
 fail:
 	free_workqueue_attrs(attrs);
--
1.8.3
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Wed, Feb 12, 2014 at 04:33:11PM -0800, Paul E. McKenney wrote: > > Fair point! I wordsmithed it into the following. Seem reasonable? > > Thanx, Paul > > > > Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity > > This commit documents the ability to apply CPU affinity to WQ_SYSFS > workqueues, thus offloading them from the desired worker CPUs. > > Signed-off-by: Paul E. McKenney > Reviewed-by: Tejun Heo > Acked-by: Frederic Weisbecker > > diff --git a/Documentation/kernel-per-CPU-kthreads.txt > b/Documentation/kernel-per-CPU-kthreads.txt > index 827104fb9364..f3cd299fcc41 100644 > --- a/Documentation/kernel-per-CPU-kthreads.txt > +++ b/Documentation/kernel-per-CPU-kthreads.txt > @@ -162,7 +162,18 @@ Purpose: Execute workqueue requests > To reduce its OS jitter, do any of the following: > 1. Run your workload at a real-time priority, which will allow > preempting the kworker daemons. > -2. Do any of the following needed to avoid jitter that your > +2. A given workqueue can be made visible in the sysfs filesystem > + by passing the WQ_SYSFS to that workqueue's alloc_workqueue(). > + Such a workqueue can be confined to a given subset of the > + CPUs using the /sys/devices/virtual/workqueue/*/cpumask sysfs > + files. The set of WQ_SYSFS workqueues can be displayed using > + "ls sys/devices/virtual/workqueue". That said, the workqueues > + maintainer would like to caution people against indiscriminately > + sprinkling WQ_SYSFS across all the workqueues. The reason for > + caution is that it is easy to add WQ_SYSFS, but because sysfs is > + part of the formal user/kernel API, it can be nearly impossible > + to remove it, even if its addition was a mistake. > +3. Do any of the following needed to avoid jitter that your > application cannot tolerate: > a. Build your kernel with CONFIG_SLUB=y rather than > CONFIG_SLAB=y, thus avoiding the slab allocator's periodic > Perfect!! 
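The sysfs interface the documentation patch describes can be exercised from a shell; this is a usage sketch against a live kernel, and the `writeback` workqueue and the mask value `3` are illustrative (which WQ_SYSFS workqueues exist depends on the kernel and config).

```shell
# List workqueues that were created with WQ_SYSFS:
ls /sys/devices/virtual/workqueue

# Inspect the current cpumask of the "writeback" workqueue:
cat /sys/devices/virtual/workqueue/writeback/cpumask

# Confine it to housekeeping CPUs 0-1 (mask 0x3); requires root:
echo 3 > /sys/devices/virtual/workqueue/writeback/cpumask
```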
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On 02/13/2014 08:33 AM, Paul E. McKenney wrote: > On Thu, Feb 13, 2014 at 12:04:57AM +0100, Frederic Weisbecker wrote: >> On Wed, Feb 12, 2014 at 11:59:22AM -0800, Paul E. McKenney wrote: >>> On Wed, Feb 12, 2014 at 02:23:54PM -0500, Tejun Heo wrote: Hello, On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote: > +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files > + to force the WQ_SYSFS workqueues to run on the specified set > + of CPUs. The set of WQ_SYSFS workqueues can be displayed using > + "ls sys/devices/virtual/workqueue". One thing to be careful about is that once published, it becomes part of userland visible interface. Maybe adding some words warning against sprinkling WQ_SYSFS willy-nilly is a good idea? >>> >>> Good point! How about the following? >>> >>> Thanx, Paul >>> >>> >>> >>> Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity >>> >>> This commit documents the ability to apply CPU affinity to WQ_SYSFS >>> workqueues, thus offloading them from the desired worker CPUs. >>> >>> Signed-off-by: Paul E. McKenney >>> Cc: Frederic Weisbecker >>> Cc: Tejun Heo >>> >>> diff --git a/Documentation/kernel-per-CPU-kthreads.txt >>> b/Documentation/kernel-per-CPU-kthreads.txt >>> index 827104fb9364..214da3a47a68 100644 >>> --- a/Documentation/kernel-per-CPU-kthreads.txt >>> +++ b/Documentation/kernel-per-CPU-kthreads.txt >>> @@ -162,7 +162,16 @@ Purpose: Execute workqueue requests >>> To reduce its OS jitter, do any of the following: >>> 1. Run your workload at a real-time priority, which will allow >>> preempting the kworker daemons. >>> -2. Do any of the following needed to avoid jitter that your >>> +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files >>> + to force the WQ_SYSFS workqueues to run on the specified set >>> + of CPUs. The set of WQ_SYSFS workqueues can be displayed using >>> + "ls sys/devices/virtual/workqueue". 
That said, the workqueues >>> + maintainer would like to caution people against indiscriminately >>> + sprinkling WQ_SYSFS across all the workqueues. The reason for >>> + caution is that it is easy to add WQ_SYSFS, but because sysfs >>> + is part of the formal user/kernel API, it can be nearly impossible >>> + to remove it, even if its addition was a mistake. >>> +3. Do any of the following needed to avoid jitter that your >> >> Acked-by: Frederic Weisbecker >> >> I just suggest we append a small explanation about what WQ_SYSFS is about. >> Like: > > Fair point! I wordsmithed it into the following. Seem reasonable? > > Thanx, Paul > > > > Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity > > This commit documents the ability to apply CPU affinity to WQ_SYSFS > workqueues, thus offloading them from the desired worker CPUs. > > Signed-off-by: Paul E. McKenney > Reviewed-by: Tejun Heo > Acked-by: Frederic Weisbecker > > diff --git a/Documentation/kernel-per-CPU-kthreads.txt > b/Documentation/kernel-per-CPU-kthreads.txt > index 827104fb9364..f3cd299fcc41 100644 > --- a/Documentation/kernel-per-CPU-kthreads.txt > +++ b/Documentation/kernel-per-CPU-kthreads.txt > @@ -162,7 +162,18 @@ Purpose: Execute workqueue requests > To reduce its OS jitter, do any of the following: > 1. Run your workload at a real-time priority, which will allow > preempting the kworker daemons. > -2. Do any of the following needed to avoid jitter that your > +2. A given workqueue can be made visible in the sysfs filesystem > + by passing the WQ_SYSFS to that workqueue's alloc_workqueue(). > + Such a workqueue can be confined to a given subset of the > + CPUs using the /sys/devices/virtual/workqueue/*/cpumask sysfs > + files. The set of WQ_SYSFS workqueues can be displayed using > + "ls sys/devices/virtual/workqueue". That said, the workqueues > + maintainer would like to caution people against indiscriminately > + sprinkling WQ_SYSFS across all the workqueues. 
The reason for > + caution is that it is easy to add WQ_SYSFS, but because sysfs is > + part of the formal user/kernel API, it can be nearly impossible > + to remove it, even if its addition was a mistake. > +3. Do any of the following needed to avoid jitter that your > application cannot tolerate: > a. Build your kernel with CONFIG_SLUB=y rather than > CONFIG_SLAB=y, thus avoiding the slab allocator's periodic > > Reviewed-by: Lai Jiangshan -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Thu, Feb 13, 2014 at 12:04:57AM +0100, Frederic Weisbecker wrote: > On Wed, Feb 12, 2014 at 11:59:22AM -0800, Paul E. McKenney wrote: > > On Wed, Feb 12, 2014 at 02:23:54PM -0500, Tejun Heo wrote: > > > Hello, > > > > > > On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote: > > > > +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files > > > > + to force the WQ_SYSFS workqueues to run on the specified set > > > > + of CPUs. The set of WQ_SYSFS workqueues can be displayed using > > > > + "ls sys/devices/virtual/workqueue". > > > > > > One thing to be careful about is that once published, it becomes part > > > of userland visible interface. Maybe adding some words warning > > > against sprinkling WQ_SYSFS willy-nilly is a good idea? > > > > Good point! How about the following? > > > > Thanx, Paul > > > > > > > > Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity > > > > This commit documents the ability to apply CPU affinity to WQ_SYSFS > > workqueues, thus offloading them from the desired worker CPUs. > > > > Signed-off-by: Paul E. McKenney > > Cc: Frederic Weisbecker > > Cc: Tejun Heo > > > > diff --git a/Documentation/kernel-per-CPU-kthreads.txt > > b/Documentation/kernel-per-CPU-kthreads.txt > > index 827104fb9364..214da3a47a68 100644 > > --- a/Documentation/kernel-per-CPU-kthreads.txt > > +++ b/Documentation/kernel-per-CPU-kthreads.txt > > @@ -162,7 +162,16 @@ Purpose: Execute workqueue requests > > To reduce its OS jitter, do any of the following: > > 1. Run your workload at a real-time priority, which will allow > > preempting the kworker daemons. > > -2. Do any of the following needed to avoid jitter that your > > +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files > > + to force the WQ_SYSFS workqueues to run on the specified set > > + of CPUs. The set of WQ_SYSFS workqueues can be displayed using > > + "ls sys/devices/virtual/workqueue". 
That said, the workqueues > > + maintainer would like to caution people against indiscriminately > > + sprinkling WQ_SYSFS across all the workqueues. The reason for > > + caution is that it is easy to add WQ_SYSFS, but because sysfs > > + is part of the formal user/kernel API, it can be nearly impossible > > + to remove it, even if its addition was a mistake. > > +3. Do any of the following needed to avoid jitter that your > > Acked-by: Frederic Weisbecker > > I just suggest we append a small explanation about what WQ_SYSFS is about. > Like: Fair point! I wordsmithed it into the following. Seem reasonable? Thanx, Paul Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity This commit documents the ability to apply CPU affinity to WQ_SYSFS workqueues, thus offloading them from the desired worker CPUs. Signed-off-by: Paul E. McKenney Reviewed-by: Tejun Heo Acked-by: Frederic Weisbecker diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt index 827104fb9364..f3cd299fcc41 100644 --- a/Documentation/kernel-per-CPU-kthreads.txt +++ b/Documentation/kernel-per-CPU-kthreads.txt @@ -162,7 +162,18 @@ Purpose: Execute workqueue requests To reduce its OS jitter, do any of the following: 1. Run your workload at a real-time priority, which will allow preempting the kworker daemons. -2. Do any of the following needed to avoid jitter that your +2. A given workqueue can be made visible in the sysfs filesystem + by passing the WQ_SYSFS to that workqueue's alloc_workqueue(). + Such a workqueue can be confined to a given subset of the + CPUs using the /sys/devices/virtual/workqueue/*/cpumask sysfs + files. The set of WQ_SYSFS workqueues can be displayed using + "ls sys/devices/virtual/workqueue". That said, the workqueues + maintainer would like to caution people against indiscriminately + sprinkling WQ_SYSFS across all the workqueues. 
The reason for + caution is that it is easy to add WQ_SYSFS, but because sysfs is + part of the formal user/kernel API, it can be nearly impossible + to remove it, even if its addition was a mistake. +3. Do any of the following needed to avoid jitter that your application cannot tolerate: a. Build your kernel with CONFIG_SLUB=y rather than CONFIG_SLAB=y, thus avoiding the slab allocator's periodic
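The sysfs workflow that the patch above documents can be sketched as a shell session. This is a hedged illustration, not from the thread itself: the cpumask value is an example for housekeeping CPUs 0-1, writing the file needs root, and the writeback path assumes that workqueue has WQ_SYSFS, as Frederic notes later in the thread.

```shell
# Show which workqueues were created with WQ_SYSFS (writeback and
# raid5 are the only users mentioned in this thread).
[ -d /sys/devices/virtual/workqueue ] && ls /sys/devices/virtual/workqueue

# Build the hex cpumask that confines work to housekeeping CPUs 0-1
# (bit N set => CPU N may run the workqueue's work items).
mask=$(printf '%x' $(( (1 << 0) | (1 << 1) )))
echo "$mask"    # prints: 3

# Writing the mask requires root; uncomment to apply it to the
# writeback workqueue's cpumask attribute.
# echo "$mask" | sudo tee /sys/devices/virtual/workqueue/writeback/cpumask
```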
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Wed, Feb 12, 2014 at 11:59:22AM -0800, Paul E. McKenney wrote: > On Wed, Feb 12, 2014 at 02:23:54PM -0500, Tejun Heo wrote: > > Hello, > > > > On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote: > > > +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files > > > + to force the WQ_SYSFS workqueues to run on the specified set > > > + of CPUs. The set of WQ_SYSFS workqueues can be displayed using > > > + "ls sys/devices/virtual/workqueue". > > > > One thing to be careful about is that once published, it becomes part > > of userland visible interface. Maybe adding some words warning > > against sprinkling WQ_SYSFS willy-nilly is a good idea? > > Good point! How about the following? > > Thanx, Paul > > > > Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity > > This commit documents the ability to apply CPU affinity to WQ_SYSFS > workqueues, thus offloading them from the desired worker CPUs. > > Signed-off-by: Paul E. McKenney > Cc: Frederic Weisbecker > Cc: Tejun Heo > > diff --git a/Documentation/kernel-per-CPU-kthreads.txt > b/Documentation/kernel-per-CPU-kthreads.txt > index 827104fb9364..214da3a47a68 100644 > --- a/Documentation/kernel-per-CPU-kthreads.txt > +++ b/Documentation/kernel-per-CPU-kthreads.txt > @@ -162,7 +162,16 @@ Purpose: Execute workqueue requests > To reduce its OS jitter, do any of the following: > 1. Run your workload at a real-time priority, which will allow > preempting the kworker daemons. > -2. Do any of the following needed to avoid jitter that your > +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files > + to force the WQ_SYSFS workqueues to run on the specified set > + of CPUs. The set of WQ_SYSFS workqueues can be displayed using > + "ls sys/devices/virtual/workqueue". That said, the workqueues > + maintainer would like to caution people against indiscriminately > + sprinkling WQ_SYSFS across all the workqueues. 
The reason for > + caution is that it is easy to add WQ_SYSFS, but because sysfs > + is part of the formal user/kernel API, it can be nearly impossible > + to remove it, even if its addition was a mistake. > +3. Do any of the following needed to avoid jitter that your Acked-by: Frederic Weisbecker I just suggest we append a small explanation about what WQ_SYSFS is about. Like: +2. + The workqueues that want to be visible on the sysfs hierarchy + set the WQ_SYSFS flag. + For those who have this flag set, you can use the + /sys/devices/virtual/workqueue/*/cpumask sysfs files + to force the workqueues to run on the specified set + of CPUs. The set of WQ_SYSFS workqueues can be displayed using + "ls sys/devices/virtual/workqueue". That said, the workqueues + maintainer would like to caution people against indiscriminately + sprinkling WQ_SYSFS across all the workqueues. The reason for + caution is that it is easy to add WQ_SYSFS, but because sysfs + is part of the formal user/kernel API, it can be nearly impossible + to remove it, even if its addition was a mistake. +3. Do any of the following needed to avoid jitter that your
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Wed, Feb 12, 2014 at 11:59:22AM -0800, Paul E. McKenney wrote: > Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity > > This commit documents the ability to apply CPU affinity to WQ_SYSFS > workqueues, thus offloading them from the desired worker CPUs. > > Signed-off-by: Paul E. McKenney > Cc: Frederic Weisbecker > Cc: Tejun Heo Reviewed-by: Tejun Heo Thanks! -- tejun
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Wed, Feb 12, 2014 at 02:23:54PM -0500, Tejun Heo wrote: > Hello, > > On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote: > > +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files > > + to force the WQ_SYSFS workqueues to run on the specified set > > + of CPUs. The set of WQ_SYSFS workqueues can be displayed using > > + "ls sys/devices/virtual/workqueue". > > One thing to be careful about is that once published, it becomes part > of userland visible interface. Maybe adding some words warning > against sprinkling WQ_SYSFS willy-nilly is a good idea? Good point! How about the following? Thanx, Paul Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity This commit documents the ability to apply CPU affinity to WQ_SYSFS workqueues, thus offloading them from the desired worker CPUs. Signed-off-by: Paul E. McKenney Cc: Frederic Weisbecker Cc: Tejun Heo diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt index 827104fb9364..214da3a47a68 100644 --- a/Documentation/kernel-per-CPU-kthreads.txt +++ b/Documentation/kernel-per-CPU-kthreads.txt @@ -162,7 +162,16 @@ Purpose: Execute workqueue requests To reduce its OS jitter, do any of the following: 1. Run your workload at a real-time priority, which will allow preempting the kworker daemons. -2. Do any of the following needed to avoid jitter that your +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files + to force the WQ_SYSFS workqueues to run on the specified set + of CPUs. The set of WQ_SYSFS workqueues can be displayed using + "ls sys/devices/virtual/workqueue". That said, the workqueues + maintainer would like to caution people against indiscriminately + sprinkling WQ_SYSFS across all the workqueues. The reason for + caution is that it is easy to add WQ_SYSFS, but because sysfs + is part of the formal user/kernel API, it can be nearly impossible + to remove it, even if its addition was a mistake. +3. 
Do any of the following needed to avoid jitter that your application cannot tolerate: a. Build your kernel with CONFIG_SLUB=y rather than CONFIG_SLAB=y, thus avoiding the slab allocator's periodic
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Wed, Feb 12, 2014 at 07:23:38PM +0100, Frederic Weisbecker wrote: > On Mon, Feb 10, 2014 at 10:47:29AM -0800, Paul E. McKenney wrote: > > On Mon, Feb 10, 2014 at 06:08:31PM +0800, Lai Jiangshan wrote: > > > Acked-by: Lai Jiangshan > > > > Thank you all, queued for 3.15. > > > > We should also have some facility for moving the SRCU workqueues to > > housekeeping/timekeeping kthreads in the NO_HZ_FULL case. Or does > > this patch already have that effect? > > Kevin Hilman and me plan to try to bring a new Kconfig option that could let > us control the unbound workqueues affinity through sysfs. Please CC me or feel free to update Documentation/kernel-per-CPU-kthreads.txt as part of this upcoming series. > The feature actually exist currently but is only enabled for workqueues that > have WQ_SYSFS. Writeback and raid5 are the only current users. > > See for example: /sys/devices/virtual/workqueue/writeback/cpumask Ah, news to me! I have queued the following patch, seem reasonable? Thanx, Paul diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt index 827104fb9364..09f28841ee3e 100644 --- a/Documentation/kernel-per-CPU-kthreads.txt +++ b/Documentation/kernel-per-CPU-kthreads.txt @@ -162,7 +162,11 @@ Purpose: Execute workqueue requests To reduce its OS jitter, do any of the following: 1. Run your workload at a real-time priority, which will allow preempting the kworker daemons. -2. Do any of the following needed to avoid jitter that your +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files + to force the WQ_SYSFS workqueues to run on the specified set + of CPUs. The set of WQ_SYSFS workqueues can be displayed using + "ls sys/devices/virtual/workqueue". +3. Do any of the following needed to avoid jitter that your application cannot tolerate: a. 
Build your kernel with CONFIG_SLUB=y rather than CONFIG_SLAB=y, thus avoiding the slab allocator's periodic
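The cpumask sysfs file documented in the patch above takes a hexadecimal CPU bitmask. As a rough illustration of the encoding (the helper names here are mine, not kernel API, and this sketch ignores the comma-separated 32-bit-word format the kernel uses on machines with more than 32 CPUs):

```python
def cpus_to_mask(cpus):
    # Set bit N for each CPU N allowed to run the workqueue's work items.
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")

def mask_to_cpus(mask_str):
    # Inverse: recover the CPU numbers from the hex mask string.
    mask = int(mask_str, 16)
    return [cpu for cpu in range(mask.bit_length()) if mask & (1 << cpu)]

print(cpus_to_mask({0, 1}))   # -> 3
print(mask_to_cpus("c"))      # -> [2, 3]
```

So confining a WQ_SYSFS workqueue to CPUs 0-1 means writing "3" to its cpumask file.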
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
Hello, On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote: > +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files > + to force the WQ_SYSFS workqueues to run on the specified set > + of CPUs. The set of WQ_SYSFS workqueues can be displayed using > + "ls sys/devices/virtual/workqueue". One thing to be careful about is that once published, it becomes part of userland visible interface. Maybe adding some words warning against sprinkling WQ_SYSFS willy-nilly is a good idea? Thanks. -- tejun
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Mon, Feb 10, 2014 at 10:47:29AM -0800, Paul E. McKenney wrote: > On Mon, Feb 10, 2014 at 06:08:31PM +0800, Lai Jiangshan wrote: > > Acked-by: Lai Jiangshan > > Thank you all, queued for 3.15. > > We should also have some facility for moving the SRCU workqueues to > housekeeping/timekeeping kthreads in the NO_HZ_FULL case. Or does > this patch already have that effect? Kevin Hilman and I plan to try to bring a new Kconfig option that could let us control the unbound workqueues' affinity through sysfs. The feature actually exists currently but is only enabled for workqueues that have WQ_SYSFS. Writeback and raid5 are the only current users. See for example: /sys/devices/virtual/workqueue/writeback/cpumask
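Frederic's point that the sysfs interface is opt-in shows up at the allocation site: a workqueue only appears under /sys/devices/virtual/workqueue/ if WQ_SYSFS was passed to alloc_workqueue(). Below is a hedged sketch of what a hypothetical in-kernel user would do; the names are invented for illustration, and as kernel code it only builds in-tree, not standalone.

```c
#include <linux/workqueue.h>

/* Hypothetical driver init: passing WQ_SYSFS at allocation time makes
 * the workqueue visible under /sys/devices/virtual/workqueue/, so its
 * cpumask (and other attributes) become tunable from userspace.
 * Per the caution in this thread: once exposed, the sysfs entry is
 * part of the user-visible API and is nearly impossible to remove. */
static struct workqueue_struct *example_wq;

static int example_init(void)
{
	example_wq = alloc_workqueue("example_wq",
				     WQ_SYSFS | WQ_UNBOUND, 0);
	if (!example_wq)
		return -ENOMEM;
	return 0;
}
```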
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Mon, Feb 10, 2014 at 10:47:29AM -0800, Paul E. McKenney wrote: On Mon, Feb 10, 2014 at 06:08:31PM +0800, Lai Jiangshan wrote: Acked-by: Lai Jiangshan la...@cn.fujitsu.com Thank you all, queued for 3.15. We should also have some facility for moving the SRCU workqueues to housekeeping/timekeeping kthreads in the NO_HZ_FULL case. Or does this patch already have that effect? Kevin Hilman and me plan to try to bring a new Kconfig option that could let us control the unbound workqueues affinity through sysfs. The feature actually exist currently but is only enabled for workqueues that have WQ_SYSFS. Writeback and raid5 are the only current users. See for example: /sys/devices/virtual/workqueue/writeback/cpumask -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
Hello, On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote: +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files + to force the WQ_SYSFS workqueues to run on the specified set + of CPUs. The set of WQ_SYSFS workqueues can be displayed using + ls sys/devices/virtual/workqueue. One thing to be careful about is that once published, it becomes part of userland visible interface. Maybe adding some words warning against sprinkling WQ_SYSFS willy-nilly is a good idea? Thanks. -- tejun -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Wed, Feb 12, 2014 at 07:23:38PM +0100, Frederic Weisbecker wrote: On Mon, Feb 10, 2014 at 10:47:29AM -0800, Paul E. McKenney wrote: On Mon, Feb 10, 2014 at 06:08:31PM +0800, Lai Jiangshan wrote: Acked-by: Lai Jiangshan la...@cn.fujitsu.com Thank you all, queued for 3.15. We should also have some facility for moving the SRCU workqueues to housekeeping/timekeeping kthreads in the NO_HZ_FULL case. Or does this patch already have that effect? Kevin Hilman and me plan to try to bring a new Kconfig option that could let us control the unbound workqueues affinity through sysfs. Please CC me or feel free to update Documentation/kernel-per-CPU-kthreads.txt as part of this upcoming series. The feature actually exist currently but is only enabled for workqueues that have WQ_SYSFS. Writeback and raid5 are the only current users. See for example: /sys/devices/virtual/workqueue/writeback/cpumask Ah, news to me! I have queued the following patch, seem reasonable? Thanx, Paul diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt index 827104fb9364..09f28841ee3e 100644 --- a/Documentation/kernel-per-CPU-kthreads.txt +++ b/Documentation/kernel-per-CPU-kthreads.txt @@ -162,7 +162,11 @@ Purpose: Execute workqueue requests To reduce its OS jitter, do any of the following: 1. Run your workload at a real-time priority, which will allow preempting the kworker daemons. -2. Do any of the following needed to avoid jitter that your +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files + to force the WQ_SYSFS workqueues to run on the specified set + of CPUs. The set of WQ_SYSFS workqueues can be displayed using + ls sys/devices/virtual/workqueue. +3. Do any of the following needed to avoid jitter that your application cannot tolerate: a. 
Build your kernel with CONFIG_SLUB=y rather than CONFIG_SLAB=y, thus avoiding the slab allocator's periodic -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Wed, Feb 12, 2014 at 02:23:54PM -0500, Tejun Heo wrote: Hello, On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote: +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files + to force the WQ_SYSFS workqueues to run on the specified set + of CPUs. The set of WQ_SYSFS workqueues can be displayed using + ls sys/devices/virtual/workqueue. One thing to be careful about is that once published, it becomes part of userland visible interface. Maybe adding some words warning against sprinkling WQ_SYSFS willy-nilly is a good idea? Good point! How about the following? Thanx, Paul Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity This commit documents the ability to apply CPU affinity to WQ_SYSFS workqueues, thus offloading them from the desired worker CPUs. Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com Cc: Frederic Weisbecker fweis...@gmail.com Cc: Tejun Heo t...@kernel.org diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt index 827104fb9364..214da3a47a68 100644 --- a/Documentation/kernel-per-CPU-kthreads.txt +++ b/Documentation/kernel-per-CPU-kthreads.txt @@ -162,7 +162,16 @@ Purpose: Execute workqueue requests To reduce its OS jitter, do any of the following: 1. Run your workload at a real-time priority, which will allow preempting the kworker daemons. -2. Do any of the following needed to avoid jitter that your +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files + to force the WQ_SYSFS workqueues to run on the specified set + of CPUs. The set of WQ_SYSFS workqueues can be displayed using + ls sys/devices/virtual/workqueue. That said, the workqueues + maintainer would like to caution people against indiscriminately + sprinkling WQ_SYSFS across all the workqueues. The reason for + caution is that it is easy to add WQ_SYSFS, but because sysfs + is part of the formal user/kernel API, it can be nearly impossible + to remove it, even if its addition was a mistake. +3. 
Do any of the following needed to avoid jitter that your application cannot tolerate: a. Build your kernel with CONFIG_SLUB=y rather than CONFIG_SLAB=y, thus avoiding the slab allocator's periodic -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Wed, Feb 12, 2014 at 11:59:22AM -0800, Paul E. McKenney wrote: Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity This commit documents the ability to apply CPU affinity to WQ_SYSFS workqueues, thus offloading them from the desired worker CPUs. Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com Cc: Frederic Weisbecker fweis...@gmail.com Cc: Tejun Heo t...@kernel.org Reviewed-by: Tejun Heo t...@kernel.org Thanks! -- tejun -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Wed, Feb 12, 2014 at 11:59:22AM -0800, Paul E. McKenney wrote: On Wed, Feb 12, 2014 at 02:23:54PM -0500, Tejun Heo wrote: Hello, On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote: +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files + to force the WQ_SYSFS workqueues to run on the specified set + of CPUs. The set of WQ_SYSFS workqueues can be displayed using + ls sys/devices/virtual/workqueue. One thing to be careful about is that once published, it becomes part of userland visible interface. Maybe adding some words warning against sprinkling WQ_SYSFS willy-nilly is a good idea? Good point! How about the following? Thanx, Paul Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity This commit documents the ability to apply CPU affinity to WQ_SYSFS workqueues, thus offloading them from the desired worker CPUs. Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com Cc: Frederic Weisbecker fweis...@gmail.com Cc: Tejun Heo t...@kernel.org diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt index 827104fb9364..214da3a47a68 100644 --- a/Documentation/kernel-per-CPU-kthreads.txt +++ b/Documentation/kernel-per-CPU-kthreads.txt @@ -162,7 +162,16 @@ Purpose: Execute workqueue requests To reduce its OS jitter, do any of the following: 1. Run your workload at a real-time priority, which will allow preempting the kworker daemons. -2. Do any of the following needed to avoid jitter that your +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files + to force the WQ_SYSFS workqueues to run on the specified set + of CPUs. The set of WQ_SYSFS workqueues can be displayed using + ls sys/devices/virtual/workqueue. That said, the workqueues + maintainer would like to caution people against indiscriminately + sprinkling WQ_SYSFS across all the workqueues. 
The reason for + caution is that it is easy to add WQ_SYSFS, but because sysfs + is part of the formal user/kernel API, it can be nearly impossible + to remove it, even if its addition was a mistake. +3. Do any of the following needed to avoid jitter that your Acked-by: Frederic Weisbecker fweis...@gmail.com I just suggest we append a small explanation about what WQ_SYSFS is about. Like: +2. + The workqueues that want to be visible on the sysfs hierarchy + set the WQ_SYSFS flag. + For those who have this flag set, you can use the + /sys/devices/virtual/workqueue/*/cpumask sysfs files + to force the workqueues to run on the specified set + of CPUs. The set of WQ_SYSFS workqueues can be displayed using + ls sys/devices/virtual/workqueue. That said, the workqueues + maintainer would like to caution people against indiscriminately + sprinkling WQ_SYSFS across all the workqueues. The reason for + caution is that it is easy to add WQ_SYSFS, but because sysfs + is part of the formal user/kernel API, it can be nearly impossible + to remove it, even if its addition was a mistake. +3. Do any of the following needed to avoid jitter that your -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Thu, Feb 13, 2014 at 12:04:57AM +0100, Frederic Weisbecker wrote: On Wed, Feb 12, 2014 at 11:59:22AM -0800, Paul E. McKenney wrote: On Wed, Feb 12, 2014 at 02:23:54PM -0500, Tejun Heo wrote: Hello, On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote: +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files + to force the WQ_SYSFS workqueues to run on the specified set + of CPUs. The set of WQ_SYSFS workqueues can be displayed using + ls sys/devices/virtual/workqueue. One thing to be careful about is that once published, it becomes part of userland visible interface. Maybe adding some words warning against sprinkling WQ_SYSFS willy-nilly is a good idea? Good point! How about the following? Thanx, Paul Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity This commit documents the ability to apply CPU affinity to WQ_SYSFS workqueues, thus offloading them from the desired worker CPUs. Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com Cc: Frederic Weisbecker fweis...@gmail.com Cc: Tejun Heo t...@kernel.org diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt index 827104fb9364..214da3a47a68 100644 --- a/Documentation/kernel-per-CPU-kthreads.txt +++ b/Documentation/kernel-per-CPU-kthreads.txt @@ -162,7 +162,16 @@ Purpose: Execute workqueue requests To reduce its OS jitter, do any of the following: 1. Run your workload at a real-time priority, which will allow preempting the kworker daemons. -2. Do any of the following needed to avoid jitter that your +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files + to force the WQ_SYSFS workqueues to run on the specified set + of CPUs. The set of WQ_SYSFS workqueues can be displayed using + ls sys/devices/virtual/workqueue. That said, the workqueues + maintainer would like to caution people against indiscriminately + sprinkling WQ_SYSFS across all the workqueues. 
The reason for + caution is that it is easy to add WQ_SYSFS, but because sysfs + is part of the formal user/kernel API, it can be nearly impossible + to remove it, even if its addition was a mistake. +3. Do any of the following needed to avoid jitter that your Acked-by: Frederic Weisbecker fweis...@gmail.com I just suggest we append a small explanation about what WQ_SYSFS is about. Like: Fair point! I wordsmithed it into the following. Seem reasonable? Thanx, Paul Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity This commit documents the ability to apply CPU affinity to WQ_SYSFS workqueues, thus offloading them from the desired worker CPUs. Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com Reviewed-by: Tejun Heo t...@kernel.org Acked-by: Frederic Weisbecker fweis...@gmail.com diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt index 827104fb9364..f3cd299fcc41 100644 --- a/Documentation/kernel-per-CPU-kthreads.txt +++ b/Documentation/kernel-per-CPU-kthreads.txt @@ -162,7 +162,18 @@ Purpose: Execute workqueue requests To reduce its OS jitter, do any of the following: 1. Run your workload at a real-time priority, which will allow preempting the kworker daemons. -2. Do any of the following needed to avoid jitter that your +2. A given workqueue can be made visible in the sysfs filesystem + by passing the WQ_SYSFS to that workqueue's alloc_workqueue(). + Such a workqueue can be confined to a given subset of the + CPUs using the /sys/devices/virtual/workqueue/*/cpumask sysfs + files. The set of WQ_SYSFS workqueues can be displayed using + ls sys/devices/virtual/workqueue. That said, the workqueues + maintainer would like to caution people against indiscriminately + sprinkling WQ_SYSFS across all the workqueues. 
The reason for + caution is that it is easy to add WQ_SYSFS, but because sysfs is + part of the formal user/kernel API, it can be nearly impossible + to remove it, even if its addition was a mistake. +3. Do any of the following needed to avoid jitter that your application cannot tolerate: a. Build your kernel with CONFIG_SLUB=y rather than CONFIG_SLAB=y, thus avoiding the slab allocator's periodic -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On 02/13/2014 08:33 AM, Paul E. McKenney wrote:
> On Thu, Feb 13, 2014 at 12:04:57AM +0100, Frederic Weisbecker wrote:
>> On Wed, Feb 12, 2014 at 11:59:22AM -0800, Paul E. McKenney wrote:
>>> On Wed, Feb 12, 2014 at 02:23:54PM -0500, Tejun Heo wrote:
>>>> Hello,
>>>>
>>>> On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote:
>>>>> +2.	Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files
>>>>> +	to force the WQ_SYSFS workqueues to run on the specified set
>>>>> +	of CPUs.  The set of WQ_SYSFS workqueues can be displayed using
>>>>> +	"ls sys/devices/virtual/workqueue".
>>>>
>>>> One thing to be careful about is that once published, it becomes part
>>>> of userland visible interface.  Maybe adding some words warning
>>>> against sprinkling WQ_SYSFS willy-nilly is a good idea?
>>>
>>> Good point!  How about the following?
>>>
>>> 							Thanx, Paul
>>>
>>> Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity
>>>
>>> This commit documents the ability to apply CPU affinity to WQ_SYSFS
>>> workqueues, thus offloading them from the desired worker CPUs.
>>>
>>> Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com
>>> Cc: Frederic Weisbecker fweis...@gmail.com
>>> Cc: Tejun Heo t...@kernel.org
>>>
>>> diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt
>>> index 827104fb9364..214da3a47a68 100644
>>> --- a/Documentation/kernel-per-CPU-kthreads.txt
>>> +++ b/Documentation/kernel-per-CPU-kthreads.txt
>>> @@ -162,7 +162,16 @@ Purpose: Execute workqueue requests
>>>  To reduce its OS jitter, do any of the following:
>>>  1.	Run your workload at a real-time priority, which will allow
>>>  	preempting the kworker daemons.
>>> -2.	Do any of the following needed to avoid jitter that your
>>> +2.	Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files
>>> +	to force the WQ_SYSFS workqueues to run on the specified set
>>> +	of CPUs.  The set of WQ_SYSFS workqueues can be displayed using
>>> +	"ls sys/devices/virtual/workqueue".  That said, the workqueues
>>> +	maintainer would like to caution people against indiscriminately
>>> +	sprinkling WQ_SYSFS across all the workqueues.  The reason for
>>> +	caution is that it is easy to add WQ_SYSFS, but because sysfs
>>> +	is part of the formal user/kernel API, it can be nearly impossible
>>> +	to remove it, even if its addition was a mistake.
>>> +3.	Do any of the following needed to avoid jitter that your
>>
>> Acked-by: Frederic Weisbecker fweis...@gmail.com
>>
>> I just suggest we append a small explanation about what WQ_SYSFS is
>> about. Like:
>
> Fair point!  I wordsmithed it into the following.  Seem reasonable?
>
> 							Thanx, Paul
>
> Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity
>
> This commit documents the ability to apply CPU affinity to WQ_SYSFS
> workqueues, thus offloading them from the desired worker CPUs.
>
> Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com
> Reviewed-by: Tejun Heo t...@kernel.org
> Acked-by: Frederic Weisbecker fweis...@gmail.com
>
> diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt
> index 827104fb9364..f3cd299fcc41 100644
> --- a/Documentation/kernel-per-CPU-kthreads.txt
> +++ b/Documentation/kernel-per-CPU-kthreads.txt
> @@ -162,7 +162,18 @@ Purpose: Execute workqueue requests
>  To reduce its OS jitter, do any of the following:
>  1.	Run your workload at a real-time priority, which will allow
>  	preempting the kworker daemons.
> -2.	Do any of the following needed to avoid jitter that your
> +2.	A given workqueue can be made visible in the sysfs filesystem
> +	by passing the WQ_SYSFS to that workqueue's alloc_workqueue().
> +	Such a workqueue can be confined to a given subset of the
> +	CPUs using the /sys/devices/virtual/workqueue/*/cpumask sysfs
> +	files.  The set of WQ_SYSFS workqueues can be displayed using
> +	"ls sys/devices/virtual/workqueue".  That said, the workqueues
> +	maintainer would like to caution people against indiscriminately
> +	sprinkling WQ_SYSFS across all the workqueues.  The reason for
> +	caution is that it is easy to add WQ_SYSFS, but because sysfs is
> +	part of the formal user/kernel API, it can be nearly impossible
> +	to remove it, even if its addition was a mistake.
> +3.	Do any of the following needed to avoid jitter that your
>  	application cannot tolerate:
>  	a.	Build your kernel with CONFIG_SLUB=y rather than
>  		CONFIG_SLAB=y, thus avoiding the slab allocator's periodic

Reviewed-by: Lai Jiangshan la...@cn.fujitsu.com

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Mon, Feb 10, 2014 at 06:08:31PM +0800, Lai Jiangshan wrote:
> Acked-by: Lai Jiangshan

Thank you all, queued for 3.15.

We should also have some facility for moving the SRCU workqueues to
housekeeping/timekeeping kthreads in the NO_HZ_FULL case.  Or does this
patch already have that effect?

							Thanx, Paul

> On 02/01/2014 03:53 AM, Zoran Markovic wrote:
> > From: Shaibal Dutta
> >
> > For better use of CPU idle time, allow the scheduler to select the CPU
> > on which the SRCU grace period work would be scheduled. This improves
> > idle residency time and conserves power.
> >
> > This functionality is enabled when CONFIG_WQ_POWER_EFFICIENT is selected.
> >
> > Cc: Lai Jiangshan
> > Cc: "Paul E. McKenney"
> > Cc: Dipankar Sarma
> > Signed-off-by: Shaibal Dutta
> > [zoran.marko...@linaro.org: Rebased to latest kernel version. Added commit
> > message. Fixed code alignment.]
> > ---
> >  kernel/rcu/srcu.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c
> > index 3318d82..a1ebe6d 100644
> > --- a/kernel/rcu/srcu.c
> > +++ b/kernel/rcu/srcu.c
> > @@ -398,7 +398,7 @@ void call_srcu(struct srcu_struct *sp, struct rcu_head *head,
> >  	rcu_batch_queue(&sp->batch_queue, head);
> >  	if (!sp->running) {
> >  		sp->running = true;
> > -		schedule_delayed_work(&sp->work, 0);
> > +		queue_delayed_work(system_power_efficient_wq, &sp->work, 0);
> >  	}
> >  	spin_unlock_irqrestore(&sp->queue_lock, flags);
> >  }
> > @@ -674,7 +674,8 @@ static void srcu_reschedule(struct srcu_struct *sp)
> >  	}
> >
> >  	if (pending)
> > -		schedule_delayed_work(&sp->work, SRCU_INTERVAL);
> > +		queue_delayed_work(system_power_efficient_wq,
> > +				   &sp->work, SRCU_INTERVAL);
> >  }
> >
> >  /*
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
Acked-by: Lai Jiangshan

On 02/01/2014 03:53 AM, Zoran Markovic wrote:
> From: Shaibal Dutta
>
> For better use of CPU idle time, allow the scheduler to select the CPU
> on which the SRCU grace period work would be scheduled. This improves
> idle residency time and conserves power.
>
> This functionality is enabled when CONFIG_WQ_POWER_EFFICIENT is selected.
>
> Cc: Lai Jiangshan
> Cc: "Paul E. McKenney"
> Cc: Dipankar Sarma
> Signed-off-by: Shaibal Dutta
> [zoran.marko...@linaro.org: Rebased to latest kernel version. Added commit
> message. Fixed code alignment.]
> ---
>  kernel/rcu/srcu.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c
> index 3318d82..a1ebe6d 100644
> --- a/kernel/rcu/srcu.c
> +++ b/kernel/rcu/srcu.c
> @@ -398,7 +398,7 @@ void call_srcu(struct srcu_struct *sp, struct rcu_head *head,
>  	rcu_batch_queue(&sp->batch_queue, head);
>  	if (!sp->running) {
>  		sp->running = true;
> -		schedule_delayed_work(&sp->work, 0);
> +		queue_delayed_work(system_power_efficient_wq, &sp->work, 0);
>  	}
>  	spin_unlock_irqrestore(&sp->queue_lock, flags);
>  }
> @@ -674,7 +674,8 @@ static void srcu_reschedule(struct srcu_struct *sp)
>  	}
>
>  	if (pending)
> -		schedule_delayed_work(&sp->work, SRCU_INTERVAL);
> +		queue_delayed_work(system_power_efficient_wq,
> +				   &sp->work, SRCU_INTERVAL);
>  }
>
>  /*
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
Signed-off-by: Zoran Markovic

On 31 January 2014 11:53, Zoran Markovic wrote:
> From: Shaibal Dutta
>
> For better use of CPU idle time, allow the scheduler to select the CPU
> on which the SRCU grace period work would be scheduled. This improves
> idle residency time and conserves power.
>
> This functionality is enabled when CONFIG_WQ_POWER_EFFICIENT is selected.
>
> Cc: Lai Jiangshan
> Cc: "Paul E. McKenney"
> Cc: Dipankar Sarma
> Signed-off-by: Shaibal Dutta
> [zoran.marko...@linaro.org: Rebased to latest kernel version. Added commit
> message. Fixed code alignment.]
> ---
>  kernel/rcu/srcu.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c
> index 3318d82..a1ebe6d 100644
> --- a/kernel/rcu/srcu.c
> +++ b/kernel/rcu/srcu.c
> @@ -398,7 +398,7 @@ void call_srcu(struct srcu_struct *sp, struct rcu_head *head,
>  	rcu_batch_queue(&sp->batch_queue, head);
>  	if (!sp->running) {
>  		sp->running = true;
> -		schedule_delayed_work(&sp->work, 0);
> +		queue_delayed_work(system_power_efficient_wq, &sp->work, 0);
>  	}
>  	spin_unlock_irqrestore(&sp->queue_lock, flags);
>  }
> @@ -674,7 +674,8 @@ static void srcu_reschedule(struct srcu_struct *sp)
>  	}
>
>  	if (pending)
> -		schedule_delayed_work(&sp->work, SRCU_INTERVAL);
> +		queue_delayed_work(system_power_efficient_wq,
> +				   &sp->work, SRCU_INTERVAL);
>  }
>
>  /*
> --
> 1.7.9.5