Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Sat, Feb 15, 2014 at 7:24 AM, Kevin Hilman wrote:
> From 902a2b58d61a51415457ea6768d687cdb7532eff Mon Sep 17 00:00:00 2001
> From: Kevin Hilman
> Date: Fri, 14 Feb 2014 15:10:58 -0800
> Subject: [PATCH] workqueue: for NO_HZ_FULL, set default cpumask to
>  !tick_nohz_full_mask
>
> To help in keeping NO_HZ_FULL CPUs isolated, keep unbound workqueues
> from running on full dynticks CPUs.  To do this, set the default
> workqueue cpumask to be the set of "housekeeping" CPUs instead of all
> possible CPUs.
>
> This is just the starting/default cpumask, and can be overridden in
> all the normal ways (NUMA settings, apply_workqueue_attrs and via
> sysfs for workqueues with the WQ_SYSFS attribute.)
>
> Cc: Tejun Heo
> Cc: Paul McKenney
> Cc: Frederic Weisbecker
> Signed-off-by: Kevin Hilman
> ---
>  kernel/workqueue.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 987293d03ebc..9a9d9b0eaf6d 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -48,6 +48,7 @@
>  #include <linux/nodemask.h>
>  #include <linux/moduleparam.h>
>  #include <linux/uaccess.h>
> +#include <linux/tick.h>
>
>  #include "workqueue_internal.h"
>
> @@ -3436,7 +3437,11 @@ struct workqueue_attrs *alloc_workqueue_attrs(gfp_t gfp_mask)
>  	if (!alloc_cpumask_var(&attrs->cpumask, gfp_mask))
>  		goto fail;
>
> +#ifdef CONFIG_NO_HZ_FULL
> +	cpumask_complement(attrs->cpumask, tick_nohz_full_mask);
> +#else
>  	cpumask_copy(attrs->cpumask, cpu_possible_mask);
> +#endif
>  	return attrs;
>  fail:
>  	free_workqueue_attrs(attrs);

Can we play with this mask at runtime? I thought a better idea would be
to keep this mask as the mask of all CPUs initially, and once any CPU
enters NO_HZ_FULL mode we can remove it from the mask. And once it
leaves that mode we can add it back again.

I am looking to use a similar concept for un-pinned timers with my work
on the cpuset.quiesce option.
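The runtime idea suggested above can be modeled in a few lines of shell, purely as an illustration of the mask arithmetic (the function names are made up for this sketch, not kernel interfaces): start with every CPU in the unbound-workqueue mask, clear a CPU's bit when it enters full-dynticks mode, and set it again when it leaves.

```shell
# Toy model of the proposal: all CPUs are "housekeeping" at boot;
# entering nohz_full clears a CPU's bit, leaving sets it again.
# (enter_nohz_full/leave_nohz_full are illustrative names only.)
nr_cpus=4
mask=$(( (1 << nr_cpus) - 1 ))          # 0xf: CPUs 0-3 all available

enter_nohz_full() { mask=$(( mask & ~(1 << $1) )); }
leave_nohz_full() { mask=$(( mask | (1 << $1) )); }

enter_nohz_full 2
printf 'after CPU2 enters nohz_full: 0x%x\n' "$mask"   # 0xb
leave_nohz_full 2
printf 'after CPU2 leaves nohz_full: 0x%x\n' "$mask"   # 0xf
```

Note that entering twice is idempotent, which matters if hotplug and nohz transitions can race.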
-- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Thu, 2014-02-27 at 15:43 +0100, Frederic Weisbecker wrote:
> On Mon, Feb 17, 2014 at 06:26:41AM +0100, Mike Galbraith wrote:
> > On Wed, 2014-02-12 at 19:23 +0100, Frederic Weisbecker wrote:
> > > On Mon, Feb 10, 2014 at 10:47:29AM -0800, Paul E. McKenney wrote:
> > > > On Mon, Feb 10, 2014 at 06:08:31PM +0800, Lai Jiangshan wrote:
> > > > > Acked-by: Lai Jiangshan
> > > >
> > > > Thank you all, queued for 3.15.
> > > >
> > > > We should also have some facility for moving the SRCU workqueues to
> > > > housekeeping/timekeeping kthreads in the NO_HZ_FULL case.  Or does
> > > > this patch already have that effect?
> > >
> > > Kevin Hilman and I plan to try to bring in a new Kconfig option that
> > > could let us control the unbound workqueues affinity through sysfs.
> >
> > Handing control to the user seemed like a fine thing, so I started
> > making a boot option to enable it.  Forcing WQ_SYSFS on at the sysfs
> > decision spot doesn't go well, init order matters :) Post-init frobbing
> > is required if you want to see/frob all unbound workqueues.
>
> I'm curious about the details. Is that because some workqueues are
> registered before sysfs is even initialized?

Yeah. I put in a test, and told it if not ready, go get ready and flag
yourself as having BTDT (one registration being plenty), but that only
made a different explosion. A post-init rescan is definitely a better
plan, it can't be worse :)

-Mike
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Fri, Feb 14, 2014 at 03:24:35PM -0800, Kevin Hilman wrote:
> Tejun Heo writes:
>
> > Hello,
> >
> > On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote:
> >> +2.	Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files
> >> +	to force the WQ_SYSFS workqueues to run on the specified set
> >> +	of CPUs.  The set of WQ_SYSFS workqueues can be displayed using
> >> +	"ls sys/devices/virtual/workqueue".
> >
> > One thing to be careful about is that once published, it becomes part
> > of userland visible interface.  Maybe adding some words warning
> > against sprinkling WQ_SYSFS willy-nilly is a good idea?
>
> In the NO_HZ_FULL case, it seems to me we'd always want all unbound
> workqueues to have their affinity set to the housekeeping CPUs.
>
> Is there any reason not to enable WQ_SYSFS whenever WQ_UNBOUND is set so
> the affinity can be controlled?  I guess the main reason would be that
> all of these workqueue names would become permanent ABI.

Right. It's a legitimate worry, but couldn't we consider workqueue names
just like kernel thread names? Ie: something that can be renamed or can
disappear anytime from one kernel version to another? Or does sysfs have
really strict rules about that and I'm just daydreaming?

I've been thinking we could also have a pseudo-workqueue directory in
/sys/devices/virtual/workqueue/unbounds with only cpumask as a file.
Writing to it would set the affinity of all unbound workqueues, at least
those that don't have WQ_SYSFS. This would solve the ABI issue and keep
a single consistent interface for workqueue affinity.

> At least for NO_HZ_FULL, maybe this should be automatic.  The cpumask of
> unbound workqueues should default to !tick_nohz_full_mask?  Any WQ_SYSFS
> workqueues could still be overridden from userspace, but at least the
> default would be sane, and help keep full dynticks CPUs isolated.
>
> Example patch below, only boot tested on 4-CPU ARM system with
> CONFIG_NO_HZ_FULL_ALL=y and verified that 'cat
> /sys/devices/virtual/workqueue/writeback/cpumask' looked sane.  If this
> looks OK, I can maybe clean it up a bit and make it a runtime check
> instead of a compile time check.

It can work too, yeah. Maybe I prefer the idea of keeping a sysfs
interface for all workqueues (whether we use a pseudo "unbounds" dir or
not), because then the workqueue core stays unaware of dynticks details
and doesn't end up fiddling with timer core internals like the full
dynticks cpumask, which would mean more interdependency and possible
init-ordering headaches between the timer and workqueue code.

And moreover, people may forget to change the WQ_SYSFS workqueues if all
other UNBOUND workqueues are known to be automatically handled. Or do we
handle WQ_SYSFS as well along the way? But still, a WQ_SYSFS cpumask may
be modified by other user programs, so it's still a round of settings
that must be done before doing any isolation work.

So I have mixed feelings between code complexity, simplicity for users,
etc... What do you guys think?
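As a concrete illustration of the sysfs interface being discussed, the paths below come from the documentation quoted above; whether a particular workqueue (such as writeback, the one Kevin checked) appears there depends on it having WQ_SYSFS set. This is a usage sketch, not something to run blindly:

```shell
# List the workqueues that currently expose a sysfs interface (WQ_SYSFS):
ls /sys/devices/virtual/workqueue/

# Read the current cpumask of the writeback workqueue:
cat /sys/devices/virtual/workqueue/writeback/cpumask

# Restrict it to the housekeeping CPU(s), e.g. CPU0 on a 4-CPU
# nohz_full box (mask is a hex cpumask; 1 == CPU0 only):
echo 1 > /sys/devices/virtual/workqueue/writeback/cpumask
```

The proposed /sys/devices/virtual/workqueue/unbounds/cpumask pseudo-directory would make the last step a single write covering all unbound workqueues.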
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Mon, Feb 17, 2014 at 06:26:41AM +0100, Mike Galbraith wrote:
> On Wed, 2014-02-12 at 19:23 +0100, Frederic Weisbecker wrote:
> > On Mon, Feb 10, 2014 at 10:47:29AM -0800, Paul E. McKenney wrote:
> > > On Mon, Feb 10, 2014 at 06:08:31PM +0800, Lai Jiangshan wrote:
> > > > Acked-by: Lai Jiangshan
> > >
> > > Thank you all, queued for 3.15.
> > >
> > > We should also have some facility for moving the SRCU workqueues to
> > > housekeeping/timekeeping kthreads in the NO_HZ_FULL case.  Or does
> > > this patch already have that effect?
> >
> > Kevin Hilman and I plan to try to bring in a new Kconfig option that
> > could let us control the unbound workqueues affinity through sysfs.
>
> Handing control to the user seemed like a fine thing, so I started
> making a boot option to enable it.  Forcing WQ_SYSFS on at the sysfs
> decision spot doesn't go well, init order matters :) Post-init frobbing
> is required if you want to see/frob all unbound workqueues.

I'm curious about the details. Is that because some workqueues are
registered before sysfs is even initialized?
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Mon, 2014-02-24 at 09:55 -0800, Kevin Hilman wrote:
> Yeah, my patch only addresses the nohz_full case, but since there
> doesn't seem to be any general agreement about the generic case, it
> seems that exposing all unbound workqueues via WQ_SYSFS is the way to
> go.
>
> Mike, looks like you may have started on that.  Did it get any further?

No, work gets in the way of tinker time. Not sure which way I want to
explore anyway. On the one hand, an auto-quiesce thingy tied to domain
construction/destruction sounds good, but on the other, just handing
control to the user is highly attractive.. likely lots simpler too.

-Mike
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
Mike Galbraith writes:

> On Sun, 2014-02-16 at 08:41 -0800, Paul E. McKenney wrote:
>
>> So if there is NO_HZ_FULL, you have no objection to binding workqueues to
>> the timekeeping CPUs, but that you would also like some form of automatic
>> binding in the !NO_HZ_FULL case.  Of course, if a common mechanism could
>> serve both cases, that would be good.  And yes, cpusets are frowned upon
>> for some workloads.
>
> I'm not _objecting_, I'm not driving, Frederic's doing that ;-)
>
> That said, isolation seems to be turning into a property of nohz mode,
> but as I see it, nohz_full is an extension to generic isolation.
>
>> So maybe start with Kevin's patch, but augment with something else for
>> the !NO_HZ_FULL case?
>
> Sure (hm, does it work without workqueue.disable_numa ?).

[ /me returns from vacation ]

Yes, since it happens for every alloc_workqueue_attrs().

> It just seems to me that tying it to sched domain construction would be
> a better fit.  That way, it doesn't matter what your isolation-requiring
> load is: whether you run a gaggle of realtime tasks or one HPC task is
> your business, the generic requirement is isolation, not tick mode.  For
> one HPC task per core you want no tick; if you're running all SCHED_FIFO,
> maybe you want that too, depending on the impact of nohz_full mode.  All
> sensitive loads want the isolation, but they may not like the price.
>
> I personally like the cpuset way.  Being able to partition boxen on the
> fly makes them very flexible.  In a perfect world, you'd be able to
> quiesce and configure offloading and nohz_full on the fly too, and not
> end up with some hodgepodge like: this needs boot option foo, that
> happens invisibly because of config option bar, the other thing you have
> to do manually.. and you get to eat 937 kthreads and tons of overhead on
> all CPUs if you want the ability to _maybe_ run a critical task or two.

Yeah, my patch only addresses the nohz_full case, but since there
doesn't seem to be any general agreement about the generic case, it
seems that exposing all unbound workqueues via WQ_SYSFS is the way to
go.

Mike, looks like you may have started on that.  Did it get any further?

Kevin
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Mon, 2014-02-17 at 05:50 +0100, Mike Galbraith wrote:
> On Sun, 2014-02-16 at 08:41 -0800, Paul E. McKenney wrote:
> > So maybe start with Kevin's patch, but augment with something else for
> > the !NO_HZ_FULL case?
>
> Sure (hm, does it work without workqueue.disable_numa ?).

I took the patch out for a spin on a 40-core box +SMT, with CPUs 4-79
isolated via an exclusive cpuset with load balancing off.  The worker
bees ignored the patch either way.

-Mike

Perturbation measurement hog pinned to CPU4.  With patch:

#            TASK-PID  CPU#  ||  TIMESTAMP  FUNCTION
#               | |      |   ||      |         |
        pert-9949 [004] 113     405.120164: workqueue_queue_work: work struct=880a5c4ecc08 function=flush_to_ldisc workqueue=88046f40ba00 req_cpu=256 cpu=4
        pert-9949 [004] 113     405.120166: workqueue_activate_work: work struct 880a5c4ecc08
        pert-9949 [004] d.L.313 405.120169: sched_wakeup: comm=kworker/4:2 pid=2119 prio=120 success=1 target_cpu=004
        pert-9949 [004] d.Lh213 405.120170: tick_stop: success=no msg=more than 1 task in runqueue
        pert-9949 [004] d.L.213 405.120172: tick_stop: success=no msg=more than 1 task in runqueue
        pert-9949 [004] d...3.. 405.120173: sched_switch: prev_comm=pert prev_pid=9949 prev_prio=120 prev_state=R+ ==> next_comm=kworker/4:2 next_pid=2119 next_prio=120
 kworker/4:2-2119 [004] 1..     405.120174: workqueue_execute_start: work struct 880a5c4ecc08: function flush_to_ldisc
 kworker/4:2-2119 [004] d...311 405.120176: sched_wakeup: comm=sshd pid=6620 prio=120 success=1 target_cpu=000
 kworker/4:2-2119 [004] 1..     405.120176: workqueue_execute_end: work struct 880a5c4ecc08
 kworker/4:2-2119 [004] d...3.. 405.120177: sched_switch: prev_comm=kworker/4:2 prev_pid=2119 prev_prio=120 prev_state=S ==> next_comm=pert next_pid=9949 next_prio=120
        pert-9949 [004] 113     405.120178: workqueue_queue_work: work struct=880a5c4ecc08 function=flush_to_ldisc workqueue=88046f40ba00 req_cpu=256 cpu=4
        pert-9949 [004] 113     405.120179: workqueue_activate_work: work struct 880a5c4ecc08
        pert-9949 [004] d.L.313 405.120179: sched_wakeup: comm=kworker/4:2 pid=2119 prio=120 success=1 target_cpu=004
        pert-9949 [004] d.L.213 405.120181: tick_stop: success=no msg=more than 1 task in runqueue
        pert-9949 [004] d...3.. 405.120181: sched_switch: prev_comm=pert prev_pid=9949 prev_prio=120 prev_state=R+ ==> next_comm=kworker/4:2 next_pid=2119 next_prio=120
 kworker/4:2-2119 [004] 1..     405.120182: workqueue_execute_start: work struct 880a5c4ecc08: function flush_to_ldisc
 kworker/4:2-2119 [004] 1..     405.120183: workqueue_execute_end: work struct 880a5c4ecc08
 kworker/4:2-2119 [004] d...3.. 405.120183: sched_switch: prev_comm=kworker/4:2 prev_pid=2119 prev_prio=120 prev_state=S ==> next_comm=pert next_pid=9949 next_prio=120
        pert-9949 [004] d...1.. 405.120736: tick_stop: success=yes msg=
        pert-9949 [004] 113     410.121082: workqueue_queue_work: work struct=880a5c4ecc08 function=flush_to_ldisc workqueue=88046f40ba00 req_cpu=256 cpu=4
        pert-9949 [004] 113     410.121082: workqueue_activate_work: work struct 880a5c4ecc08
        pert-9949 [004] d.L.313 410.121084: sched_wakeup: comm=kworker/4:2 pid=2119 prio=120 success=1 target_cpu=004
        pert-9949 [004] d.Lh213 410.121085: tick_stop: success=no msg=more than 1 task in runqueue
        pert-9949 [004] d.L.213 410.121087: tick_stop: success=no msg=more than 1 task in runqueue

...and so on until tick time.

(extra cheezy) hack kinda sorta works iff workqueue.disable_numa:

---
 kernel/workqueue.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1328,8 +1328,11 @@ static void __queue_work(int cpu, struct
 	rcu_read_lock();
 retry:
-	if (req_cpu == WORK_CPU_UNBOUND)
+	if (req_cpu == WORK_CPU_UNBOUND) {
 		cpu = raw_smp_processor_id();
+		if (runqueue_is_isolated(cpu))
+			cpu = 0;
+	}
 
 	/* pwq which will be used unless @work is executing elsewhere */
 	if (!(wq->flags & WQ_UNBOUND))

#            TASK-PID  CPU#  ||  TIMESTAMP  FUNCTION
#               | |      |   ||      |         |
       <...>-33824 [004] 113     .889694: workqueue_queue_work: work struct=880a596eb008 function=flush_to_ldisc workqueue=88046f40ba00 req_cpu=256 cpu=0
       <...>-33824 [004] 113     .889695: workqueue_activate_work: work struct 880a596eb008
       <...>-33824 [004] d...313 .889697: sched_wakeup: comm=kworker/0:2 pid=2105 prio=120 success=1 target_cpu=000
       <...>-33824 [004] 113     5560.890594: workqueue_queue_work:
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Wed, 2014-02-12 at 19:23 +0100, Frederic Weisbecker wrote: > On Mon, Feb 10, 2014 at 10:47:29AM -0800, Paul E. McKenney wrote: > > On Mon, Feb 10, 2014 at 06:08:31PM +0800, Lai Jiangshan wrote: > > > Acked-by: Lai Jiangshan > > > > Thank you all, queued for 3.15. > > > > We should also have some facility for moving the SRCU workqueues to > > housekeeping/timekeeping kthreads in the NO_HZ_FULL case. Or does > > this patch already have that effect? > > Kevin Hilman and me plan to try to bring a new Kconfig option that could let > us control the unbound workqueues affinity through sysfs. Handing control to the user seemed like a fine thing, so I started making a boot option to enable it. Forcing WQ_SYSFS on at sysfs decision spot doesn't go well, init order matters :) Post init frobbing required if you want to see/frob all unbound. -Mike
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Sun, 2014-02-16 at 08:41 -0800, Paul E. McKenney wrote: > So if there is NO_HZ_FULL, you have no objection to binding workqueues to > the timekeeping CPUs, but that you would also like some form of automatic > binding in the !NO_HZ_FULL case. Of course, if a common mechanism could > serve both cases, that would be good. And yes, cpusets are frowned upon > for some workloads. I'm not _objecting_, I'm not driving, Frederic's doing that ;-) That said, isolation seems to be turning into a property of nohz mode, but as I see it, nohz_full is an extension to generic isolation. > So maybe start with Kevin's patch, but augment with something else for > the !NO_HZ_FULL case? Sure (hm, does it work without workqueue.disable_numa ?). It just seems to me that tying it to sched domain construction would be a better fit. That way, it doesn't matter what your isolation requiring load is, whether you run a gaggle of realtime tasks or one HPC task is your business, the generic requirement is isolation, not tick mode. For one HPC task per core, you want no tick, if you're running all SCHED_FIFO, maybe you want that too, depends on the impact of nohz_full mode. All sensitive loads want the isolation, but they may not like the price. I personally like the cpuset way. Being able to partition boxen on the fly makes them very flexible. In a perfect world, you'd be able to quiesce and configure offloading and nohz_full on the fly too, and not end up with some hodgepodge like this needs boot option foo, that happens invisibly because of config option bar, the other thing you have to do manually.. and you get to eat 937 kthreads and tons of overhead on all CPUs if you want the ability to _maybe_ run a critical task or two. -Mike
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Sat, Feb 15, 2014 at 08:36:44AM +0100, Mike Galbraith wrote: > On Fri, 2014-02-14 at 15:24 -0800, Kevin Hilman wrote: > > Tejun Heo writes: > > > > > Hello, > > > > > > On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote: > > >> +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files > > >> +to force the WQ_SYSFS workqueues to run on the specified set > > >> +of CPUs. The set of WQ_SYSFS workqueues can be displayed using > > >> +"ls sys/devices/virtual/workqueue". > > > > > > One thing to be careful about is that once published, it becomes part > > > of userland visible interface. Maybe adding some words warning > > > against sprinkling WQ_SYSFS willy-nilly is a good idea? > > > > In the NO_HZ_FULL case, it seems to me we'd always want all unbound > > workqueues to have their affinity set to the housekeeping CPUs. > > > > Is there any reason not to enable WQ_SYSFS whenever WQ_UNBOUND is set so > > the affinity can be controlled? I guess the main reason would be that > > all of these workqueue names would become permanent ABI. > > > > At least for NO_HZ_FULL, maybe this should be automatic. The cpumask of > > unbound workqueues should default to !tick_nohz_full_mask? Any WQ_SYSFS > > workqueues could still be overridden from userspace, but at least the > > default would be sane, and help keep full dyntics CPUs isolated. > > What I'm thinking is that it should be automatic, but not necessarily > based upon the nohz full mask, rather maybe based upon whether sched > domains exist, or perhaps a generic exclusive cpuset property, though > some really don't want anything to do with cpusets. > > Why? Because there are jitter intolerant loads where nohz full isn't all > that useful, because you'll be constantly restarting and stopping the > tick, and eating the increased accounting overhead to no gain because > there are frequently multiple realtime tasks running. 
For these loads > (I have a user with such a fairly hefty 80 core rt load), dynamically > turning the tick _on_ is currently a better choice than nohz_full. > Point being, control of where unbound workqueues are allowed to run > isn't only desirable for single task HPC loads, other loads exist. > > For my particular fairly critical 80 core load, workqueues aren't a real > big hairy deal, because its jitter tolerance isn't _all_ that tight (30 > us max is easy enough to meet with room to spare). The load can slice > through workers well enough to meet requirements, but it would certainly > be a win to be able to keep them at bay. (gonna measure it, less jitter > is better even if it's only a little bit better.. eventually somebody > will demand what's currently impossible to deliver) So if there is NO_HZ_FULL, you have no objection to binding workqueues to the timekeeping CPUs, but that you would also like some form of automatic binding in the !NO_HZ_FULL case. Of course, if a common mechanism could serve both cases, that would be good. And yes, cpusets are frowned upon for some workloads. So maybe start with Kevin's patch, but augment with something else for the !NO_HZ_FULL case? Thanx, Paul
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Fri, 2014-02-14 at 15:24 -0800, Kevin Hilman wrote: > Tejun Heo writes: > > > Hello, > > > > On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote: > >> +2.Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files > >> + to force the WQ_SYSFS workqueues to run on the specified set > >> + of CPUs. The set of WQ_SYSFS workqueues can be displayed using > >> + "ls sys/devices/virtual/workqueue". > > > > One thing to be careful about is that once published, it becomes part > > of userland visible interface. Maybe adding some words warning > > against sprinkling WQ_SYSFS willy-nilly is a good idea? > > In the NO_HZ_FULL case, it seems to me we'd always want all unbound > workqueues to have their affinity set to the housekeeping CPUs. > > Is there any reason not to enable WQ_SYSFS whenever WQ_UNBOUND is set so > the affinity can be controlled? I guess the main reason would be that > all of these workqueue names would become permanent ABI. > > At least for NO_HZ_FULL, maybe this should be automatic. The cpumask of > unbound workqueues should default to !tick_nohz_full_mask? Any WQ_SYSFS > workqueues could still be overridden from userspace, but at least the > default would be sane, and help keep full dyntics CPUs isolated. What I'm thinking is that it should be automatic, but not necessarily based upon the nohz full mask, rather maybe based upon whether sched domains exist, or perhaps a generic exclusive cpuset property, though some really don't want anything to do with cpusets. Why? Because there are jitter intolerant loads where nohz full isn't all that useful, because you'll be constantly restarting and stopping the tick, and eating the increased accounting overhead to no gain because there are frequently multiple realtime tasks running. For these loads (I have a user with such a fairly hefty 80 core rt load), dynamically turning the tick _on_ is currently a better choice than nohz_full. 
Point being, control of where unbound workqueues are allowed to run isn't only desirable for single task HPC loads, other loads exist. For my particular fairly critical 80 core load, workqueues aren't a real big hairy deal, because its jitter tolerance isn't _all_ that tight (30 us max is easy enough to meet with room to spare). The load can slice through workers well enough to meet requirements, but it would certainly be a win to be able to keep them at bay. (gonna measure it, less jitter is better even if it's only a little bit better.. eventually somebody will demand what's currently impossible to deliver) -Mike
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
Tejun Heo writes: > Hello, > > On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote: >> +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files >> +to force the WQ_SYSFS workqueues to run on the specified set >> +of CPUs. The set of WQ_SYSFS workqueues can be displayed using >> +"ls sys/devices/virtual/workqueue". > > One thing to be careful about is that once published, it becomes part > of userland visible interface. Maybe adding some words warning > against sprinkling WQ_SYSFS willy-nilly is a good idea? In the NO_HZ_FULL case, it seems to me we'd always want all unbound workqueues to have their affinity set to the housekeeping CPUs. Is there any reason not to enable WQ_SYSFS whenever WQ_UNBOUND is set so the affinity can be controlled? I guess the main reason would be that all of these workqueue names would become permanent ABI. At least for NO_HZ_FULL, maybe this should be automatic. The cpumask of unbound workqueues should default to !tick_nohz_full_mask? Any WQ_SYSFS workqueues could still be overridden from userspace, but at least the default would be sane, and help keep full dyntics CPUs isolated. Example patch below, only boot tested on 4-CPU ARM system with CONFIG_NO_HZ_FULL_ALL=y and verified that 'cat /sys/devices/virtual/workqueue/writeback/cpumask' looked sane. If this looks OK, I can maybe clean it up a bit and make it runtime check instead of a compile time check. Kevin >From 902a2b58d61a51415457ea6768d687cdb7532eff Mon Sep 17 00:00:00 2001 From: Kevin Hilman Date: Fri, 14 Feb 2014 15:10:58 -0800 Subject: [PATCH] workqueue: for NO_HZ_FULL, set default cpumask to !tick_nohz_full_mask To help in keeping NO_HZ_FULL CPUs isolated, keep unbound workqueues from running on full dynticks CPUs. To do this, set the default workqueue cpumask to be the set of "housekeeping" CPUs instead of all possible CPUs. 
This is just the starting/default cpumask, and can be overridden with all the normal ways (NUMA settings, apply_workqueue_attrs and via sysfs for workqueues with the WQ_SYSFS attribute.)

Cc: Tejun Heo
Cc: Paul McKenney
Cc: Frederic Weisbecker
Signed-off-by: Kevin Hilman
---
 kernel/workqueue.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 987293d03ebc..9a9d9b0eaf6d 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -48,6 +48,7 @@
 #include <linux/nodemask.h>
 #include <linux/moduleparam.h>
 #include <linux/uaccess.h>
+#include <linux/tick.h>

 #include "workqueue_internal.h"

@@ -3436,7 +3437,11 @@ struct workqueue_attrs *alloc_workqueue_attrs(gfp_t gfp_mask)
 	if (!alloc_cpumask_var(&attrs->cpumask, gfp_mask))
 		goto fail;

+#ifdef CONFIG_NO_HZ_FULL
+	cpumask_complement(attrs->cpumask, tick_nohz_full_mask);
+#else
 	cpumask_copy(attrs->cpumask, cpu_possible_mask);
+#endif
 	return attrs;
 fail:
 	free_workqueue_attrs(attrs);
--
1.8.3
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Wed, Feb 12, 2014 at 04:33:11PM -0800, Paul E. McKenney wrote: > > Fair point! I wordsmithed it into the following. Seem reasonable? > > Thanx, Paul > > > > Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity > > This commit documents the ability to apply CPU affinity to WQ_SYSFS > workqueues, thus offloading them from the desired worker CPUs. > > Signed-off-by: Paul E. McKenney > Reviewed-by: Tejun Heo > Acked-by: Frederic Weisbecker > > diff --git a/Documentation/kernel-per-CPU-kthreads.txt > b/Documentation/kernel-per-CPU-kthreads.txt > index 827104fb9364..f3cd299fcc41 100644 > --- a/Documentation/kernel-per-CPU-kthreads.txt > +++ b/Documentation/kernel-per-CPU-kthreads.txt > @@ -162,7 +162,18 @@ Purpose: Execute workqueue requests > To reduce its OS jitter, do any of the following: > 1. Run your workload at a real-time priority, which will allow > preempting the kworker daemons. > -2. Do any of the following needed to avoid jitter that your > +2. A given workqueue can be made visible in the sysfs filesystem > + by passing the WQ_SYSFS to that workqueue's alloc_workqueue(). > + Such a workqueue can be confined to a given subset of the > + CPUs using the /sys/devices/virtual/workqueue/*/cpumask sysfs > + files. The set of WQ_SYSFS workqueues can be displayed using > + "ls sys/devices/virtual/workqueue". That said, the workqueues > + maintainer would like to caution people against indiscriminately > + sprinkling WQ_SYSFS across all the workqueues. The reason for > + caution is that it is easy to add WQ_SYSFS, but because sysfs is > + part of the formal user/kernel API, it can be nearly impossible > + to remove it, even if its addition was a mistake. > +3. Do any of the following needed to avoid jitter that your > application cannot tolerate: > a. Build your kernel with CONFIG_SLUB=y rather than > CONFIG_SLAB=y, thus avoiding the slab allocator's periodic > Perfect!! 
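The sysfs interface the documentation patch describes can be exercised from a shell; this is a usage sketch against a live kernel, and the `writeback` workqueue and the mask value `3` are illustrative (which WQ_SYSFS workqueues exist depends on the kernel and config).

```shell
# List workqueues that were created with WQ_SYSFS:
ls /sys/devices/virtual/workqueue

# Inspect the current cpumask of the "writeback" workqueue:
cat /sys/devices/virtual/workqueue/writeback/cpumask

# Confine it to housekeeping CPUs 0-1 (mask 0x3); requires root:
echo 3 > /sys/devices/virtual/workqueue/writeback/cpumask
```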
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On 02/13/2014 08:33 AM, Paul E. McKenney wrote: > On Thu, Feb 13, 2014 at 12:04:57AM +0100, Frederic Weisbecker wrote: >> On Wed, Feb 12, 2014 at 11:59:22AM -0800, Paul E. McKenney wrote: >>> On Wed, Feb 12, 2014 at 02:23:54PM -0500, Tejun Heo wrote: Hello, On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote: > +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files > + to force the WQ_SYSFS workqueues to run on the specified set > + of CPUs. The set of WQ_SYSFS workqueues can be displayed using > + "ls sys/devices/virtual/workqueue". One thing to be careful about is that once published, it becomes part of userland visible interface. Maybe adding some words warning against sprinkling WQ_SYSFS willy-nilly is a good idea? >>> >>> Good point! How about the following? >>> >>> Thanx, Paul >>> >>> >>> >>> Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity >>> >>> This commit documents the ability to apply CPU affinity to WQ_SYSFS >>> workqueues, thus offloading them from the desired worker CPUs. >>> >>> Signed-off-by: Paul E. McKenney >>> Cc: Frederic Weisbecker >>> Cc: Tejun Heo >>> >>> diff --git a/Documentation/kernel-per-CPU-kthreads.txt >>> b/Documentation/kernel-per-CPU-kthreads.txt >>> index 827104fb9364..214da3a47a68 100644 >>> --- a/Documentation/kernel-per-CPU-kthreads.txt >>> +++ b/Documentation/kernel-per-CPU-kthreads.txt >>> @@ -162,7 +162,16 @@ Purpose: Execute workqueue requests >>> To reduce its OS jitter, do any of the following: >>> 1. Run your workload at a real-time priority, which will allow >>> preempting the kworker daemons. >>> -2. Do any of the following needed to avoid jitter that your >>> +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files >>> + to force the WQ_SYSFS workqueues to run on the specified set >>> + of CPUs. The set of WQ_SYSFS workqueues can be displayed using >>> + "ls sys/devices/virtual/workqueue". 
That said, the workqueues >>> + maintainer would like to caution people against indiscriminately >>> + sprinkling WQ_SYSFS across all the workqueues. The reason for >>> + caution is that it is easy to add WQ_SYSFS, but because sysfs >>> + is part of the formal user/kernel API, it can be nearly impossible >>> + to remove it, even if its addition was a mistake. >>> +3. Do any of the following needed to avoid jitter that your >> >> Acked-by: Frederic Weisbecker >> >> I just suggest we append a small explanation about what WQ_SYSFS is about. >> Like: > > Fair point! I wordsmithed it into the following. Seem reasonable? > > Thanx, Paul > > > > Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity > > This commit documents the ability to apply CPU affinity to WQ_SYSFS > workqueues, thus offloading them from the desired worker CPUs. > > Signed-off-by: Paul E. McKenney > Reviewed-by: Tejun Heo > Acked-by: Frederic Weisbecker > > diff --git a/Documentation/kernel-per-CPU-kthreads.txt > b/Documentation/kernel-per-CPU-kthreads.txt > index 827104fb9364..f3cd299fcc41 100644 > --- a/Documentation/kernel-per-CPU-kthreads.txt > +++ b/Documentation/kernel-per-CPU-kthreads.txt > @@ -162,7 +162,18 @@ Purpose: Execute workqueue requests > To reduce its OS jitter, do any of the following: > 1. Run your workload at a real-time priority, which will allow > preempting the kworker daemons. > -2. Do any of the following needed to avoid jitter that your > +2. A given workqueue can be made visible in the sysfs filesystem > + by passing the WQ_SYSFS to that workqueue's alloc_workqueue(). > + Such a workqueue can be confined to a given subset of the > + CPUs using the /sys/devices/virtual/workqueue/*/cpumask sysfs > + files. The set of WQ_SYSFS workqueues can be displayed using > + "ls sys/devices/virtual/workqueue". That said, the workqueues > + maintainer would like to caution people against indiscriminately > + sprinkling WQ_SYSFS across all the workqueues. 
The reason for > + caution is that it is easy to add WQ_SYSFS, but because sysfs is > + part of the formal user/kernel API, it can be nearly impossible > + to remove it, even if its addition was a mistake. > +3. Do any of the following needed to avoid jitter that your > application cannot tolerate: > a. Build your kernel with CONFIG_SLUB=y rather than > CONFIG_SLAB=y, thus avoiding the slab allocator's periodic > > Reviewed-by: Lai Jiangshan -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Thu, Feb 13, 2014 at 12:04:57AM +0100, Frederic Weisbecker wrote: > On Wed, Feb 12, 2014 at 11:59:22AM -0800, Paul E. McKenney wrote: > > On Wed, Feb 12, 2014 at 02:23:54PM -0500, Tejun Heo wrote: > > > Hello, > > > > > > On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote: > > > > +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files > > > > + to force the WQ_SYSFS workqueues to run on the specified set > > > > + of CPUs. The set of WQ_SYSFS workqueues can be displayed using > > > > + "ls sys/devices/virtual/workqueue". > > > > > > One thing to be careful about is that once published, it becomes part > > > of userland visible interface. Maybe adding some words warning > > > against sprinkling WQ_SYSFS willy-nilly is a good idea? > > > > Good point! How about the following? > > > > Thanx, Paul > > > > > > > > Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity > > > > This commit documents the ability to apply CPU affinity to WQ_SYSFS > > workqueues, thus offloading them from the desired worker CPUs. > > > > Signed-off-by: Paul E. McKenney > > Cc: Frederic Weisbecker > > Cc: Tejun Heo > > > > diff --git a/Documentation/kernel-per-CPU-kthreads.txt > > b/Documentation/kernel-per-CPU-kthreads.txt > > index 827104fb9364..214da3a47a68 100644 > > --- a/Documentation/kernel-per-CPU-kthreads.txt > > +++ b/Documentation/kernel-per-CPU-kthreads.txt > > @@ -162,7 +162,16 @@ Purpose: Execute workqueue requests > > To reduce its OS jitter, do any of the following: > > 1. Run your workload at a real-time priority, which will allow > > preempting the kworker daemons. > > -2. Do any of the following needed to avoid jitter that your > > +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files > > + to force the WQ_SYSFS workqueues to run on the specified set > > + of CPUs. The set of WQ_SYSFS workqueues can be displayed using > > + "ls sys/devices/virtual/workqueue". 
That said, the workqueues > > + maintainer would like to caution people against indiscriminately > > + sprinkling WQ_SYSFS across all the workqueues. The reason for > > + caution is that it is easy to add WQ_SYSFS, but because sysfs > > + is part of the formal user/kernel API, it can be nearly impossible > > + to remove it, even if its addition was a mistake. > > +3. Do any of the following needed to avoid jitter that your > > Acked-by: Frederic Weisbecker > > I just suggest we append a small explanation about what WQ_SYSFS is about. > Like: Fair point! I wordsmithed it into the following. Seem reasonable? Thanx, Paul Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity This commit documents the ability to apply CPU affinity to WQ_SYSFS workqueues, thus offloading them from the desired worker CPUs. Signed-off-by: Paul E. McKenney Reviewed-by: Tejun Heo Acked-by: Frederic Weisbecker diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt index 827104fb9364..f3cd299fcc41 100644 --- a/Documentation/kernel-per-CPU-kthreads.txt +++ b/Documentation/kernel-per-CPU-kthreads.txt @@ -162,7 +162,18 @@ Purpose: Execute workqueue requests To reduce its OS jitter, do any of the following: 1. Run your workload at a real-time priority, which will allow preempting the kworker daemons. -2. Do any of the following needed to avoid jitter that your +2. A given workqueue can be made visible in the sysfs filesystem + by passing the WQ_SYSFS to that workqueue's alloc_workqueue(). + Such a workqueue can be confined to a given subset of the + CPUs using the /sys/devices/virtual/workqueue/*/cpumask sysfs + files. The set of WQ_SYSFS workqueues can be displayed using + "ls sys/devices/virtual/workqueue". That said, the workqueues + maintainer would like to caution people against indiscriminately + sprinkling WQ_SYSFS across all the workqueues. 
The reason for + caution is that it is easy to add WQ_SYSFS, but because sysfs is + part of the formal user/kernel API, it can be nearly impossible + to remove it, even if its addition was a mistake. +3. Do any of the following needed to avoid jitter that your application cannot tolerate: a. Build your kernel with CONFIG_SLUB=y rather than CONFIG_SLAB=y, thus avoiding the slab allocator's periodic
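The sysfs workflow that the patch above documents can be sketched as a shell session. This is a hedged illustration, not from the thread itself: the cpumask value is an example for housekeeping CPUs 0-1, writing the file needs root, and the writeback path assumes that workqueue has WQ_SYSFS, as Frederic notes later in the thread.

```shell
# Show which workqueues were created with WQ_SYSFS (writeback and
# raid5 are the only users mentioned in this thread).
[ -d /sys/devices/virtual/workqueue ] && ls /sys/devices/virtual/workqueue

# Build the hex cpumask that confines work to housekeeping CPUs 0-1
# (bit N set => CPU N may run the workqueue's work items).
mask=$(printf '%x' $(( (1 << 0) | (1 << 1) )))
echo "$mask"    # prints: 3

# Writing the mask requires root; uncomment to apply it to the
# writeback workqueue's cpumask attribute.
# echo "$mask" | sudo tee /sys/devices/virtual/workqueue/writeback/cpumask
```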
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Wed, Feb 12, 2014 at 11:59:22AM -0800, Paul E. McKenney wrote: > On Wed, Feb 12, 2014 at 02:23:54PM -0500, Tejun Heo wrote: > > Hello, > > > > On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote: > > > +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files > > > + to force the WQ_SYSFS workqueues to run on the specified set > > > + of CPUs. The set of WQ_SYSFS workqueues can be displayed using > > > + "ls sys/devices/virtual/workqueue". > > > > One thing to be careful about is that once published, it becomes part > > of userland visible interface. Maybe adding some words warning > > against sprinkling WQ_SYSFS willy-nilly is a good idea? > > Good point! How about the following? > > Thanx, Paul > > > > Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity > > This commit documents the ability to apply CPU affinity to WQ_SYSFS > workqueues, thus offloading them from the desired worker CPUs. > > Signed-off-by: Paul E. McKenney > Cc: Frederic Weisbecker > Cc: Tejun Heo > > diff --git a/Documentation/kernel-per-CPU-kthreads.txt > b/Documentation/kernel-per-CPU-kthreads.txt > index 827104fb9364..214da3a47a68 100644 > --- a/Documentation/kernel-per-CPU-kthreads.txt > +++ b/Documentation/kernel-per-CPU-kthreads.txt > @@ -162,7 +162,16 @@ Purpose: Execute workqueue requests > To reduce its OS jitter, do any of the following: > 1. Run your workload at a real-time priority, which will allow > preempting the kworker daemons. > -2. Do any of the following needed to avoid jitter that your > +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files > + to force the WQ_SYSFS workqueues to run on the specified set > + of CPUs. The set of WQ_SYSFS workqueues can be displayed using > + "ls sys/devices/virtual/workqueue". That said, the workqueues > + maintainer would like to caution people against indiscriminately > + sprinkling WQ_SYSFS across all the workqueues. 
The reason for > + caution is that it is easy to add WQ_SYSFS, but because sysfs > + is part of the formal user/kernel API, it can be nearly impossible > + to remove it, even if its addition was a mistake. > +3. Do any of the following needed to avoid jitter that your Acked-by: Frederic Weisbecker I just suggest we append a small explanation about what WQ_SYSFS is about. Like: +2. + The workqueues that want to be visible on the sysfs hierarchy + set the WQ_SYSFS flag. + For those who have this flag set, you can use the + /sys/devices/virtual/workqueue/*/cpumask sysfs files + to force the workqueues to run on the specified set + of CPUs. The set of WQ_SYSFS workqueues can be displayed using + "ls sys/devices/virtual/workqueue". That said, the workqueues + maintainer would like to caution people against indiscriminately + sprinkling WQ_SYSFS across all the workqueues. The reason for + caution is that it is easy to add WQ_SYSFS, but because sysfs + is part of the formal user/kernel API, it can be nearly impossible + to remove it, even if its addition was a mistake. +3. Do any of the following needed to avoid jitter that your
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Wed, Feb 12, 2014 at 11:59:22AM -0800, Paul E. McKenney wrote: > Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity > > This commit documents the ability to apply CPU affinity to WQ_SYSFS > workqueues, thus offloading them from the desired worker CPUs. > > Signed-off-by: Paul E. McKenney > Cc: Frederic Weisbecker > Cc: Tejun Heo Reviewed-by: Tejun Heo Thanks! -- tejun
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Wed, Feb 12, 2014 at 02:23:54PM -0500, Tejun Heo wrote: > Hello, > > On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote: > > +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files > > + to force the WQ_SYSFS workqueues to run on the specified set > > + of CPUs. The set of WQ_SYSFS workqueues can be displayed using > > + "ls sys/devices/virtual/workqueue". > > One thing to be careful about is that once published, it becomes part > of userland visible interface. Maybe adding some words warning > against sprinkling WQ_SYSFS willy-nilly is a good idea? Good point! How about the following? Thanx, Paul Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity This commit documents the ability to apply CPU affinity to WQ_SYSFS workqueues, thus offloading them from the desired worker CPUs. Signed-off-by: Paul E. McKenney Cc: Frederic Weisbecker Cc: Tejun Heo diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt index 827104fb9364..214da3a47a68 100644 --- a/Documentation/kernel-per-CPU-kthreads.txt +++ b/Documentation/kernel-per-CPU-kthreads.txt @@ -162,7 +162,16 @@ Purpose: Execute workqueue requests To reduce its OS jitter, do any of the following: 1. Run your workload at a real-time priority, which will allow preempting the kworker daemons. -2. Do any of the following needed to avoid jitter that your +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files + to force the WQ_SYSFS workqueues to run on the specified set + of CPUs. The set of WQ_SYSFS workqueues can be displayed using + "ls sys/devices/virtual/workqueue". That said, the workqueues + maintainer would like to caution people against indiscriminately + sprinkling WQ_SYSFS across all the workqueues. The reason for + caution is that it is easy to add WQ_SYSFS, but because sysfs + is part of the formal user/kernel API, it can be nearly impossible + to remove it, even if its addition was a mistake. +3. 
Do any of the following needed to avoid jitter that your application cannot tolerate: a. Build your kernel with CONFIG_SLUB=y rather than CONFIG_SLAB=y, thus avoiding the slab allocator's periodic
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Wed, Feb 12, 2014 at 07:23:38PM +0100, Frederic Weisbecker wrote: > On Mon, Feb 10, 2014 at 10:47:29AM -0800, Paul E. McKenney wrote: > > On Mon, Feb 10, 2014 at 06:08:31PM +0800, Lai Jiangshan wrote: > > > Acked-by: Lai Jiangshan > > > > Thank you all, queued for 3.15. > > > > We should also have some facility for moving the SRCU workqueues to > > housekeeping/timekeeping kthreads in the NO_HZ_FULL case. Or does > > this patch already have that effect? > > Kevin Hilman and me plan to try to bring a new Kconfig option that could let > us control the unbound workqueues affinity through sysfs. Please CC me or feel free to update Documentation/kernel-per-CPU-kthreads.txt as part of this upcoming series. > The feature actually exist currently but is only enabled for workqueues that > have WQ_SYSFS. Writeback and raid5 are the only current users. > > See for example: /sys/devices/virtual/workqueue/writeback/cpumask Ah, news to me! I have queued the following patch, seem reasonable? Thanx, Paul diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt index 827104fb9364..09f28841ee3e 100644 --- a/Documentation/kernel-per-CPU-kthreads.txt +++ b/Documentation/kernel-per-CPU-kthreads.txt @@ -162,7 +162,11 @@ Purpose: Execute workqueue requests To reduce its OS jitter, do any of the following: 1. Run your workload at a real-time priority, which will allow preempting the kworker daemons. -2. Do any of the following needed to avoid jitter that your +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files + to force the WQ_SYSFS workqueues to run on the specified set + of CPUs. The set of WQ_SYSFS workqueues can be displayed using + "ls sys/devices/virtual/workqueue". +3. Do any of the following needed to avoid jitter that your application cannot tolerate: a. 
Build your kernel with CONFIG_SLUB=y rather than CONFIG_SLAB=y, thus avoiding the slab allocator's periodic
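The cpumask sysfs file documented in the patch above takes a hexadecimal CPU bitmask. As a rough illustration of the encoding (the helper names here are mine, not kernel API, and this sketch ignores the comma-separated 32-bit-word format the kernel uses on machines with more than 32 CPUs):

```python
def cpus_to_mask(cpus):
    # Set bit N for each CPU N allowed to run the workqueue's work items.
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")

def mask_to_cpus(mask_str):
    # Inverse: recover the CPU numbers from the hex mask string.
    mask = int(mask_str, 16)
    return [cpu for cpu in range(mask.bit_length()) if mask & (1 << cpu)]

print(cpus_to_mask({0, 1}))   # -> 3
print(mask_to_cpus("c"))      # -> [2, 3]
```

So confining a WQ_SYSFS workqueue to CPUs 0-1 means writing "3" to its cpumask file.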
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
Hello, On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote: > +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files > + to force the WQ_SYSFS workqueues to run on the specified set > + of CPUs. The set of WQ_SYSFS workqueues can be displayed using > + "ls sys/devices/virtual/workqueue". One thing to be careful about is that once published, it becomes part of userland visible interface. Maybe adding some words warning against sprinkling WQ_SYSFS willy-nilly is a good idea? Thanks. -- tejun
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Mon, Feb 10, 2014 at 10:47:29AM -0800, Paul E. McKenney wrote: > On Mon, Feb 10, 2014 at 06:08:31PM +0800, Lai Jiangshan wrote: > > Acked-by: Lai Jiangshan > > Thank you all, queued for 3.15. > > We should also have some facility for moving the SRCU workqueues to > housekeeping/timekeeping kthreads in the NO_HZ_FULL case. Or does > this patch already have that effect? Kevin Hilman and I plan to try to bring a new Kconfig option that could let us control the unbound workqueues' affinity through sysfs. The feature actually exists currently but is only enabled for workqueues that have WQ_SYSFS. Writeback and raid5 are the only current users. See for example: /sys/devices/virtual/workqueue/writeback/cpumask
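Frederic's point that the sysfs interface is opt-in shows up at the allocation site: a workqueue only appears under /sys/devices/virtual/workqueue/ if WQ_SYSFS was passed to alloc_workqueue(). Below is a hedged sketch of what a hypothetical in-kernel user would do; the names are invented for illustration, and as kernel code it only builds in-tree, not standalone.

```c
#include <linux/workqueue.h>

/* Hypothetical driver init: passing WQ_SYSFS at allocation time makes
 * the workqueue visible under /sys/devices/virtual/workqueue/, so its
 * cpumask (and other attributes) become tunable from userspace.
 * Per the caution in this thread: once exposed, the sysfs entry is
 * part of the user-visible API and is nearly impossible to remove. */
static struct workqueue_struct *example_wq;

static int example_init(void)
{
	example_wq = alloc_workqueue("example_wq",
				     WQ_SYSFS | WQ_UNBOUND, 0);
	if (!example_wq)
		return -ENOMEM;
	return 0;
}
```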
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Mon, Feb 10, 2014 at 10:47:29AM -0800, Paul E. McKenney wrote: On Mon, Feb 10, 2014 at 06:08:31PM +0800, Lai Jiangshan wrote: Acked-by: Lai Jiangshan la...@cn.fujitsu.com Thank you all, queued for 3.15. We should also have some facility for moving the SRCU workqueues to housekeeping/timekeeping kthreads in the NO_HZ_FULL case. Or does this patch already have that effect? Kevin Hilman and me plan to try to bring a new Kconfig option that could let us control the unbound workqueues affinity through sysfs. The feature actually exist currently but is only enabled for workqueues that have WQ_SYSFS. Writeback and raid5 are the only current users. See for example: /sys/devices/virtual/workqueue/writeback/cpumask -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
Hello, On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote: +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files + to force the WQ_SYSFS workqueues to run on the specified set + of CPUs. The set of WQ_SYSFS workqueues can be displayed using + ls sys/devices/virtual/workqueue. One thing to be careful about is that once published, it becomes part of userland visible interface. Maybe adding some words warning against sprinkling WQ_SYSFS willy-nilly is a good idea? Thanks. -- tejun -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Wed, Feb 12, 2014 at 07:23:38PM +0100, Frederic Weisbecker wrote: On Mon, Feb 10, 2014 at 10:47:29AM -0800, Paul E. McKenney wrote: On Mon, Feb 10, 2014 at 06:08:31PM +0800, Lai Jiangshan wrote: Acked-by: Lai Jiangshan la...@cn.fujitsu.com Thank you all, queued for 3.15. We should also have some facility for moving the SRCU workqueues to housekeeping/timekeeping kthreads in the NO_HZ_FULL case. Or does this patch already have that effect? Kevin Hilman and me plan to try to bring a new Kconfig option that could let us control the unbound workqueues affinity through sysfs. Please CC me or feel free to update Documentation/kernel-per-CPU-kthreads.txt as part of this upcoming series. The feature actually exist currently but is only enabled for workqueues that have WQ_SYSFS. Writeback and raid5 are the only current users. See for example: /sys/devices/virtual/workqueue/writeback/cpumask Ah, news to me! I have queued the following patch, seem reasonable? Thanx, Paul diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt index 827104fb9364..09f28841ee3e 100644 --- a/Documentation/kernel-per-CPU-kthreads.txt +++ b/Documentation/kernel-per-CPU-kthreads.txt @@ -162,7 +162,11 @@ Purpose: Execute workqueue requests To reduce its OS jitter, do any of the following: 1. Run your workload at a real-time priority, which will allow preempting the kworker daemons. -2. Do any of the following needed to avoid jitter that your +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files + to force the WQ_SYSFS workqueues to run on the specified set + of CPUs. The set of WQ_SYSFS workqueues can be displayed using + ls sys/devices/virtual/workqueue. +3. Do any of the following needed to avoid jitter that your application cannot tolerate: a. 
Build your kernel with CONFIG_SLUB=y rather than CONFIG_SLAB=y, thus avoiding the slab allocator's periodic -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Wed, Feb 12, 2014 at 02:23:54PM -0500, Tejun Heo wrote: Hello, On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote: +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files + to force the WQ_SYSFS workqueues to run on the specified set + of CPUs. The set of WQ_SYSFS workqueues can be displayed using + ls sys/devices/virtual/workqueue. One thing to be careful about is that once published, it becomes part of userland visible interface. Maybe adding some words warning against sprinkling WQ_SYSFS willy-nilly is a good idea? Good point! How about the following? Thanx, Paul Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity This commit documents the ability to apply CPU affinity to WQ_SYSFS workqueues, thus offloading them from the desired worker CPUs. Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com Cc: Frederic Weisbecker fweis...@gmail.com Cc: Tejun Heo t...@kernel.org diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt index 827104fb9364..214da3a47a68 100644 --- a/Documentation/kernel-per-CPU-kthreads.txt +++ b/Documentation/kernel-per-CPU-kthreads.txt @@ -162,7 +162,16 @@ Purpose: Execute workqueue requests To reduce its OS jitter, do any of the following: 1. Run your workload at a real-time priority, which will allow preempting the kworker daemons. -2. Do any of the following needed to avoid jitter that your +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files + to force the WQ_SYSFS workqueues to run on the specified set + of CPUs. The set of WQ_SYSFS workqueues can be displayed using + ls sys/devices/virtual/workqueue. That said, the workqueues + maintainer would like to caution people against indiscriminately + sprinkling WQ_SYSFS across all the workqueues. The reason for + caution is that it is easy to add WQ_SYSFS, but because sysfs + is part of the formal user/kernel API, it can be nearly impossible + to remove it, even if its addition was a mistake. +3. 
Do any of the following needed to avoid jitter that your application cannot tolerate: a. Build your kernel with CONFIG_SLUB=y rather than CONFIG_SLAB=y, thus avoiding the slab allocator's periodic -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Wed, Feb 12, 2014 at 11:59:22AM -0800, Paul E. McKenney wrote: Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity This commit documents the ability to apply CPU affinity to WQ_SYSFS workqueues, thus offloading them from the desired worker CPUs. Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com Cc: Frederic Weisbecker fweis...@gmail.com Cc: Tejun Heo t...@kernel.org Reviewed-by: Tejun Heo t...@kernel.org Thanks! -- tejun -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Wed, Feb 12, 2014 at 11:59:22AM -0800, Paul E. McKenney wrote: On Wed, Feb 12, 2014 at 02:23:54PM -0500, Tejun Heo wrote: Hello, On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote: +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files + to force the WQ_SYSFS workqueues to run on the specified set + of CPUs. The set of WQ_SYSFS workqueues can be displayed using + ls sys/devices/virtual/workqueue. One thing to be careful about is that once published, it becomes part of userland visible interface. Maybe adding some words warning against sprinkling WQ_SYSFS willy-nilly is a good idea? Good point! How about the following? Thanx, Paul Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity This commit documents the ability to apply CPU affinity to WQ_SYSFS workqueues, thus offloading them from the desired worker CPUs. Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com Cc: Frederic Weisbecker fweis...@gmail.com Cc: Tejun Heo t...@kernel.org diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt index 827104fb9364..214da3a47a68 100644 --- a/Documentation/kernel-per-CPU-kthreads.txt +++ b/Documentation/kernel-per-CPU-kthreads.txt @@ -162,7 +162,16 @@ Purpose: Execute workqueue requests To reduce its OS jitter, do any of the following: 1. Run your workload at a real-time priority, which will allow preempting the kworker daemons. -2. Do any of the following needed to avoid jitter that your +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files + to force the WQ_SYSFS workqueues to run on the specified set + of CPUs. The set of WQ_SYSFS workqueues can be displayed using + ls sys/devices/virtual/workqueue. That said, the workqueues + maintainer would like to caution people against indiscriminately + sprinkling WQ_SYSFS across all the workqueues. 
The reason for + caution is that it is easy to add WQ_SYSFS, but because sysfs + is part of the formal user/kernel API, it can be nearly impossible + to remove it, even if its addition was a mistake. +3. Do any of the following needed to avoid jitter that your Acked-by: Frederic Weisbecker fweis...@gmail.com I just suggest we append a small explanation about what WQ_SYSFS is about. Like: +2. + The workqueues that want to be visible on the sysfs hierarchy + set the WQ_SYSFS flag. + For those who have this flag set, you can use the + /sys/devices/virtual/workqueue/*/cpumask sysfs files + to force the workqueues to run on the specified set + of CPUs. The set of WQ_SYSFS workqueues can be displayed using + ls sys/devices/virtual/workqueue. That said, the workqueues + maintainer would like to caution people against indiscriminately + sprinkling WQ_SYSFS across all the workqueues. The reason for + caution is that it is easy to add WQ_SYSFS, but because sysfs + is part of the formal user/kernel API, it can be nearly impossible + to remove it, even if its addition was a mistake. +3. Do any of the following needed to avoid jitter that your -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Thu, Feb 13, 2014 at 12:04:57AM +0100, Frederic Weisbecker wrote: On Wed, Feb 12, 2014 at 11:59:22AM -0800, Paul E. McKenney wrote: On Wed, Feb 12, 2014 at 02:23:54PM -0500, Tejun Heo wrote: Hello, On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote: +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files + to force the WQ_SYSFS workqueues to run on the specified set + of CPUs. The set of WQ_SYSFS workqueues can be displayed using + ls sys/devices/virtual/workqueue. One thing to be careful about is that once published, it becomes part of userland visible interface. Maybe adding some words warning against sprinkling WQ_SYSFS willy-nilly is a good idea? Good point! How about the following? Thanx, Paul Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity This commit documents the ability to apply CPU affinity to WQ_SYSFS workqueues, thus offloading them from the desired worker CPUs. Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com Cc: Frederic Weisbecker fweis...@gmail.com Cc: Tejun Heo t...@kernel.org diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt index 827104fb9364..214da3a47a68 100644 --- a/Documentation/kernel-per-CPU-kthreads.txt +++ b/Documentation/kernel-per-CPU-kthreads.txt @@ -162,7 +162,16 @@ Purpose: Execute workqueue requests To reduce its OS jitter, do any of the following: 1. Run your workload at a real-time priority, which will allow preempting the kworker daemons. -2. Do any of the following needed to avoid jitter that your +2. Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files + to force the WQ_SYSFS workqueues to run on the specified set + of CPUs. The set of WQ_SYSFS workqueues can be displayed using + ls sys/devices/virtual/workqueue. That said, the workqueues + maintainer would like to caution people against indiscriminately + sprinkling WQ_SYSFS across all the workqueues. 
The reason for + caution is that it is easy to add WQ_SYSFS, but because sysfs + is part of the formal user/kernel API, it can be nearly impossible + to remove it, even if its addition was a mistake. +3. Do any of the following needed to avoid jitter that your Acked-by: Frederic Weisbecker fweis...@gmail.com I just suggest we append a small explanation about what WQ_SYSFS is about. Like: Fair point! I wordsmithed it into the following. Seem reasonable? Thanx, Paul Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity This commit documents the ability to apply CPU affinity to WQ_SYSFS workqueues, thus offloading them from the desired worker CPUs. Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com Reviewed-by: Tejun Heo t...@kernel.org Acked-by: Frederic Weisbecker fweis...@gmail.com diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt index 827104fb9364..f3cd299fcc41 100644 --- a/Documentation/kernel-per-CPU-kthreads.txt +++ b/Documentation/kernel-per-CPU-kthreads.txt @@ -162,7 +162,18 @@ Purpose: Execute workqueue requests To reduce its OS jitter, do any of the following: 1. Run your workload at a real-time priority, which will allow preempting the kworker daemons. -2. Do any of the following needed to avoid jitter that your +2. A given workqueue can be made visible in the sysfs filesystem + by passing the WQ_SYSFS to that workqueue's alloc_workqueue(). + Such a workqueue can be confined to a given subset of the + CPUs using the /sys/devices/virtual/workqueue/*/cpumask sysfs + files. The set of WQ_SYSFS workqueues can be displayed using + ls sys/devices/virtual/workqueue. That said, the workqueues + maintainer would like to caution people against indiscriminately + sprinkling WQ_SYSFS across all the workqueues. 
The reason for + caution is that it is easy to add WQ_SYSFS, but because sysfs is + part of the formal user/kernel API, it can be nearly impossible + to remove it, even if its addition was a mistake. +3. Do any of the following needed to avoid jitter that your application cannot tolerate: a. Build your kernel with CONFIG_SLUB=y rather than CONFIG_SLAB=y, thus avoiding the slab allocator's periodic -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On 02/13/2014 08:33 AM, Paul E. McKenney wrote:
> On Thu, Feb 13, 2014 at 12:04:57AM +0100, Frederic Weisbecker wrote:
>> On Wed, Feb 12, 2014 at 11:59:22AM -0800, Paul E. McKenney wrote:
>>> On Wed, Feb 12, 2014 at 02:23:54PM -0500, Tejun Heo wrote:
>>>> Hello,
>>>>
>>>> On Wed, Feb 12, 2014 at 11:02:41AM -0800, Paul E. McKenney wrote:
>>>>> +2.	Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files
>>>>> +	to force the WQ_SYSFS workqueues to run on the specified set
>>>>> +	of CPUs.  The set of WQ_SYSFS workqueues can be displayed using
>>>>> +	"ls sys/devices/virtual/workqueue".
>>>>
>>>> One thing to be careful about is that once published, it becomes part
>>>> of userland visible interface.  Maybe adding some words warning
>>>> against sprinkling WQ_SYSFS willy-nilly is a good idea?
>>>
>>> Good point!  How about the following?
>>>
>>> 							Thanx, Paul
>>>
>>> Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity
>>>
>>> This commit documents the ability to apply CPU affinity to WQ_SYSFS
>>> workqueues, thus offloading them from the desired worker CPUs.
>>>
>>> Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com
>>> Cc: Frederic Weisbecker fweis...@gmail.com
>>> Cc: Tejun Heo t...@kernel.org
>>>
>>> diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt
>>> index 827104fb9364..214da3a47a68 100644
>>> --- a/Documentation/kernel-per-CPU-kthreads.txt
>>> +++ b/Documentation/kernel-per-CPU-kthreads.txt
>>> @@ -162,7 +162,16 @@ Purpose: Execute workqueue requests
>>>  To reduce its OS jitter, do any of the following:
>>>  1.	Run your workload at a real-time priority, which will allow
>>>  	preempting the kworker daemons.
>>> -2.	Do any of the following needed to avoid jitter that your
>>> +2.	Use the /sys/devices/virtual/workqueue/*/cpumask sysfs files
>>> +	to force the WQ_SYSFS workqueues to run on the specified set
>>> +	of CPUs.  The set of WQ_SYSFS workqueues can be displayed using
>>> +	"ls sys/devices/virtual/workqueue".  That said, the workqueues
>>> +	maintainer would like to caution people against indiscriminately
>>> +	sprinkling WQ_SYSFS across all the workqueues.  The reason for
>>> +	caution is that it is easy to add WQ_SYSFS, but because sysfs
>>> +	is part of the formal user/kernel API, it can be nearly impossible
>>> +	to remove it, even if its addition was a mistake.
>>> +3.	Do any of the following needed to avoid jitter that your
>>
>> Acked-by: Frederic Weisbecker fweis...@gmail.com
>>
>> I just suggest we append a small explanation about what WQ_SYSFS is
>> about. Like:
>
> Fair point!  I wordsmithed it into the following.  Seem reasonable?
>
> 							Thanx, Paul
>
> Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity
>
> This commit documents the ability to apply CPU affinity to WQ_SYSFS
> workqueues, thus offloading them from the desired worker CPUs.
>
> Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com
> Reviewed-by: Tejun Heo t...@kernel.org
> Acked-by: Frederic Weisbecker fweis...@gmail.com
>
> diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt
> index 827104fb9364..f3cd299fcc41 100644
> --- a/Documentation/kernel-per-CPU-kthreads.txt
> +++ b/Documentation/kernel-per-CPU-kthreads.txt
> @@ -162,7 +162,18 @@ Purpose: Execute workqueue requests
>  To reduce its OS jitter, do any of the following:
>  1.	Run your workload at a real-time priority, which will allow
>  	preempting the kworker daemons.
> -2.	Do any of the following needed to avoid jitter that your
> +2.	A given workqueue can be made visible in the sysfs filesystem
> +	by passing the WQ_SYSFS to that workqueue's alloc_workqueue().
> +	Such a workqueue can be confined to a given subset of the
> +	CPUs using the /sys/devices/virtual/workqueue/*/cpumask sysfs
> +	files.  The set of WQ_SYSFS workqueues can be displayed using
> +	"ls sys/devices/virtual/workqueue".  That said, the workqueues
> +	maintainer would like to caution people against indiscriminately
> +	sprinkling WQ_SYSFS across all the workqueues.  The reason for
> +	caution is that it is easy to add WQ_SYSFS, but because sysfs is
> +	part of the formal user/kernel API, it can be nearly impossible
> +	to remove it, even if its addition was a mistake.
> +3.	Do any of the following needed to avoid jitter that your
>  	application cannot tolerate:
>  	a.	Build your kernel with CONFIG_SLUB=y rather than
>  		CONFIG_SLAB=y, thus avoiding the slab allocator's periodic

Reviewed-by: Lai Jiangshan la...@cn.fujitsu.com

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
On Mon, Feb 10, 2014 at 06:08:31PM +0800, Lai Jiangshan wrote:
> Acked-by: Lai Jiangshan

Thank you all, queued for 3.15.

We should also have some facility for moving the SRCU workqueues to
housekeeping/timekeeping kthreads in the NO_HZ_FULL case.  Or does this
patch already have that effect?

							Thanx, Paul

> On 02/01/2014 03:53 AM, Zoran Markovic wrote:
> > From: Shaibal Dutta
> >
> > For better use of CPU idle time, allow the scheduler to select the CPU
> > on which the SRCU grace period work would be scheduled. This improves
> > idle residency time and conserves power.
> >
> > This functionality is enabled when CONFIG_WQ_POWER_EFFICIENT is selected.
> >
> > Cc: Lai Jiangshan
> > Cc: "Paul E. McKenney"
> > Cc: Dipankar Sarma
> > Signed-off-by: Shaibal Dutta
> > [zoran.marko...@linaro.org: Rebased to latest kernel version. Added commit
> > message. Fixed code alignment.]
> > ---
> >  kernel/rcu/srcu.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c
> > index 3318d82..a1ebe6d 100644
> > --- a/kernel/rcu/srcu.c
> > +++ b/kernel/rcu/srcu.c
> > @@ -398,7 +398,7 @@ void call_srcu(struct srcu_struct *sp, struct rcu_head *head,
> >  	rcu_batch_queue(&sp->batch_queue, head);
> >  	if (!sp->running) {
> >  		sp->running = true;
> > -		schedule_delayed_work(&sp->work, 0);
> > +		queue_delayed_work(system_power_efficient_wq, &sp->work, 0);
> >  	}
> >  	spin_unlock_irqrestore(&sp->queue_lock, flags);
> >  }
> > @@ -674,7 +674,8 @@ static void srcu_reschedule(struct srcu_struct *sp)
> >  	}
> >
> >  	if (pending)
> > -		schedule_delayed_work(&sp->work, SRCU_INTERVAL);
> > +		queue_delayed_work(system_power_efficient_wq,
> > +				   &sp->work, SRCU_INTERVAL);
> >  }
> >
> >  /*
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
Acked-by: Lai Jiangshan

On 02/01/2014 03:53 AM, Zoran Markovic wrote:
> From: Shaibal Dutta
>
> For better use of CPU idle time, allow the scheduler to select the CPU
> on which the SRCU grace period work would be scheduled. This improves
> idle residency time and conserves power.
>
> This functionality is enabled when CONFIG_WQ_POWER_EFFICIENT is selected.
>
> Cc: Lai Jiangshan
> Cc: "Paul E. McKenney"
> Cc: Dipankar Sarma
> Signed-off-by: Shaibal Dutta
> [zoran.marko...@linaro.org: Rebased to latest kernel version. Added commit
> message. Fixed code alignment.]
> ---
>  kernel/rcu/srcu.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c
> index 3318d82..a1ebe6d 100644
> --- a/kernel/rcu/srcu.c
> +++ b/kernel/rcu/srcu.c
> @@ -398,7 +398,7 @@ void call_srcu(struct srcu_struct *sp, struct rcu_head *head,
>  	rcu_batch_queue(&sp->batch_queue, head);
>  	if (!sp->running) {
>  		sp->running = true;
> -		schedule_delayed_work(&sp->work, 0);
> +		queue_delayed_work(system_power_efficient_wq, &sp->work, 0);
>  	}
>  	spin_unlock_irqrestore(&sp->queue_lock, flags);
>  }
> @@ -674,7 +674,8 @@ static void srcu_reschedule(struct srcu_struct *sp)
>  	}
>
>  	if (pending)
> -		schedule_delayed_work(&sp->work, SRCU_INTERVAL);
> +		queue_delayed_work(system_power_efficient_wq,
> +				   &sp->work, SRCU_INTERVAL);
>  }
>
>  /*
Re: [RFC PATCH] rcu: move SRCU grace period work to power efficient workqueue
Signed-off-by: Zoran Markovic

On 31 January 2014 11:53, Zoran Markovic wrote:
> From: Shaibal Dutta
>
> For better use of CPU idle time, allow the scheduler to select the CPU
> on which the SRCU grace period work would be scheduled. This improves
> idle residency time and conserves power.
>
> This functionality is enabled when CONFIG_WQ_POWER_EFFICIENT is selected.
>
> Cc: Lai Jiangshan
> Cc: "Paul E. McKenney"
> Cc: Dipankar Sarma
> Signed-off-by: Shaibal Dutta
> [zoran.marko...@linaro.org: Rebased to latest kernel version. Added commit
> message. Fixed code alignment.]
> ---
>  kernel/rcu/srcu.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c
> index 3318d82..a1ebe6d 100644
> --- a/kernel/rcu/srcu.c
> +++ b/kernel/rcu/srcu.c
> @@ -398,7 +398,7 @@ void call_srcu(struct srcu_struct *sp, struct rcu_head *head,
>  	rcu_batch_queue(&sp->batch_queue, head);
>  	if (!sp->running) {
>  		sp->running = true;
> -		schedule_delayed_work(&sp->work, 0);
> +		queue_delayed_work(system_power_efficient_wq, &sp->work, 0);
>  	}
>  	spin_unlock_irqrestore(&sp->queue_lock, flags);
>  }
> @@ -674,7 +674,8 @@ static void srcu_reschedule(struct srcu_struct *sp)
>  	}
>
>  	if (pending)
> -		schedule_delayed_work(&sp->work, SRCU_INTERVAL);
> +		queue_delayed_work(system_power_efficient_wq,
> +				   &sp->work, SRCU_INTERVAL);
>  }
>
>  /*
> --
> 1.7.9.5