date:20130726

On Sat, Jul 27, 2013 at 12:57:03PM +1000, Dave Chinner wrote:
> On Fri, Jul 26, 2013 at 04:28:52PM -0700, Paul E. McKenney wrote:
> > Dave Jones reported RCU stalls, overly long hrtimer interrupts, and
> > amazingly long NMI handlers from a trinity-induced workload involving
> > lots of concurrent sync() calls (https://lkml.org/lkml/2013/7/23/369).
> > There are any number of things that one might do to make sync() behave
> > better under high levels of contention, but it is also the case that
> > multiple concurrent sync() system calls can be satisfied by a single
> > sys_sync() invocation.
> > 
> > Given that this situation is reminiscent of rcu_barrier(), this commit
> > applies the rcu_barrier() approach to sys_sync().  This approach uses
> > a global mutex and a sequence counter.  The mutex is held across the
> > sync() operation, which eliminates contention between concurrent sync()
> > operations.
> >
> > The counter is incremented at the beginning and end of
> > each sync() operation, so that it is odd while a sync() operation is in
> > progress and even otherwise, just like sequence locks.
> > 
> > The code that used to be in sys_sync() is now in do_sync(), and sys_sync()
> > now handles the concurrency.  The sys_sync() function first takes a
> > snapshot of the counter, then acquires the mutex, and then takes another
> > snapshot of the counter.  If the values of the two snapshots indicate that
> > a full do_sync() executed during the mutex acquisition, the sys_sync()
> > function releases the mutex and returns ("Our work is done!").  Otherwise,
> > sys_sync() increments the counter, invokes do_sync(), and increments
> > the counter again.
> > 
> > This approach allows a single call to do_sync() to satisfy an arbitrarily
> > large number of sync() system calls, which should eliminate issues due
> > to large numbers of concurrent invocations of the sync() system call.
> 
> This is not addressing the problem that is causing issues during
> sync. Indeed, it only puts a bandaid over the currently observed
> trigger.
> 
> Indeed, I suspect that this will significantly slow down concurrent
> sync operations, as it serialises sync across all superblocks rather
> than serialising per-superblock like is currently done. Indeed, that
> per-superblock serialisation is where all the lock contention
> problems are. And it's not sync alone that causes the contention
> problems - it has to be combined with other concurrent workloads
> that add or remove inodes from the inode cache at the same time.

Seems like something along the lines of wakeup_flusher_threads()
currently at the start of sys_sync() would address this.

> I have patches to address that by removing the source
> of the lock contention completely, and not just for the sys_sync
> trigger. Those patches make the problems with concurrent
> sys_sync operation go away completely for me, not to mention improve
> performance for 8+ thread metadata workloads on XFS significantly.
> 
> IOWs, I don't see that concurrent sys_sync operation is a problem at
> all, and it is actively desirable for systems that have multiple
> busy filesystems as it allows concurrent dispatch of IO across those
> multiple filesystems. Serialising all sys_sync work might stop the
> contention problems, but it will also slow down concurrent sync
> operations on busy systems as it only allows one thread to dispatch
> and wait for IO at a time.
> 
> So, let's not slap a bandaid over a symptom - let's address the
> cause of the lock contention properly

Hmmm...

Could you please send your patches over to Dave Jones right now?  I am
getting quite tired of getting RCU CPU stall warning complaints from
him that turn out to be due to highly contended sync() system calls.

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/2] cpuset: correct the disoder comment of two functions

2013-07-26 Thread Zhao Hongjiang

Correct the misplaced comment: it describes cpuset_css_offline(), not
cpuset_css_free(), so move it above the right function.

Signed-off-by: Zhao Hongjiang 
---
 kernel/cpuset.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 2ddd9b9..703bfd5 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -2020,6 +2020,12 @@ out_unlock:
 	return 0;
 }
 
+/*
+ * If the cpuset being removed has its flag 'sched_load_balance'
+ * enabled, then simulate turning sched_load_balance off, which
+ * will call rebuild_sched_domains_locked().
+ */
+
 static void cpuset_css_offline(struct cgroup *cgrp)
 {
 	struct cpuset *cs = cgroup_cs(cgrp);
@@ -2035,12 +2041,6 @@ static void cpuset_css_offline(struct cgroup *cgrp)
 	mutex_unlock(&cpuset_mutex);
 }
 
-/*
- * If the cpuset being removed has its flag 'sched_load_balance'
- * enabled, then simulate turning sched_load_balance off, which
- * will call rebuild_sched_domains_locked().
- */
-
 static void cpuset_css_free(struct cgroup *cgrp)
 {
 	struct cpuset *cs = cgroup_cs(cgrp);
-- 
1.8.2.2


[PATCH 1/2] cpuset: get rid of the useless forward declaration of cpuset

2013-07-26 Thread Zhao Hongjiang

Get rid of the useless forward declaration of struct cpuset, because the
struct is defined just below it.

Signed-off-by: Zhao Hongjiang 
---
 kernel/cpuset.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index e565778..2ddd9b9 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -70,7 +70,6 @@ int number_of_cpusets __read_mostly;
 
 /* Forward declare cgroup structures */
 struct cgroup_subsys cpuset_subsys;
-struct cpuset;
 
 /* See "Frequency meter" comments, below. */
 
-- 
1.8.2.2


Re: [PATCH v2 2/2] tracing: Shrink the size of struct ftrace_event_field

2013-07-26 Thread Steven Rostedt

On Sat, 2013-07-27 at 11:32 +0800, Li Zefan wrote:
 
>  struct event_filter {
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 7d85429..d72694d 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -106,6 +106,9 @@ trace_find_event_field(struct ftrace_event_call *call, char *name)
>   return __find_event_field(head, name);
>  }
>  
> +/* detect bit-field overflow */
> +#define VERIFY_SIZE(type) WARN_ON(type > field->type)
> +

One small nit. Move this macro definition into the function itself,
right above the macro usage. That way it will be much easier to review
in the future, as people don't need to go search for VERIFY_SIZE(). It
will be right there with the usage. The aesthetics may be a bit off, but
at least the code will be obvious at first glance at what is happening,
and I think that's more important than the "look" of the code.

Also, it will be obvious that the variable name must match the field
name.

Thanks,

-- Steve

>  static int __trace_define_field(struct list_head *head, const char *type,
>   const char *name, int offset, int size,
>   int is_signed, int filter_type)
> @@ -120,13 +123,16 @@ static int __trace_define_field(struct list_head *head, const char *type,
>   field->type = type;
>  
>   if (filter_type == FILTER_OTHER)
> - field->filter_type = filter_assign_type(type);
> - else
> - field->filter_type = filter_type;
> + filter_type = filter_assign_type(type);
>  
> + field->filter_type = filter_type;
>   field->offset = offset;
>   field->size = size;
> - field->is_signed = is_signed;
> + field->is_signed = !!is_signed;
> +
> + VERIFY_SIZE(filter_type);
> + VERIFY_SIZE(offset);
> + VERIFY_SIZE(size);
>  
>   list_add(&field->link, head);
>  



[PATCH v2 2/2] tracing: Shrink the size of struct ftrace_event_field

2013-07-26 Thread Li Zefan

Use bit fields, and the size of struct ftrace_event_field can be
shrunk from 48 bytes to 40 bytes on 64bit kernel.

slab_name           active_obj  nr_obj  size  obj_per_slab
----------------------------------------------------------
ftrace_event_field        1105    1105    48            85  (before)
ftrace_event_field        1224    1224    40           102  (after)

This saves a few Kbytes: (1105 * 48) - (1224 * 40) = 4080

v2:
- use !!is_signed, and nuke the check on this field.
- use a different way to detect overflow.
(both suggested by Steven)

Signed-off-by: Li Zefan 
---
 kernel/trace/trace.h|  8 
 kernel/trace/trace_events.c | 14 ++
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 4a4f6e1..3e8c97f 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -904,10 +904,10 @@ struct ftrace_event_field {
 	struct list_head	link;
 	const char		*name;
 	const char		*type;
-	int			filter_type;
-	int			offset;
-	int			size;
-	int			is_signed;
+	unsigned int		filter_type:4;
+	unsigned int		offset:12;
+	unsigned int		size:12;
+	unsigned int		is_signed:1;
 };
 
 struct event_filter {
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 7d85429..d72694d 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -106,6 +106,9 @@ trace_find_event_field(struct ftrace_event_call *call, char *name)
 	return __find_event_field(head, name);
 }
 
+/* detect bit-field overflow */
+#define VERIFY_SIZE(type) WARN_ON(type > field->type)
+
 static int __trace_define_field(struct list_head *head, const char *type,
 				const char *name, int offset, int size,
 				int is_signed, int filter_type)
@@ -120,13 +123,16 @@ static int __trace_define_field(struct list_head *head, const char *type,
 	field->type = type;
 
 	if (filter_type == FILTER_OTHER)
-		field->filter_type = filter_assign_type(type);
-	else
-		field->filter_type = filter_type;
+		filter_type = filter_assign_type(type);
 
+	field->filter_type = filter_type;
 	field->offset = offset;
 	field->size = size;
-	field->is_signed = is_signed;
+	field->is_signed = !!is_signed;
+
+	VERIFY_SIZE(filter_type);
+	VERIFY_SIZE(offset);
+	VERIFY_SIZE(size);
 
 	list_add(&field->link, head);
 
-- 
1.8.0.2



Re: [tip:perf/core] perf: Update perf_event_type documentation

2013-07-26 Thread Vince Weaver

On Fri, 26 Jul 2013, Peter Zijlstra wrote:

> On Thu, Jul 25, 2013 at 11:20:24PM -0400, Vince Weaver wrote:
> > 
> > a thing that personally bothers me are these imaginary struct definitions 
> > added as part of the documentation that aren't actually available in the 
> > public perf_event.h
> > 
> > I can see why it's done, but it can be confusing picking out in later 
> > definitions which struct fields are real and which ones are conceptual.
> 
> Would it help if we changed the syntax to not look as much as real C
> would?

I've been thinking and I can't really think of a clearer way to present 
the layout.   So I guess it's fine the way it is.  Hopefully not many 
people are stuck having to implement code based on header file comments 
anyway.

> > It might be clearer
> > if you stuck the perf_event_attr::sample_id_all qualifier each
> > place you add the sample_id field.
> 
> Ah, I actually considered that but then got lazy and used the 0 sized
> struct idea :/

It might just be me.  For whatever reason the C parser in my head doesn't 
handle GNU extensions like 0-sized structs.

Vince

Re: [PATCH 2/2][RESEND] tracing: Shrink the size of struct ftrace_event_field

2013-07-26 Thread Li Zefan

>> @@ -111,6 +111,11 @@ static int __trace_define_field(struct list_head *head, const char *type,
>>  field->size = size;
>>  field->is_signed = is_signed;
> 
> I think we should just change is_signed to bool. At least the parameter.
> Or we can make the assignment: field->is_signed = !!is_signed; and nuke
> the check below.
> 

Changing field->is_signed to bool won't shrink the size of the struct.
I prefer: field->is_signed = !!is_signed.

>>  
>> +WARN_ON(offset >= (1 << 12));
>> +WARN_ON(size >= (1 << 12));
>> +WARN_ON(is_signed >= (1 << 1));
>> +WARN_ON(field->filter_type >= (1 << 4));
> 
> Note, the test for field->filter_type is wrong.
> 

oops.

> We should make a helper macro:
> 
> #define VERIFY_SIZE(type) WARN_ON(type > field->type)
> 

Much better!

> and then have:
> 
>   VERIFY_SIZE(offset);
>   VERIFY_SIZE(size);
>   VERIFY_SIZE(filter_type);


Re: [PATCH 14/16] usb: musb: dsps: add MUSB_DEVCTL_SESSION back after removal

2013-07-26 Thread Bin Liu

Sebastian,

On Fri, Jul 26, 2013 at 3:35 PM, Sebastian Andrzej Siewior
 wrote:
>> My build server is down this afternoon. Once it comes back next week,
>> I will try 3.8 again, to see how I can help on this USB1 host mode
>> issue. Its devctl register should stay at 0x19 even if nothing is
>> connected.
>
> Your help is greatly appreciated. To hear what will happen :)
>
> Sebastian

I have not tested it yet, but I believe I found why host mode works on
TI 3.2 kernel but not on mainline. Please look at Line 786 in 3.2
kernel musb_core.c [1].

773 if ((int_usb & MUSB_INTR_DISCONNECT) && !musb->ignore_disconnect) {
..
785 if (musb->a_wait_bcon != 0 &&
786 is_otg_enabled(musb))
787 musb_platform_try_idle(musb, jiffies
788 + msecs_to_jiffies(musb->a_wait_bcon));

So when the device is unplugged, *_try_idle() is not called in host
mode, then the SESSION bit will stay set. But in mainline kernel,
*_try_idle() will be called regardless.

Please let me know your thoughts.

Regards,
-Bin.

[1] http://arago-project.org/git/projects/?p=linux-am33x.git;a=blob;f=drivers/usb/musb/musb_core.c;h=075aa5f9bec7cd041aa24eb534209fa756ed84fe;hb=refs/heads/v3.2-staging

[PATCH 3/3] tile / cpu topology: remove stale Macro arch_provides_topology_pointers

2013-07-26 Thread Hanjun Guo

Macro arch_provides_topology_pointers is pointless now, remove it.

Signed-off-by: Hanjun Guo 
---
 arch/tile/include/asm/topology.h |3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/tile/include/asm/topology.h b/arch/tile/include/asm/topology.h
index d5e86c9..d15c0d8 100644
--- a/arch/tile/include/asm/topology.h
+++ b/arch/tile/include/asm/topology.h
@@ -89,9 +89,6 @@ static inline const struct cpumask *cpumask_of_node(int node)
 #define topology_core_id(cpu)   (cpu)
 #define topology_core_cpumask(cpu)  ((void)(cpu), cpu_online_mask)
 #define topology_thread_cpumask(cpu)    cpumask_of(cpu)
-
-/* indicates that pointers to the topology struct cpumask maps are valid */
-#define arch_provides_topology_pointers yes
 #endif
 
 #endif /* _ASM_TILE_TOPOLOGY_H */
-- 
1.7.9.5


[PATCH 2/3] x86 / cpu topology: remove the stale macro arch_provides_topology_pointers

2013-07-26 Thread Hanjun Guo

Macro arch_provides_topology_pointers is pointless now, remove it.

Signed-off-by: Hanjun Guo 
---
 arch/x86/include/asm/topology.h |3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 095b215..d35f24e 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -124,9 +124,6 @@ extern const struct cpumask *cpu_coregroup_mask(int cpu);
 #define topology_core_id(cpu)  (cpu_data(cpu).cpu_core_id)
 #define topology_core_cpumask(cpu) (per_cpu(cpu_core_map, cpu))
 #define topology_thread_cpumask(cpu)   (per_cpu(cpu_sibling_map, cpu))
-
-/* indicates that pointers to the topology cpumask_t maps are valid */
-#define arch_provides_topology_pointers        yes
 #endif
 
 static inline void arch_fix_phys_package_id(int num, u32 slot)
-- 
1.7.9.5


[PATCH 1/3] cpu topology: remove stale arch_provides_topology_pointers and define_siblings_show_map/list()

2013-07-26 Thread Hanjun Guo

arch_provides_topology_pointers was introduced in commit 23ca4bba3 (x86:
cleanup early per cpu variables/accesses v4) to indicate that pointers to the
topology cpumask_t maps are valid, which avoids copying data onto and off of
the stack.

But since commit fbd59a8d (cpumask: Use topology_core_cpumask()/
topology_thread_cpumask()), the pointers to the topology struct cpumask maps
are always valid.

After that commit, the only difference is a redundant
"unsigned int cpu = dev->id;" when arch_provides_topology_pointers is defined,
but dev->id is of type 'u32', which devolves to 'unsigned int' on all supported
arches. So the arch_provides_topology_pointers define is pointless and only
causes obfuscation now; remove it.

Tested on an x86 machine: the topology information in
/sys/devices/system/cpu/cpuX/topology/ is the same after applying this patch
set.

Signed-off-by: Hanjun Guo 
---
 drivers/base/topology.c |   20 
 1 file changed, 20 deletions(-)

diff --git a/drivers/base/topology.c b/drivers/base/topology.c
index 2f5919e..94ffee3 100644
--- a/drivers/base/topology.c
+++ b/drivers/base/topology.c
@@ -62,25 +62,6 @@ static ssize_t show_cpumap(int type, const struct cpumask *mask, char *buf)
 }
 #endif
 
-#ifdef arch_provides_topology_pointers
-#define define_siblings_show_map(name) \
-static ssize_t show_##name(struct device *dev, \
-  struct device_attribute *attr, char *buf)\
-{  \
-   unsigned int cpu = dev->id; \
-   return show_cpumap(0, topology_##name(cpu), buf);   \
-}
-
-#define define_siblings_show_list(name)                                \
-static ssize_t show_##name##_list(struct device *dev,  \
- struct device_attribute *attr,\
- char *buf)\
-{  \
-   unsigned int cpu = dev->id; \
-   return show_cpumap(1, topology_##name(cpu), buf);   \
-}
-
-#else
 #define define_siblings_show_map(name) \
 static ssize_t show_##name(struct device *dev, \
   struct device_attribute *attr, char *buf)\
@@ -95,7 +76,6 @@ static ssize_t show_##name##_list(struct device *dev,  \
 {  \
return show_cpumap(1, topology_##name(dev->id), buf);   \
 }
-#endif
 
 #define define_siblings_show_func(name)\
define_siblings_show_map(name); define_siblings_show_list(name)
-- 
1.7.9.5


Re: [PATCH RFC fs] v2 Make sync() satisfy many requests with one invocation

2013-07-26 Thread Dave Chinner

On Fri, Jul 26, 2013 at 04:28:52PM -0700, Paul E. McKenney wrote:
> Dave Jones reported RCU stalls, overly long hrtimer interrupts, and
> amazingly long NMI handlers from a trinity-induced workload involving
> lots of concurrent sync() calls (https://lkml.org/lkml/2013/7/23/369).
> There are any number of things that one might do to make sync() behave
> better under high levels of contention, but it is also the case that
> multiple concurrent sync() system calls can be satisfied by a single
> sys_sync() invocation.
> 
> Given that this situation is reminiscent of rcu_barrier(), this commit
> applies the rcu_barrier() approach to sys_sync().  This approach uses
> a global mutex and a sequence counter.  The mutex is held across the
> sync() operation, which eliminates contention between concurrent sync()
> operations.
>
> The counter is incremented at the beginning and end of
> each sync() operation, so that it is odd while a sync() operation is in
> progress and even otherwise, just like sequence locks.
> 
> The code that used to be in sys_sync() is now in do_sync(), and sys_sync()
> now handles the concurrency.  The sys_sync() function first takes a
> snapshot of the counter, then acquires the mutex, and then takes another
> snapshot of the counter.  If the values of the two snapshots indicate that
> a full do_sync() executed during the mutex acquisition, the sys_sync()
> function releases the mutex and returns ("Our work is done!").  Otherwise,
> sys_sync() increments the counter, invokes do_sync(), and increments
> the counter again.
> 
> This approach allows a single call to do_sync() to satisfy an arbitrarily
> large number of sync() system calls, which should eliminate issues due
> to large numbers of concurrent invocations of the sync() system call.

This is not addressing the problem that is causing issues during
sync. Indeed, it only puts a bandaid over the currently observed
trigger.

Indeed, I suspect that this will significantly slow down concurrent
sync operations, as it serialises sync across all superblocks rather
than serialising per-superblock like is currently done. Indeed, that
per-superblock serialisation is where all the lock contention
problems are. And it's not sync alone that causes the contention
problems - it has to be combined with other concurrent workloads
that add or remove inodes from the inode cache at the same time.

I have patches to address that by removing the source
of the lock contention completely, and not just for the sys_sync
trigger. Those patches make the problems with concurrent
sys_sync operation go away completely for me, not to mention improve
performance for 8+ thread metadata workloads on XFS significantly.

IOWs, I don't see that concurrent sys_sync operation is a problem at
all, and it is actively desirable for systems that have multiple
busy filesystems as it allows concurrent dispatch of IO across those
multiple filesystems. Serialising all sys_sync work might stop the
contention problems, but it will also slow down concurrent sync
operations on busy systems as it only allows one thread to dispatch
and wait for IO at a time.

So, let's not slap a bandaid over a symptom - let's address the
cause of the lock contention properly

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com

[PATCH v2] fs/ocfs2: use list_for_each_entry() instead of list_for_each()

2013-07-26 Thread Dong Fang


Signed-off-by: Dong Fang 
---
 fs/ocfs2/cluster/heartbeat.c |   14 +-
 fs/ocfs2/dlm/dlmast.c|8 +++-
 fs/ocfs2/dlm/dlmcommon.h |4 +---
 fs/ocfs2/dlm/dlmconvert.c|   11 +++
 fs/ocfs2/dlm/dlmdebug.c  |   15 ---
 fs/ocfs2/dlm/dlmdomain.c |   20 +---
 fs/ocfs2/dlm/dlmlock.c   |9 ++---
 fs/ocfs2/dlm/dlmmaster.c |   17 -
 fs/ocfs2/dlm/dlmthread.c |   19 +--
 fs/ocfs2/dlm/dlmunlock.c |4 +---
 10 files changed, 33 insertions(+), 88 deletions(-)

diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 5c1c864..25b72e8 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -628,11 +628,9 @@ static void o2hb_fire_callbacks(struct o2hb_callback *hbcall,
struct o2nm_node *node,
int idx)
 {
-   struct list_head *iter;
struct o2hb_callback_func *f;
 
-   list_for_each(iter, >list) {
-   f = list_entry(iter, struct o2hb_callback_func, hc_item);
+   list_for_each_entry(f, >list, hc_item) {
mlog(ML_HEARTBEAT, "calling funcs %p\n", f);
(f->hc_func)(node, idx, f->hc_data);
}
@@ -2516,8 +2514,7 @@ unlock:
 int o2hb_register_callback(const char *region_uuid,
   struct o2hb_callback_func *hc)
 {
-   struct o2hb_callback_func *tmp;
-   struct list_head *iter;
+   struct o2hb_callback_func *f;
struct o2hb_callback *hbcall;
int ret;
 
@@ -2540,10 +2537,9 @@ int o2hb_register_callback(const char *region_uuid,
 
down_write(_callback_sem);
 
-   list_for_each(iter, >list) {
-   tmp = list_entry(iter, struct o2hb_callback_func, hc_item);
-   if (hc->hc_priority < tmp->hc_priority) {
-   list_add_tail(>hc_item, iter);
+   list_for_each_entry(f, >list, hc_item) {
+   if (hc->hc_priority < f->hc_priority) {
+   list_add_tail(>hc_item, >hc_item);
break;
}
}
diff --git a/fs/ocfs2/dlm/dlmast.c b/fs/ocfs2/dlm/dlmast.c
index fbec0be..b46278f 100644
--- a/fs/ocfs2/dlm/dlmast.c
+++ b/fs/ocfs2/dlm/dlmast.c
@@ -292,7 +292,7 @@ int dlm_proxy_ast_handler(struct o2net_msg *msg, u32 len, void *data,
struct dlm_lock *lock = NULL;
struct dlm_proxy_ast *past = (struct dlm_proxy_ast *) msg->buf;
char *name;
-   struct list_head *iter, *head=NULL;
+   struct list_head *head = NULL;
__be64 cookie;
u32 flags;
u8 node;
@@ -373,8 +373,7 @@ int dlm_proxy_ast_handler(struct o2net_msg *msg, u32 len, void *data,
/* try convert queue for both ast/bast */
head = >converting;
lock = NULL;
-   list_for_each(iter, head) {
-   lock = list_entry (iter, struct dlm_lock, list);
+   list_for_each_entry(lock, head, list) {
if (lock->ml.cookie == cookie)
goto do_ast;
}
@@ -385,8 +384,7 @@ int dlm_proxy_ast_handler(struct o2net_msg *msg, u32 len, void *data,
else
head = >granted;
 
-   list_for_each(iter, head) {
-   lock = list_entry (iter, struct dlm_lock, list);
+   list_for_each_entry(lock, head, list) {
if (lock->ml.cookie == cookie)
goto do_ast;
}
diff --git a/fs/ocfs2/dlm/dlmcommon.h b/fs/ocfs2/dlm/dlmcommon.h
index de854cc..e051776 100644
--- a/fs/ocfs2/dlm/dlmcommon.h
+++ b/fs/ocfs2/dlm/dlmcommon.h
@@ -1079,11 +1079,9 @@ static inline int dlm_lock_compatible(int existing, int request)
 static inline int dlm_lock_on_list(struct list_head *head,
   struct dlm_lock *lock)
 {
-   struct list_head *iter;
struct dlm_lock *tmplock;
 
-   list_for_each(iter, head) {
-   tmplock = list_entry(iter, struct dlm_lock, list);
+   list_for_each_entry(tmplock, head, list) {
if (tmplock == lock)
return 1;
}
diff --git a/fs/ocfs2/dlm/dlmconvert.c b/fs/ocfs2/dlm/dlmconvert.c
index 29a886d..a2bda15 100644
--- a/fs/ocfs2/dlm/dlmconvert.c
+++ b/fs/ocfs2/dlm/dlmconvert.c
@@ -123,7 +123,6 @@ static enum dlm_status __dlmconvert_master(struct dlm_ctxt *dlm,
   int *kick_thread)
 {
enum dlm_status status = DLM_NORMAL;
-   struct list_head *iter;
struct dlm_lock *tmplock=NULL;
 
assert_spin_locked(>spinlock);
@@ -185,16 +184,14 @@ static enum dlm_status __dlmconvert_master(struct dlm_ctxt *dlm,
 
/* upconvert from here on */
status = DLM_NORMAL;
-   list_for_each(iter, >granted) {
-   tmplock = list_entry(iter, struct dlm_lock, list);
+   list_for_each_entry(tmplock, >granted, list) {
if (tmplock == lock)

Re: [PATCH RFC fs] v2 Make sync() satisfy many requests with one invocation

On Fri, Jul 26, 2013 at 05:29:44PM -0700, Linus Torvalds wrote:
> On Fri, Jul 26, 2013 at 4:28 PM, Paul E. McKenney
>  wrote:
> > +
> > +   snap = ACCESS_ONCE(sync_seq);
> > +   smp_mb();  /* Prevent above from bleeding into critical section. */
> > +   mutex_lock(_mutex);
> > +   snap_done = ACCESS_ONCE(sync_seq);
> > +   if (ULONG_CMP_GE(snap_done, ((snap + 1) & ~0x1) + 2)) {
> 
> Ugh. I dislike this RCU'ism. It's bad code. It doesn't just look ugly
> and complex, it's also not even clever.
> 
> It is possible that the compiler can fix up this horrible stuff and
> turn it into the nice clever stuff, but I dunno.
> 
> The two things that make me go "Eww":
> 
>  - "((snap + 1) & ~0x1) + 2" just isn't the smart way of doing things.
> Afaik, "(snap+3)&~1" gives the same answer with a simpler arithmetic.

You are right, this is a better approach, and I have changed this
patch to use it.

I will also apply it to the similar code in RCU.

>  - that ULONG_CMP_GE() macro is disgusting. What's wrong with doing it
> the sane way, which is how (for example) the time comparison functions
> do it (see time_before() and friends): Just do it
> 
>  ((long)(a-b) >= 0)
> 
>which doesn't need large constants.

True, and I used to use this approach, but it can result in signed integer
overflow, which is undefined in C.  (Yes, we use -fno-strict-overflow,
but there might come a day when we don't want to.)  And ULONG_CMP_GE()
generated exactly the same code as ((long)(a-b) >= 0) last I tried it.

> And yeah, a smart compiler will hopefully do one or both of those, but
> what annoys me about the source code is that it actually isn't even
> any more readable despite being more complicated and needing more
> compiler tricks for good code generation.
> 
> So that one line is (a) totally undocumented, (b) not obvious and (c)
> not very clever.

For (a), how about I add the following comment?

/*
 * If the value in snap is odd, we need to wait for the current
 * do_sync() to complete, then wait for the next one, in other
 * words, we need the value of snap_done to be three larger than
 * the value of snap.  On the other hand, if the value in snap is
 * even, we only have to wait for the next request to complete,
 * in other words, we need the value of snap_done to be only two
 * greater than the value of snap.  The "(snap + 3) & ~0x1" computes
 * this for us.
 */

Hopefully, this helps with (b).  For (c), I now use the expression
you suggested above.

Does that help, or is more needed?

> I'm also not a huge believer in those two WARN_ON_ONCE's you have. The
> sequence count is *only* updated in this place, it is *only* updated
> inside a lock, and dammit, if those tests ever trigger, we have bigger
> problems than that piece of code. Those warnings may make sense in
> code when you write it the first time (because you're thinking things
> through), but they do *not* make sense at the point where that code is
> actually committed to the project. I notice that you have those
> warnings in the RCU code itself, and I don't really think they make
> sense there either.

I agree that the fact that this variable is updated only in this one
place in sys_sync() makes these warnings less than useful in production.
I will therefore remove them once this patch gets beyond RFC status.

However, the similar warnings in RCU have been very helpful in spotting
bugs in the callers of rcu_idle_enter() and friends, so I would very
much prefer to keep them.

> Finally, the ACCESS_ONCE() is also only correct in the one place where
> you do the access speculatively outside the lock. Inside the lock,
> there is no excuse/reason for them, since the value is stable, and you
> need the memory barriers anyway, so there's no way the compiler could
> migrate things regardless. So the other two ACCESS_ONCE calls are
> actually misleading and wrong, and only likely to make the compiler
> generate much worse code.
> 
> In fact, the ACCESS_ONCE() is pretty much *guaranteed* to cause the
> compiler to unnecessarily generate worse code, since there is
> absolutely no reason why the compiler couldn't reuse the "snap_done"
> value it reads when it then does the "sync_seq++". There's no way the
> value could possibly have changed from the "snap_done" value earlier,
> since we're inside the lock, so why force the compiler to reload it?

Good point, I have removed the second ACCESS_ONCE().
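
To illustrate the resulting access pattern, here is a rough userspace sketch, not the actual patch.  The only racy read is the snapshot taken before the lock; everything under the lock sees a stable counter, so one read can be reused.  C11 atomics and a spin flag stand in for ACCESS_ONCE(), smp_mb() and the kernel mutex, and sync_snapshot(), sync_finish() and do_sync_calls are hypothetical names:

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_ulong sync_seq;	/* odd while a sync is in progress */
static atomic_flag sync_lock = ATOMIC_FLAG_INIT;	/* stand-in for the mutex */
unsigned long do_sync_calls;	/* counts the expensive operations actually run */

static void do_sync(void)
{
	do_sync_calls++;	/* placeholder for the real work */
}

/* Step 1: speculative read outside the lock (sequentially consistent). */
unsigned long sync_snapshot(void)
{
	return atomic_load(&sync_seq);
}

/* Step 2: take the lock, then either piggyback or do the work ourselves. */
bool sync_finish(unsigned long snap)
{
	unsigned long snap_done;
	bool ran = false;

	while (atomic_flag_test_and_set(&sync_lock))
		;	/* spin; the kernel code sleeps on a mutex instead */

	/* Stable while we hold the lock, so this one read is reused below. */
	snap_done = atomic_load_explicit(&sync_seq, memory_order_relaxed);
	if ((long)(snap_done - ((snap + 3) & ~0x1UL)) < 0) {
		atomic_store(&sync_seq, snap_done + 1);	/* now odd */
		do_sync();
		atomic_store(&sync_seq, snap_done + 2);	/* even again */
		ran = true;
	}

	atomic_flag_clear(&sync_lock);
	return ran;
}
```

In this model, two callers that both snapshot the counter before either takes the lock end up sharing a single do_sync() call: the first one runs it, and the second sees that a full pass completed after its snapshot and returns without doing the work again.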

> In short, I think the code does too much. I'm sure it works, but I
> think it might make people believe that the extra work (like those
> later ACCESS_ONCE ones) is meaningful, when it isn't. It's just
> make-believe, afaik.
> 
> But maybe I'm missing something, and there actually *is* reason for
> the extra work/complexity?

Only for the ULONG_CMP_GE().  On the rest, you are quite right, and
the updated patch is as follows.  

But I will believe it works only if it helps Dave

Re: PROBLEM: Persistent unfair sharing of a processor by auto groups in 3.11-rc2 (has twice regressed)

2013-07-26 Thread Paul Turner

On Fri, Jul 26, 2013 at 2:50 PM, Peter Zijlstra  wrote:
> On Fri, Jul 26, 2013 at 02:24:50PM -0700, Paul Turner wrote:
>> On Fri, Jul 26, 2013 at 2:03 PM, Peter Zijlstra  wrote:
>> >
>> >
>> > OK, so I have the below; however on a second look, Paul, shouldn't that
>> > update_cfs_shares() call be in entity_tick(), right after calling
>> > update_cfs_rq_blocked_load(). Because placing it in
>> > update_cfs_rq_blocked_load() means it's now called twice on the
>> > enqueue/dequeue paths through:
>> >
>> >   {en,de}queue_entity()
>> >     {en,de}queue_entity_load_avg()
>> >       update_cfs_rq_blocked_load()
>> >         update_cfs_shares()
>>
>> Yes, I agree: placing it directly in entity_tick() would be better.
>
> OK, how about the below then?

Looks good.

>
>> [ In f269ae046 the calls to update_cfs_rq_blocked_load() were amortized
>> and the separate updates in {en,de}queue_entity_load_avg() were
>> removed. ]
>
> Right, I remember/saw that. Did you ever figure out why that regressed;
> as in should we look to bring some of that back?

Yes, the savings are measurable (we actually still use it internally).

So the particular problem in Linus's workload was that the amortization meant
that there was more delay until the first update for a newly created task.  This
then had negative interactivity with a make -j  workload since it allowed the
tasks to be over-represented in terms of the group shares they received.

With:
  a75cdaa9: "sched: Set an initial value of runnable avg for new forked task"

This should now be improved so we should look at bringing it back.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] uio: provide vm access to UIO_MEM_PHYS maps

On Tue, Jul 16, 2013 at 07:21:03PM +0200, Uwe Kleine-König wrote:
> This makes it possible to let gdb access mappings of the process that is
> being debugged.
> 
> uio_mmap_logical was moved and uio_vm_ops renamed to group related code
> and differentiate to new stuff.
> 
> Signed-off-by: Uwe Kleine-König 

This patch breaks the build:

drivers/uio/uio.c: In function ‘uio_mmap_logical’:
drivers/uio/uio.c:627:17: error: ‘uio_vm_ops’ undeclared (first use in this function)

:(

Re: [ 00/79] 3.10.4-stable review

2013-07-26 Thread Рустафа Джамурахметов

On Sat, Jul 27, 2013 at 12:19:24AM +, Shuah Khan wrote:
> On 07/26/2013 03:21 PM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 3.10.4 release.
> > There are 79 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Sun Jul 28 20:45:08 UTC 2013.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> > kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.10.4-rc1.gz
> > and the diffstat can be found below.
> >
> > thanks,
> >
> > greg k-h
> >
> 
> Patches applied cleanly to 3.0.87, 3.4.54 and 3.10.3
> 
> Compiled and booted on the following systems:
> 
> Samsung Series 9 900X4C Intel Corei5:
>  (3.4.55-rc1, 3.10.4-rc2)
> HP ProBook 6475b AMD A10-4600M APU with Radeon(tm) HD Graphics:
>  (3.0.88-rc1, 3.4.55-rc1, and 3.10.4-rc1)
> 
> dmesgs for all releases look good. No regressions compared to the 
> previous dmesgs for each of these releases. dmesg emerg, crit, alert, 
> err are clean. No regressions in warn.

Great, thanks for testing and letting me know.

greg k-h

Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

2013-07-26 Thread Nicholas A. Bellinger

On Fri, 2013-07-26 at 14:14 -0700, Nicholas A. Bellinger wrote:
> On Thu, 2013-07-25 at 20:09 -0600, Jens Axboe wrote:
> > On Thu, Jul 25 2013, Nicholas A. Bellinger wrote:
> > > On Thu, 2013-07-25 at 12:16 +0200, Alexander Gordeev wrote:
> > > > On Mon, Jul 22, 2013 at 02:10:36PM -0700, Nicholas A. Bellinger wrote:
> > > > > Np.  FYI, you'll want to use the latest commit e7827b351 HEAD from
> > > > > target-pending/scsi-mq, which now has functioning scsi-generic 
> > > > > support.
> > > > 
> > > > Survives a boot, a kernel build and the build's result :)
> > > 
> > > Great.  Thanks for the feedback Alexander!
> > > 
> > > So the next step on my end is to enable -mq for ahci, and verify initial
> > > correctness using QEMU/KVM hardware emulation.
> > > 
> > > Btw, I've been looking at enabling the SHT->cmd_size for struct
> > > ata_queued_cmd descriptor pre-allocation, but AFAICT these descriptors
> > > are already all pre-allocated by libata and obtained via ata_qc_new() ->
> > > __ata_qc_from_tag() during ata_scsi_queuecmd().
> > 
> > Might still not be a bad idea to do it:
> > 
> > - Cleans up a driver, getting rid of the need to alloc, maintain, and
> >   free those structures.
> > 
> > - Should be some cache locality benefits to having it all sequential.
> > 
> 
> Looking at this some more, there are a number of locations outside of
> the main blk_mq_ops->queue_rq() -> SHT->queuecommand_mq() dispatch that
> use *ata_qc_from_tag() to obtain *ata_queued_cmd, and a few without an
> associated struct scsi_cmnd like libata-core.c:ata_exec_internal_sg()
> for example..
> 
> So I don't think (completely) getting rid of ata_port->qcmds[] will be
> possible, and just converting the ata_scsi_queuecmd() path to use the
> extra SHT->cmd_size pre-allocation for *ata_queued_cmd might end up
> being more trouble than it's worth.  Still undecided on that part..
> 
> Tejun, do you have any thoughts + input here..?
> 

OK, so I decided to give this a shot anyways..  Here is a quick
conversion for libata + AHCI to use blk-mq -> scsi-mq pre-allocation for
ata_queued_cmd descriptors:

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 2b50dfd..61b3db8 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -92,6 +92,9 @@ static int ahci_pci_device_resume(struct pci_dev *pdev);
 
 static struct scsi_host_template ahci_sht = {
AHCI_SHT("ahci"),
+   .scsi_mq = true,
+   .cmd_size = sizeof(struct ata_queued_cmd),
+   .queuecommand_mq = ata_scsi_queuecmd,
 };
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index f218427..e21814d 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4725,29 +4725,25 @@ void swap_buf_le16(u16 *buf, unsigned int buf_words)
 /**
  * ata_qc_new - Request an available ATA command, for queueing
  * @ap: target port
+ * @sc: incoming scsi_cmnd descriptor
  *
  * LOCKING:
  * None.
  */
 
-static struct ata_queued_cmd *ata_qc_new(struct ata_port *ap)
+static struct ata_queued_cmd *ata_qc_new(struct ata_port *ap,
+struct scsi_cmnd *sc)
 {
struct ata_queued_cmd *qc = NULL;
-   unsigned int i;
+   struct request *rq = sc->request;
 
/* no command while frozen */
if (unlikely(ap->pflags & ATA_PFLAG_FROZEN))
return NULL;
 
-   /* the last tag is reserved for internal command. */
-   for (i = 0; i < ATA_MAX_QUEUE - 1; i++)
-   if (!test_and_set_bit(i, &ap->qc_allocated)) {
-   qc = __ata_qc_from_tag(ap, i);
-   break;
-   }
-
-   if (qc)
-   qc->tag = i;
+   qc = (struct ata_queued_cmd *)sc->SCp.ptr;
+   qc->scsicmd = sc;
+   qc->tag = rq->tag;
 
return qc;
 }
@@ -4755,19 +4751,20 @@ static struct ata_queued_cmd *ata_qc_new(struct ata_port *ap)
 /**
  * ata_qc_new_init - Request an available ATA command, and initialize it
  * @dev: Device from whom we request an available command structure
+ * @sc: incoming scsi_cmnd descriptor
  *
  * LOCKING:
  * None.
  */
 
-struct ata_queued_cmd *ata_qc_new_init(struct ata_device *dev)
+struct ata_queued_cmd *ata_qc_new_init(struct ata_device *dev,
+  struct scsi_cmnd *sc)
 {
struct ata_port *ap = dev->link->ap;
struct ata_queued_cmd *qc;
 
-   qc = ata_qc_new(ap);
+   qc = ata_qc_new(ap, sc);
if (qc) {
-   qc->scsicmd = NULL;
qc->ap = ap;
qc->dev = dev;
 
@@ -4797,10 +4794,9 @@ void ata_qc_free(struct ata_queued_cmd *qc)
 
qc->flags = 0;
tag = qc->tag;
-   if (likely(ata_tag_valid(tag))) {
+
+   if (likely(ata_tag_valid(tag)))
qc->tag = ATA_TAG_POISON;
-   clear_bit(tag, &ap->qc_allocated);
-   }
 }
 
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 0101af5..e5ab880 100644
---
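
As a rough illustration of what the hunks above remove: the old ata_qc_new() scanned a bitmap for a free tag, keeping the last tag for internal commands, while the converted version reuses the tag already attached to the request.  The sketch below models only the bitmap scheme in userspace.  MAX_QUEUE, tag_alloc() and tag_free() are illustrative names, and C11 atomics stand in for test_and_set_bit()/clear_bit():

```c
#include <stdatomic.h>

#define MAX_QUEUE 32	/* illustrative; plays the role of ATA_MAX_QUEUE */

static atomic_uint allocated;	/* bit i set means tag i is in use */

/* Returns a free tag in [0, MAX_QUEUE - 2], or -1 if none is available. */
int tag_alloc(void)
{
	unsigned int i;

	/* The last tag is reserved, so it is never handed out here. */
	for (i = 0; i < MAX_QUEUE - 1; i++) {
		unsigned int bit = 1u << i;

		/* Atomically set the bit; we own the tag if it was clear. */
		if (!(atomic_fetch_or(&allocated, bit) & bit))
			return (int)i;
	}
	return -1;
}

/* Releases a tag obtained from tag_alloc(). */
void tag_free(int tag)
{
	atomic_fetch_and(&allocated, ~(1u << tag));
}
```

With the pre-allocation approach the scan disappears entirely, since the tag comes along with the request, which is the simplification the diff is after.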

Re: [PATCH RFC fs] v2 Make sync() satisfy many requests with one invocation

2013-07-26 Thread Linus Torvalds

On Fri, Jul 26, 2013 at 4:28 PM, Paul E. McKenney
 wrote:
> +
> +   snap = ACCESS_ONCE(sync_seq);
> +   smp_mb();  /* Prevent above from bleeding into critical section. */
> +   mutex_lock(&sync_mutex);
> +   snap_done = ACCESS_ONCE(sync_seq);
> +   if (ULONG_CMP_GE(snap_done, ((snap + 1) & ~0x1) + 2)) {

Ugh. I dislike this RCU'ism. It's bad code. It doesn't just look ugly
and complex, it's also not even clever.

It is possible that the compiler can fix up this horrible stuff and
turn it into the nice clever stuff, but I dunno.

The two things that make me go "Eww":

 - "((snap + 1) & ~0x1) + 2" just isn't the smart way of doing things.
Afaik, "(snap+3)&~1" gives the same answer with a simpler arithmetic.

 - that ULONG_CMP_GE() macro is disgusting. What's wrong with doing it
the sane way, which is how (for example) the time comparison functions
do it (see time_before() and friends): Just do it

 ((long)(a-b) >= 0)

   which doesn't need large constants.

And yeah, a smart compiler will hopefully do one or both of those, but
what annoys me about the source code is that it actually isn't even
any more readable despite being more complicated and needing more
compiler tricks for good code generation.
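
For reference, a standalone sketch of the time_before()-style comparison mentioned above (the helper name seq_after_eq() is invented here).  It relies on the unsigned difference being reinterpreted as a signed value, which holds on the two's-complement compilers the kernel is built with, and it stays correct across counter wraparound as long as the two values are less than half the counter range apart:

```c
#include <stdbool.h>

/* True if sequence value a is at or past b, allowing for wraparound. */
bool seq_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}
```

For example, a counter that has wrapped around to 2 still compares as "after" a value taken just below the maximum, which a plain a >= b would get wrong.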

So that one line is (a) totally undocumented, (b) not obvious and (c)
not very clever.

I'm also not a huge believer in those two WARN_ON_ONCE's you have. The
sequence count is *only* updated in this place, it is *only* updated
inside a lock, and dammit, if those tests ever trigger, we have bigger
problems than that piece of code. Those warnings may make sense in
code when you write it the first time (because you're thinking things
through), but they do *not* make sense at the point where that code is
actually committed to the project. I notice that you have those
warnings in the RCU code itself, and I don't really think they make
sense there either.

Finally, the ACCESS_ONCE() is also only correct in the one place where
you do the access speculatively outside the lock. Inside the lock,
there is no excuse/reason for them, since the value is stable, and you
need the memory barriers anyway, so there's no way the compiler could
migrate things regardless. So the other two ACCESS_ONCE calls are
actually misleading and wrong, and only likely to make the compiler
generate much worse code.

In fact, the ACCESS_ONCE() is pretty much *guaranteed* to cause the
compiler to unnecessarily generate worse code, since there is
absolutely no reason why the compiler couldn't reuse the "snap_done"
value it reads when it then does the "sync_seq++". There's no way the
value could possibly have changed from the "snap_done" value earlier,
since we're inside the lock, so why force the compiler to reload it?

In short, I think the code does too much. I'm sure it works, but I
think it might make people believe that the extra work (like those
later ACCESS_ONCE ones) is meaningful, when it isn't. It's just
make-believe, afaik.

But maybe I'm missing something, and there actually *is* reason for
the extra work/complexity?

 Linus

Re: RFC: revert request for cpuidle patches e11538d1 and 69a37bea

2013-07-26 Thread Rafael J. Wysocki

On Friday, July 26, 2013 11:48:36 PM Rafael J. Wysocki wrote:
> On Friday, July 26, 2013 02:29:40 PM Rik van Riel wrote:
> > On 07/26/2013 02:27 PM, Arjan van de Ven wrote:
> > > On 7/26/2013 11:13 AM, Rik van Riel wrote:
> > >
> > >>
> > >> Could you try running the tests with just the repeat mode
> > >> stuff from commit 69a37bea excluded, but leaving the common
> > >> infrastructure and commit e11538?
> > >>
> > >
> > > personally I think we should go the other way around.
> > > revert the set entirely first, and now, and get our performance back
> > > to what it should be
> > >
> > > and then see what we can add back without causing the regressions.
> > > this may take longer, or be done in steps, and that's ok.
> > >
> > > the end point may well be the same... but we can then evaluate in the
> > > right direction.
> > 
> > Works for me. I have no objection to reverting both patches,
> > if the people planning to fix the code prefer that :)
> 
> OK, I'll queue up the reverts as fixes for 3.11-rc4.

So, the reverts are on the fixes-next branch of the linux-pm.git tree that you
can access at

http://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/log/?h=fixes-next

However, they are not simple reverts as we've had some non-trivial changes on
top of those commits already, so I'd appreciate it a lot if somebody could
double check if I didn't break anything in them.

They are based on top of my master branch for now, but I'll rebase them on
3.11-rc3 when it's out.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Re: [ 00/79] 3.10.4-stable review

2013-07-26 Thread Shuah Khan

On 07/26/2013 03:21 PM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 3.10.4 release.
> There are 79 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun Jul 28 20:45:08 UTC 2013.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
>   kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.10.4-rc1.gz
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Patches applied cleanly to 3.0.87, 3.4.54 and 3.10.3

Compiled and booted on the following systems:

Samsung Series 9 900X4C Intel Corei5:
 (3.4.55-rc1, 3.10.4-rc2)
HP ProBook 6475b AMD A10-4600M APU with Radeon(tm) HD Graphics:
 (3.0.88-rc1, 3.4.55-rc1, and 3.10.4-rc1)

dmesgs for all releases look good. No regressions compared to the 
previous dmesgs for each of these releases. dmesg emerg, crit, alert, 
err are clean. No regressions in warn.

Cross-compile testing:
HP Compaq dc7700 SFF desktop: x86-64 Intel Core-i2:
 (3.0.88-rc1, 3.4.55-rc1, and 3.10.4-rc1)

Cross-compile tests results:

alpha: defconfig passed on all
arm: defconfig passed on all
arm64: not applicable to 3.0.y, 3.4.y. defconfig passed on 3.10.y
c6x: not applicable to 3.0.y, defconfig passed on 3.4.y, and 3.10.y
mips: defconfig passed on all
mipsel: defconfig passed on all
powerpc: wii_defconfig passed on all
sh: defconfig passed on all
sparc: defconfig passed on all
tile: tilegx_defconfig passed on all

-- Shuah

Shuah Khan, Linux Kernel Developer - Open Source Group Samsung Research 
America (Silicon Valley) shuah...@samsung.com | (970) 672-0658

Re: [PATCH] w1: replace strict_strtol() with kstrtol()



27.07.2013, 03:16, "GregKH" :
> On Tue, Jul 23, 2013 at 12:00:44AM +0400, Evgeniy Polyakov wrote:
>
>>  Hi everyone
>>
>>  19.07.2013, 11:16, "Jingoo Han" :
>>>  The usage of strict_strtol() is not preferred, because
>>>  strict_strtol() is obsolete. Thus, kstrtol() should be
>>>  used.
>>  Looks good to me, although I do not really see the difference
>>  Greg, please pull into your tree or suggest appropriate.
>>
>>  Acked-by: Evgeniy Polyakov 
>
> Can someone resend this, I don't seem to be able to find it...

Looks like Andrew Morton picked it up today
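
As a rough userspace analogue of the strict parsing that kstrtol() provides (this is not the kernel implementation; strict_parse_long() is an invented name built on the standard strtol()), the sketch below rejects empty input, leading whitespace and trailing garbage, and tolerates a single trailing newline:

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Returns 0 and stores the value in *res, or a negative errno on failure. */
int strict_parse_long(const char *s, int base, long *res)
{
	char *end;
	long val;

	/* strtol() would skip leading whitespace; a strict parser must not. */
	if (*s == '\0' || isspace((unsigned char)*s))
		return -EINVAL;

	errno = 0;
	val = strtol(s, &end, base);
	if (end == s)
		return -EINVAL;	/* no digits at all */
	if (errno == ERANGE)
		return -ERANGE;	/* value does not fit in a long */

	if (*end == '\n')
		end++;		/* tolerate a single trailing newline */
	if (*end != '\0')
		return -EINVAL;	/* trailing garbage */

	*res = val;
	return 0;
}
```

The trailing-newline allowance is what makes this style of helper convenient for sysfs-type input, where a write usually arrives with a newline at the end.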

[PATCH 1/2] input: ti_tsc: Enable shared IRQ for TSC

2013-07-26 Thread Zubair Lutfullah

From: "Patil, Rachna" 

Touchscreen and ADC share the same IRQ line from parent MFD core.
Previously only Touchscreen was interrupt based.
With continuous mode support added in ADC driver, driver requires
interrupt to process the ADC samples, so enable shared IRQ flag bit for
touchscreen.

Signed-off-by: Patil, Rachna 
Acked-by: Vaibhav Hiremath 
Signed-off-by: Zubair Lutfullah 
Acked-by: Greg Kroah-Hartman 
---
 drivers/input/touchscreen/ti_am335x_tsc.c |   18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/input/touchscreen/ti_am335x_tsc.c b/drivers/input/touchscreen/ti_am335x_tsc.c
index e1c5300..68d1250 100644
--- a/drivers/input/touchscreen/ti_am335x_tsc.c
+++ b/drivers/input/touchscreen/ti_am335x_tsc.c
@@ -260,8 +260,18 @@ static irqreturn_t titsc_irq(int irq, void *dev)
unsigned int fsm;
 
status = titsc_readl(ts_dev, REG_IRQSTATUS);
-   if (status & IRQENB_FIFO0THRES) {
-
+   /*
+* ADC and touchscreen share the IRQ line.
+* FIFO1 threshold, FIFO1 Overrun and FIFO1 underflow
+* interrupts are used by ADC,
+* hence return from touchscreen IRQ handler if FIFO1
+* related interrupts occurred.
+*/
+   if ((status & IRQENB_FIFO1THRES) ||
+   (status & IRQENB_FIFO1OVRRUN) ||
+   (status & IRQENB_FIFO1UNDRFLW))
+   return IRQ_NONE;
+   else if (status & IRQENB_FIFO0THRES) {
titsc_read_coordinates(ts_dev, &x, &y, &z1, &z2);
 
if (ts_dev->pen_down && z1 != 0 && z2 != 0) {
@@ -315,7 +325,7 @@ static irqreturn_t titsc_irq(int irq, void *dev)
}
 
if (irqclr) {
-   titsc_writel(ts_dev, REG_IRQSTATUS, irqclr);
+   titsc_writel(ts_dev, REG_IRQSTATUS, (status | irqclr));
am335x_tsc_se_update(ts_dev->mfd_tscadc);
return IRQ_HANDLED;
}
@@ -389,7 +399,7 @@ static int titsc_probe(struct platform_device *pdev)
}
 
err = request_irq(ts_dev->irq, titsc_irq,
- 0, pdev->dev.driver->name, ts_dev);
+ IRQF_SHARED, pdev->dev.driver->name, ts_dev);
if (err) {
dev_err(&pdev->dev, "failed to allocate irq.\n");
goto err_free_mem;
-- 
1.7.9.5
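
The shared-line convention this patch relies on can be sketched in plain C as follows.  This is a simplified userspace model, not driver code: the bit values and the tsc_handler(), adc_handler() and dispatch() names are made up, and only the IRQ_NONE/IRQ_HANDLED return convention mirrors the kernel.  Each handler on a shared line checks the status bits and returns IRQ_NONE when the interrupt belongs to the other device:

```c
enum irqreturn { IRQ_NONE = 0, IRQ_HANDLED = 1 };

/* Hypothetical status bits; the real values live in the MFD header. */
#define FIFO0_THRES   (1u << 2)
#define FIFO1_THRES   (1u << 5)
#define FIFO1_OVERRUN (1u << 6)
#define FIFO1_UNDERFL (1u << 7)
#define FIFO1_ANY     (FIFO1_THRES | FIFO1_OVERRUN | FIFO1_UNDERFL)

/* Touchscreen side: FIFO1 events belong to the ADC, so decline them. */
enum irqreturn tsc_handler(unsigned int status)
{
	if (status & FIFO1_ANY)
		return IRQ_NONE;
	if (status & FIFO0_THRES)
		return IRQ_HANDLED;
	return IRQ_NONE;
}

/* ADC side: only FIFO1 overrun and threshold events are serviced here. */
enum irqreturn adc_handler(unsigned int status)
{
	if (status & (FIFO1_OVERRUN | FIFO1_THRES))
		return IRQ_HANDLED;
	return IRQ_NONE;
}

/* Model of the core calling every handler registered on the shared line. */
int dispatch(unsigned int status)
{
	int handled = 0;

	handled += tsc_handler(status) == IRQ_HANDLED;
	handled += adc_handler(status) == IRQ_HANDLED;
	return handled;
}
```

In the model, a FIFO1 event is claimed by exactly one handler, which is the behaviour IRQF_SHARED expects from well-behaved drivers on the same line.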


[PATCH 2/2] iio: ti_am335x_adc: Add continuous sampling and trigger support

2013-07-26 Thread Zubair Lutfullah

Previously the driver had only one-shot reading functionality.
This patch adds triggered buffer support to the driver.
A buffer of samples can now be read via /dev/iio.

Patil Rachna (TI) laid the ground work for ADC HW register access.
Russ Dill (TI) fixed bugs in the driver relevant to FIFOs and IRQs.

I fixed channel scanning so multiple ADC channels can be read
simultaneously and pushed to userspace.
Restructured the driver to fit IIO ABI.
And added trigger support.

Signed-off-by: Zubair Lutfullah 
Acked-by: Greg Kroah-Hartman 
Signed-off-by: Russ Dill 
---
 drivers/iio/adc/ti_am335x_adc.c  |  334 +++---
 include/linux/mfd/ti_am335x_tscadc.h |   13 +-
 2 files changed, 285 insertions(+), 62 deletions(-)

diff --git a/drivers/iio/adc/ti_am335x_adc.c b/drivers/iio/adc/ti_am335x_adc.c
index 3ceac3e..630ce85 100644
--- a/drivers/iio/adc/ti_am335x_adc.c
+++ b/drivers/iio/adc/ti_am335x_adc.c
@@ -26,14 +26,25 @@
 #include 
 #include 
 #include 
-
 #include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
 
 struct tiadc_device {
struct ti_tscadc_dev *mfd_tscadc;
int channels;
u8 channel_line[8];
u8 channel_step[8];
+   struct work_struct poll_work;
+   wait_queue_head_t wq_data_avail;
+   bool data_avail;
+   u32 *inputbuffer;
+   int sample_count;
+   int irq;
 };
 
 static unsigned int tiadc_readl(struct tiadc_device *adc, unsigned int reg)
@@ -56,27 +67,28 @@ static u32 get_adc_step_mask(struct tiadc_device *adc_dev)
return step_en;
 }
 
-static void tiadc_step_config(struct tiadc_device *adc_dev)
+static void tiadc_step_config(struct iio_dev *indio_dev)
 {
+   struct tiadc_device *adc_dev = iio_priv(indio_dev);
unsigned int stepconfig;
-   int i, steps;
+   int i, steps, chan;
 
/*
 * There are 16 configurable steps and 8 analog input
 * lines available which are shared between Touchscreen and ADC.
-*
 * Steps backwards i.e. from 16 towards 0 are used by ADC
 * depending on number of input lines needed.
 * Channel would represent which analog input
 * needs to be given to ADC to digitalize data.
 */
-
steps = TOTAL_STEPS - adc_dev->channels;
-   stepconfig = STEPCONFIG_AVG_16 | STEPCONFIG_FIFO1;
+   if (iio_buffer_enabled(indio_dev))
+   stepconfig = STEPCONFIG_AVG_16 | STEPCONFIG_FIFO1
+   | STEPCONFIG_MODE_SWCNT;
+   else
+   stepconfig = STEPCONFIG_AVG_16 | STEPCONFIG_FIFO1;
 
for (i = 0; i < adc_dev->channels; i++) {
-   int chan;
-
chan = adc_dev->channel_line[i];
tiadc_writel(adc_dev, REG_STEPCONFIG(steps),
stepconfig | STEPCONFIG_INP(chan));
@@ -85,7 +97,190 @@ static void tiadc_step_config(struct tiadc_device *adc_dev)
adc_dev->channel_step[i] = steps;
steps++;
}
+}
+
+static irqreturn_t tiadc_irq(int irq, void *private)
+{
+   struct iio_dev *idev = private;
+   struct tiadc_device *adc_dev = iio_priv(idev);
+   unsigned int status, config;
+   status = tiadc_readl(adc_dev, REG_IRQSTATUS);
+
+   /* FIFO Overrun. Clear flag. Disable/Enable ADC to recover */
+   if (status & IRQENB_FIFO1OVRRUN) {
+   config = tiadc_readl(adc_dev, REG_CTRL);
+   config &= ~(CNTRLREG_TSCSSENB);
+   tiadc_writel(adc_dev, REG_CTRL, config);
+   tiadc_writel(adc_dev, REG_IRQSTATUS, IRQENB_FIFO1OVRRUN |
+   IRQENB_FIFO1UNDRFLW | IRQENB_FIFO1THRES);
+   tiadc_writel(adc_dev, REG_CTRL, (config | CNTRLREG_TSCSSENB));
+   return IRQ_HANDLED;
+   } else if (status & IRQENB_FIFO1THRES) {
+   /* Wake adc_work that pushes FIFO data to iio buffer */
+   tiadc_writel(adc_dev, REG_IRQCLR, IRQENB_FIFO1THRES);
+   adc_dev->data_avail = 1;
+   wake_up_interruptible(&adc_dev->wq_data_avail);
+   return IRQ_HANDLED;
+   } else
+   return IRQ_NONE;
+}
+
+static irqreturn_t tiadc_trigger_h(int irq, void *p)
+{
+   struct iio_poll_func *pf = p;
+   struct iio_dev *indio_dev = pf->indio_dev;
+   struct tiadc_device *adc_dev = iio_priv(indio_dev);
+   unsigned int config;
+
+   schedule_work(&adc_dev->poll_work);
+   config = tiadc_readl(adc_dev, REG_CTRL);
+   tiadc_writel(adc_dev, REG_CTRL, config & ~CNTRLREG_TSCSSENB);
+   tiadc_writel(adc_dev, REG_CTRL, config |  CNTRLREG_TSCSSENB);
+
+   tiadc_writel(adc_dev,  REG_IRQSTATUS, IRQENB_FIFO1THRES |
+IRQENB_FIFO1OVRRUN | IRQENB_FIFO1UNDRFLW);
+   tiadc_writel(adc_dev,  REG_IRQENABLE, IRQENB_FIFO1THRES
+   | IRQENB_FIFO1OVRRUN);
+
+   iio_trigger_notify_done(indio_dev->trig);
+   return

[PATCH 0/2] iio: input: ti_am335x_adc: Add continuous sampling and trigger support round 3

2013-07-26 Thread Zubair Lutfullah

ADC and TSC share an IRQ line. Patch one is simple and adds shared irq support 
on the TSC side.

The second patch adds continuous sampling support to the am335x_adc driver.

It has been submitted previously. This is round 3.

Previously:
- Submitted as a series of patches and bug fixes.
- The driver would continuously push samples into a buffer exposed to userspace.
- Extra sysfs attributes for selecting continuous mode or one shot mode.
- No trigger functionality.
- Reading from /dev/iio required patching the provided generic_buffer.c iio 
test application to bypass trigger.
- Only one channel could be read at a time.
- And even then, samples were skipped as the FIFO was read incorrectly.

Now: 
- All bug fixes merged together in one patch.
- Stuck closely to the IIO ABI this time.
- Added trigger support.
- Fixed channel scanning where only one channel could be read into the buffer 
at a time.
- Now all enabled channels in the scan_elements folder are pushed to the 
userspace properly without skipping any samples.
- generic_buffer.c test application can read samples without any modification.
- A sysfs trigger starts acquisition.

This has been tested on the Beaglebone Black running the am335x processor.
The patches apply on the iio branch fixes-togreg.

Patil, Rachna (1):
  input: ti_tsc: Enable shared IRQ for TSC

Zubair Lutfullah (1):
  iio: ti_am335x_adc: Add continuous sampling and trigger support

 drivers/iio/adc/ti_am335x_adc.c   |  334 +++--
 drivers/input/touchscreen/ti_am335x_tsc.c |   18 +-
 include/linux/mfd/ti_am335x_tscadc.h  |   13 +-
 3 files changed, 299 insertions(+), 66 deletions(-)

-- 
1.7.9.5


Re: [GIT PULL] arm64 fixes for 3.11

2013-07-26 Thread Benjamin Herrenschmidt

On Fri, 2013-07-26 at 12:54 -0700, Linus Torvalds wrote:
> 
> *Some* other 64-bit architectures do 16k stack sizes. But neither
> x86-64 nor powerpc do, afaik. Instead, they do irq stacks, which is
> generally a better idea than having one big stack.

Sadly you overestimated us here :-) We do 16K *and* irq stacks on
64-bit ... Remember our ABI with its 112 bytes minimum per frame?

It's been a while since I've last audited our actual usage mind you, we
*might* be able to reduce it but at this stage, since our typical
configs use a 64K base page size, it's not a big deal (ie, it's not an
order-N allocation).
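
To make the arithmetic concrete: with a 64K base page a 16K stack fits inside a single page (order 0), whereas with 4K pages it needs four contiguous pages (order 2).  A small sketch, where alloc_order() is a hypothetical helper modelled loosely on what the kernel's get_order() computes:

```c
/* Smallest order such that (page_size << order) >= size. */
unsigned int alloc_order(unsigned long size, unsigned long page_size)
{
	unsigned int order = 0;

	while ((page_size << order) < size)
		order++;
	return order;
}
```

Here alloc_order(16384, 65536) yields 0 and alloc_order(16384, 4096) yields 2.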

Cheers,
Ben.


[PATCH 05/21] nohz: Only enable context tracking on full dynticks CPUs

The context tracking subsystem has the ability to selectively
enable the tracking on any defined subset of CPUs. This means that
we can define a CPU range that doesn't run the context tracking
and another range that does.

Now what we want in practice is to enable the tracking on full
dynticks CPUs only. In order to perform this, we just need to pass
our full dynticks CPU range selection from the full dynticks
subsystem to the context tracking.

This way we can spare the overhead of RCU user extended quiescent
state and vtime maintenance on the CPUs that are outside the
full dynticks range. Just keep in mind the raw context tracking
itself is still necessary everywhere.

Signed-off-by: Frederic Weisbecker 
Cc: Steven Rostedt 
Cc: Paul E. McKenney 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Li Zhong 
Cc: Mike Galbraith 
Cc: Kevin Hilman 
---
 include/linux/context_tracking.h |2 ++
 kernel/context_tracking.c|5 +
 kernel/time/tick-sched.c |4 
 3 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index d883ff0..1ae37c7 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -34,6 +34,8 @@ static inline bool context_tracking_active(void)
return __this_cpu_read(context_tracking.active);
 }
 
+extern void context_tracking_cpu_set(int cpu);
+
 extern void user_enter(void);
 extern void user_exit(void);
 
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 7b095de..72bcb25 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -26,6 +26,11 @@ DEFINE_PER_CPU(struct context_tracking, context_tracking) = {
 #endif
 };
 
+void context_tracking_cpu_set(int cpu)
+{
+   per_cpu(context_tracking.active, cpu) = true;
+}
+
 /**
  * user_enter - Inform the context tracking that the CPU is going to
  *  enter userspace mode.
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index e80183f..6d604fd 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -350,6 +351,9 @@ void __init tick_nohz_init(void)
return;
}
 
+   for_each_cpu(cpu, nohz_full_mask)
+   context_tracking_cpu_set(cpu);
+
cpu_notifier(tick_nohz_cpu_down_callback, 0);
cpulist_scnprintf(nohz_full_buf, sizeof(nohz_full_buf), nohz_full_mask);
pr_info("NO_HZ: Full dynticks CPUs: %s.\n", nohz_full_buf);
-- 
1.7.5.4
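
In userspace terms, the loop added to tick_nohz_init() amounts to something like the following sketch.  It is only a model: a plain array replaces the per-CPU variable, an unsigned long bitmask replaces the cpumask, and NR_CPUS_MODEL, tracking_active, tracking_cpu_set() and enable_tracking_on() are invented names.

```c
#include <stdbool.h>

#define NR_CPUS_MODEL 8	/* illustrative CPU count */

bool tracking_active[NR_CPUS_MODEL];	/* stands in for the per-CPU flag */

/* Counterpart of context_tracking_cpu_set() in this model. */
static void tracking_cpu_set(int cpu)
{
	tracking_active[cpu] = true;
}

/* Enable tracking only on the CPUs whose bit is set in full_mask. */
void enable_tracking_on(unsigned long full_mask)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		if (full_mask & (1UL << cpu))
			tracking_cpu_set(cpu);
}
```

With a mask of 0x6, only CPUs 1 and 2 end up with the flag set; every other CPU keeps it clear and so skips the extra overhead.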


[PATCH 02/21] context_tracing: Fix guest accounting with native vtime

1) If context tracking is enabled with native vtime accounting (which
combo is useless except for dev testing), we call vtime_guest_enter()
and vtime_guest_exit() on host <-> guest switches. But those are stubs
in this configuration. As a result, cputime is not correctly flushed
on kvm context switches.

2) If context tracking runs but is disabled on some CPUs, those
CPUs end up calling __guest_enter/__guest_exit which in turn
call vtime_account_system(). We don't want to call this because we
run in tick based accounting for these CPUs.

Refactor the guest_enter/guest_exit code such that all combinations
finally work.

Signed-off-by: Frederic Weisbecker 
Cc: Steven Rostedt 
Cc: Paul E. McKenney 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Li Zhong 
Cc: Mike Galbraith 
Cc: Kevin Hilman 
---
 include/linux/context_tracking.h |   52 --
 kernel/context_tracking.c|6 +++-
 2 files changed, 26 insertions(+), 32 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index fc09d7b..5984f25 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -20,25 +20,6 @@ struct context_tracking {
} state;
 };
 
-static inline void __guest_enter(void)
-{
-   /*
-* This is running in ioctl context so we can avoid
-* the call to vtime_account() with its unnecessary idle check.
-*/
-   vtime_account_system(current);
-   current->flags |= PF_VCPU;
-}
-
-static inline void __guest_exit(void)
-{
-   /*
-* This is running in ioctl context so we can avoid
-* the call to vtime_account() with its unnecessary idle check.
-*/
-   vtime_account_system(current);
-   current->flags &= ~PF_VCPU;
-}
 
 #ifdef CONFIG_CONTEXT_TRACKING
 DECLARE_PER_CPU(struct context_tracking, context_tracking);
@@ -56,9 +37,6 @@ static inline bool context_tracking_active(void)
 extern void user_enter(void);
 extern void user_exit(void);
 
-extern void guest_enter(void);
-extern void guest_exit(void);
-
 static inline enum ctx_state exception_enter(void)
 {
enum ctx_state prev_ctx;
@@ -81,21 +59,35 @@ extern void context_tracking_task_switch(struct task_struct *prev,
 static inline bool context_tracking_in_user(void) { return false; }
 static inline void user_enter(void) { }
 static inline void user_exit(void) { }
+static inline enum ctx_state exception_enter(void) { return 0; }
+static inline void exception_exit(enum ctx_state prev_ctx) { }
+static inline void context_tracking_task_switch(struct task_struct *prev,
+   struct task_struct *next) { }
+#endif /* !CONFIG_CONTEXT_TRACKING */
 
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+extern void guest_enter(void);
+extern void guest_exit(void);
+#else
 static inline void guest_enter(void)
 {
-   __guest_enter();
+   /*
+* This is running in ioctl context so we can avoid
+* the call to vtime_account() with its unnecessary idle check.
+*/
+   vtime_account_system(current);
+   current->flags |= PF_VCPU;
 }
 
 static inline void guest_exit(void)
 {
-   __guest_exit();
+   /*
+* This is running in ioctl context so we can avoid
+* the call to vtime_account() with its unnecessary idle check.
+*/
+   vtime_account_system(current);
+   current->flags &= ~PF_VCPU;
 }
-
-static inline enum ctx_state exception_enter(void) { return 0; }
-static inline void exception_exit(enum ctx_state prev_ctx) { }
-static inline void context_tracking_task_switch(struct task_struct *prev,
-   struct task_struct *next) { }
-#endif /* !CONFIG_CONTEXT_TRACKING */
+#endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */
 
 #endif
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 942835c..1f47119 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -141,12 +141,13 @@ void user_exit(void)
local_irq_restore(flags);
 }
 
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
 void guest_enter(void)
 {
if (vtime_accounting_enabled())
vtime_guest_enter(current);
else
-   __guest_enter();
+   current->flags |= PF_VCPU;
 }
 EXPORT_SYMBOL_GPL(guest_enter);
 
@@ -155,9 +156,10 @@ void guest_exit(void)
if (vtime_accounting_enabled())
vtime_guest_exit(current);
else
-   __guest_exit();
+   current->flags &= ~PF_VCPU;
 }
 EXPORT_SYMBOL_GPL(guest_exit);
+#endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */
 
 
 /**
-- 
1.7.5.4

[PATCH 04/21] context_tracking: Fix runtime CPU off-case

As long as context tracking is enabled on any CPU, even
a single one, all other CPUs need to keep track of their
user <-> kernel boundary crossings as well.

This is because a task can sleep while servicing an exception
that happened in the kernel or in userspace. Then when the task
eventually wakes up and returns from the exception, the CPU needs
to know if we resume in userspace or in the kernel. exception_exit()
gets this information from exception_enter(), which saved the previous
state.

If the CPU where the exception happened didn't keep track of
this information, exception_exit() doesn't know which state
tracking to restore on the CPU to which the task got migrated
and we may return to userspace with the context tracking
subsystem thinking that we are in kernel mode.

This can be fixed in the long term if we move our context tracking
probes to the very low level arch fast path at the user <-> kernel
boundary, although even that is worrisome as an exception can still
happen in the few instructions between the probe and the actual iret.

Also we are not yet ready to set these probes in the fast path given
the potential overhead problem it induces.

So let's fix this by always enabling context tracking even on CPUs
that are not in the full dynticks range. OTOH we can spare the
rcu_user_*() and vtime_user_*() calls there because the tick runs
on these CPUs and we can handle RCU state machine and cputime
accounting through it.
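
The migration problem can be illustrated with a small userspace simulation. This is only a sketch under simplifying assumptions: the per-CPU data is a plain array, the rcu/vtime hooks are reduced to a counter, and all sim_* names are hypothetical. It shows state being tracked on every CPU while the expensive hooks run only where active is set.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2

enum ctx_state { IN_KERNEL = 0, IN_USER };

/* Simplified stand-in for the per-CPU context tracking data. */
struct cpu_ctx {
	bool active;		/* true only on full dynticks CPUs */
	enum ctx_state state;	/* tracked on every CPU */
	int rcu_vtime_calls;	/* counts the simulated rcu/vtime hooks */
};

static struct cpu_ctx cpus[NR_CPUS];

static void sim_user_enter(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (cpus[cpu].state != IN_USER) {
		if (cpus[cpu].active)
			cpus[cpu].rcu_vtime_calls++;
		cpus[cpu].state = IN_USER;	/* always recorded */
	}
}

static void sim_user_exit(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (cpus[cpu].state == IN_USER) {
		if (cpus[cpu].active)
			cpus[cpu].rcu_vtime_calls++;
		cpus[cpu].state = IN_KERNEL;
	}
}

static enum ctx_state sim_exception_enter(int cpu)
{
	enum ctx_state prev = cpus[cpu].state;

	sim_user_exit(cpu);
	return prev;
}

static void sim_exception_exit(int cpu, enum ctx_state prev)
{
	if (prev == IN_USER)
		sim_user_enter(cpu);
}

/* Fault in userspace on tick based CPU 0, resume on dynticks CPU 1. */
static enum ctx_state sim_migrating_exception(void)
{
	cpus[0] = (struct cpu_ctx){ .active = false, .state = IN_KERNEL };
	cpus[1] = (struct cpu_ctx){ .active = true, .state = IN_KERNEL };

	sim_user_enter(0);
	enum ctx_state prev = sim_exception_enter(0);
	/* the task sleeps here and is migrated to CPU 1 */
	sim_exception_exit(1, prev);
	return cpus[1].state;
}
```

Because CPU 0 recorded IN_USER even though it is not a full dynticks CPU, exception_exit() on CPU 1 can restore the right state. If CPU 0 had skipped the state write, prev would have been IN_KERNEL and CPU 1 would have returned to userspace while believing it was still in the kernel.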

Signed-off-by: Frederic Weisbecker 
Cc: Steven Rostedt 
Cc: Paul E. McKenney 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Li Zhong 
Cc: Mike Galbraith 
Cc: Kevin Hilman 
---
 kernel/context_tracking.c |   52 
 1 files changed, 33 insertions(+), 19 deletions(-)

diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 1f47119..7b095de 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -54,17 +54,31 @@ void user_enter(void)
WARN_ON_ONCE(!current->mm);
 
local_irq_save(flags);
-   if (__this_cpu_read(context_tracking.active) &&
-   __this_cpu_read(context_tracking.state) != IN_USER) {
+   if ( __this_cpu_read(context_tracking.state) != IN_USER) {
+   if (__this_cpu_read(context_tracking.active)) {
+   /*
+* At this stage, only low level arch entry code remains and
+* then we'll run in userspace. We can assume there won't be
+* any RCU read-side critical section until the next call to
+* user_exit() or rcu_irq_enter(). Let's remove RCU's dependency
+* on the tick.
+*/
+   vtime_user_enter(current);
+   rcu_user_enter();
+   }
/*
-* At this stage, only low level arch entry code remains and
-* then we'll run in userspace. We can assume there won't be
-* any RCU read-side critical section until the next call to
-* user_exit() or rcu_irq_enter(). Let's remove RCU's dependency
-* on the tick.
+* Even if context tracking is disabled on this CPU, because it's outside
+* the full dynticks mask for example, we still have to keep track of the
+* context transitions and states to prevent inconsistency on those of
+* other CPUs.
+* If a task triggers an exception in userspace, sleep on the exception
+* handler and then migrate to another CPU, that new CPU must know where
+* the exception returns by the time we call exception_exit().
+* This information can only be provided by the previous CPU when it called
+* exception_enter().
+* OTOH we can spare the calls to vtime and RCU when context_tracking.active
+* is false because we know that CPU is not tickless.
 */
-   vtime_user_enter(current);
-   rcu_user_enter();
__this_cpu_write(context_tracking.state, IN_USER);
}
local_irq_restore(flags);
@@ -130,12 +144,14 @@ void user_exit(void)
 
local_irq_save(flags);
if (__this_cpu_read(context_tracking.state) == IN_USER) {
-   /*
-* We are going to run code that may use RCU. Inform
-* RCU core about that (ie: we may need the tick again).
-*/
-   rcu_user_exit();
-   vtime_user_exit(current);
+   if (__this_cpu_read(context_tracking.active)) {
+   /*
+* We are going to run code that may use RCU. Inform
+* RCU core about that (ie: we may need the tick again).
+*/
+

[PATCH 01/21] sched: Consolidate open coded preemptible() checks

preempt_schedule() and preempt_schedule_context() open
code their preemptability checks.

Use the standard API instead for consolidation.
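
As a sanity check on the consolidation, the sketch below compares the open coded test with a userspace model of preemptible(). It assumes preemptible() amounts to "preempt count is zero and interrupts are enabled", which is my understanding of the definition when preempt counting is configured; the function names are made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* The condition both functions open coded before this patch. */
static bool open_coded_skip(int preempt_count, bool irqs_disabled)
{
	return preempt_count || irqs_disabled;
}

/* Userspace model of preemptible() (assumed definition, see above). */
static bool sim_preemptible(int preempt_count, bool irqs_disabled)
{
	return preempt_count == 0 && !irqs_disabled;
}

/* By De Morgan, "skip" must equal "!preemptible" for every input. */
static bool checks_agree(void)
{
	for (int count = 0; count < 3; count++)
		for (int irqs = 0; irqs < 2; irqs++)
			if (open_coded_skip(count, irqs) !=
			    !sim_preemptible(count, irqs))
				return false;
	return true;
}
```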

Signed-off-by: Frederic Weisbecker 
Cc: Ingo Molnar 
Cc: Li Zhong 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Cc: Thomas Gleixner 
Cc: Borislav Petkov 
Cc: Alex Shi 
Cc: Paul Turner 
Cc: Mike Galbraith 
Cc: Vincent Guittot 
---
 kernel/context_tracking.c |3 +--
 kernel/sched/core.c   |4 +---
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 383f823..942835c 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -87,10 +87,9 @@ void user_enter(void)
  */
 void __sched notrace preempt_schedule_context(void)
 {
-   struct thread_info *ti = current_thread_info();
enum ctx_state prev_ctx;
 
-   if (likely(ti->preempt_count || irqs_disabled()))
+   if (likely(!preemptible()))
return;
 
/*
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b7c32cb..3fb7ace 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2510,13 +2510,11 @@ void __sched schedule_preempt_disabled(void)
  */
 asmlinkage void __sched notrace preempt_schedule(void)
 {
-   struct thread_info *ti = current_thread_info();
-
/*
 * If there is a non-zero preempt_count or interrupts are disabled,
 * we do not want to preempt the current task. Just return..
 */
-   if (likely(ti->preempt_count || irqs_disabled()))
+   if (likely(!preemptible()))
return;
 
do {
-- 
1.7.5.4

[PATCH 08/21] context_tracking: Optimize main APIs off case with static key

Optimize user and exception entry/exit APIs with static
keys. This minimizes the overhead for those who enable
CONFIG_NO_HZ_FULL without always using it. Having no range
passed to nohz_full= should result in the probes being nopped
(at least we hope so...).

If this proves not to be enough in the long term, we'll need
to bring an exception slow path by re-routing the exception
handlers.
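
The wrapper pattern used here can be sketched in plain C. A real static key patches the branch into a nop at runtime; this userspace model only uses an ordinary boolean as a stand-in, so it shows the control flow and not the code patching. All sim_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the static key: a plain flag, no runtime patching. */
static bool sim_tracking_enabled;

/* Counts how often the out-of-line slow path was really entered. */
static int sim_slow_path_calls;

/* Models the out-of-line context_tracking_user_enter(). */
static void sim_context_tracking_user_enter(void)
{
	sim_slow_path_calls++;
}

/*
 * Inline wrapper: when tracking is off, callers pay only for the
 * (ideally nopped) test and never take the function call.
 */
static inline void sim_user_enter(void)
{
	if (sim_tracking_enabled)
		sim_context_tracking_user_enter();
}
```

Calling sim_user_enter() with the flag clear leaves the counter at zero, which corresponds to booting without a nohz_full= range.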

Signed-off-by: Frederic Weisbecker 
Cc: Steven Rostedt 
Cc: Paul E. McKenney 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Li Zhong 
Cc: Mike Galbraith 
Cc: Kevin Hilman 
---
 include/linux/context_tracking.h |   27 ++-
 kernel/context_tracking.c|   12 ++--
 2 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index f9356eb..e5ec0c9 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -38,23 +38,40 @@ static inline bool context_tracking_active(void)
 
 extern void context_tracking_cpu_set(int cpu);
 
-extern void user_enter(void);
-extern void user_exit(void);
+extern void context_tracking_user_enter(void);
+extern void context_tracking_user_exit(void);
+
+static inline void user_enter(void)
+{
+   if (static_key_false(&context_tracking_enabled))
+   context_tracking_user_enter();
+
+}
+static inline void user_exit(void)
+{
+   if (static_key_false(&context_tracking_enabled))
+   context_tracking_user_exit();
+}
 
 static inline enum ctx_state exception_enter(void)
 {
enum ctx_state prev_ctx;
 
+   if (!static_key_false(&context_tracking_enabled))
+   return 0;
+
prev_ctx = this_cpu_read(context_tracking.state);
-   user_exit();
+   context_tracking_user_exit();
 
return prev_ctx;
 }
 
 static inline void exception_exit(enum ctx_state prev_ctx)
 {
-   if (prev_ctx == IN_USER)
-   user_enter();
+   if (static_key_false(&context_tracking_enabled)) {
+   if (prev_ctx == IN_USER)
+   context_tracking_user_enter();
+   }
 }
 
 extern void context_tracking_task_switch(struct task_struct *prev,
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index f07505c..657f668 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -33,15 +33,15 @@ void context_tracking_cpu_set(int cpu)
 }
 
 /**
- * user_enter - Inform the context tracking that the CPU is going to
- *  enter userspace mode.
+ * context_tracking_user_enter - Inform the context tracking that the CPU is going to
+ *   enter userspace mode.
  *
  * This function must be called right before we switch from the kernel
  * to userspace, when it's guaranteed the remaining kernel instructions
  * to execute won't use any RCU read side critical section because this
  * function sets RCU in extended quiescent state.
  */
-void user_enter(void)
+void context_tracking_user_enter(void)
 {
unsigned long flags;
 
@@ -131,8 +131,8 @@ EXPORT_SYMBOL_GPL(preempt_schedule_context);
 #endif /* CONFIG_PREEMPT */
 
 /**
- * user_exit - Inform the context tracking that the CPU is
- * exiting userspace mode and entering the kernel.
+ * context_tracking_user_exit - Inform the context tracking that the CPU is
+ *  exiting userspace mode and entering the kernel.
  *
  * This function must be called after we entered the kernel from userspace
  * before any use of RCU read side critical section. This potentially include
@@ -141,7 +141,7 @@ EXPORT_SYMBOL_GPL(preempt_schedule_context);
  * This call supports re-entrancy. This way it can be called from any exception
  * handler without needing to know if we came from userspace or not.
  */
-void user_exit(void)
+void context_tracking_user_exit(void)
 {
unsigned long flags;
 
-- 
1.7.5.4

[PATCH 06/21] context_tracking: Remove full dynticks' hacky dependency on wide context tracking

Now that the full dynticks subsystem only enables the context tracking
on full dynticks CPUs, let's remove the dependency on CONTEXT_TRACKING_FORCE.

This dependency was a hack to enable the context tracking widely for the
full dynticks subsystem until the latter became able to enable it in a
more CPU-fine-grained fashion.

Now CONTEXT_TRACKING_FORCE is only meant for testing on archs that
are working on support for the context tracking while full dynticks can't
be used yet due to unmet dependencies. It simulates a system where all CPUs
are full dynticks so that RCU user extended quiescent states and dynticks
cputime accounting can be tested on the given arch.

Signed-off-by: Frederic Weisbecker 
Cc: Steven Rostedt 
Cc: Paul E. McKenney 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Li Zhong 
Cc: Mike Galbraith 
Cc: Kevin Hilman 
---
 init/Kconfig|   28 ++--
 kernel/time/Kconfig |1 -
 2 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 247084b..ffbf5d7 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -527,13 +527,29 @@ config RCU_USER_QS
 config CONTEXT_TRACKING_FORCE
bool "Force context tracking"
depends on CONTEXT_TRACKING
-   default CONTEXT_TRACKING
+   default y if !NO_HZ_FULL
help
- Probe on user/kernel boundaries by default in order to
- test the features that rely on it such as userspace RCU extended
- quiescent states.
- This test is there for debugging until we have a real user like the
- full dynticks mode.
+ The major pre-requirement for full dynticks to work is to
+ support the context tracking subsystem. But there are also
+ other dependencies to provide in order to make the full
+ dynticks working.
+
+ This option stands for testing when an arch implements the
+ context tracking backend but doesn't yet fulfill all the
+ requirements to make the full dynticks feature working.
+ Without the full dynticks, there is no way to test the support
+ for context tracking and the subsystems that rely on it: RCU
+ userspace extended quiescent state and tickless cputime
+ accounting. This option copes with the absence of the full
+ dynticks subsystem by forcing the context tracking on all
+ CPUs in the system.
+
+ Say Y only if you're working on the development of an
+ architecture backend for the context tracking.
+
+ Say N otherwise, this option brings an overhead that you
+ don't want in production.
+
 
 config RCU_FANOUT
int "Tree-based hierarchical RCU fanout value"
diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index 70f27e8..747bbc7 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -105,7 +105,6 @@ config NO_HZ_FULL
select RCU_USER_QS
select RCU_NOCB_CPU
select VIRT_CPU_ACCOUNTING_GEN
-   select CONTEXT_TRACKING_FORCE
select IRQ_WORK
help
 Adaptively try to shutdown the tick whenever possible, even when
-- 
1.7.5.4

[PATCH 07/21] context_tracking: Ground setup for static key use

Prepare for using a static key in the context tracking subsystem.
This will help optimize the off case for its many users:

* user_enter, user_exit, exception_enter, exception_exit, guest_enter,
  guest_exit, vtime_*()
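
The key is meant to count how many CPUs have tracking turned on, and a CPU must not bump it twice. Below is a minimal userspace sketch of that bookkeeping, with an integer standing in for the static key's reference count (an assumption made for illustration; the real key also patches code when the count leaves zero).

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

static bool sim_cpu_active[NR_CPUS];	/* models context_tracking.active */
static int sim_key_count;		/* models the static key refcount */

/* Enable tracking on one CPU; repeated calls must be harmless. */
static void sim_context_tracking_cpu_set(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	if (!sim_cpu_active[cpu]) {
		sim_cpu_active[cpu] = true;
		sim_key_count++;	/* static_key_slow_inc() in the patch */
	}
}

/* The key reads as "on" as soon as a single CPU is tracked. */
static bool sim_key_enabled(void)
{
	return sim_key_count > 0;
}
```

The forced mode then reduces to calling the setter for every possible CPU at early init, as the last hunk of the patch does.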

Signed-off-by: Frederic Weisbecker 
Cc: Steven Rostedt 
Cc: Paul E. McKenney 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Li Zhong 
Cc: Mike Galbraith 
Cc: Kevin Hilman 
---
 include/linux/context_tracking.h |2 ++
 kernel/context_tracking.c|   26 --
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 1ae37c7..f9356eb 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 struct context_tracking {
@@ -22,6 +23,7 @@ struct context_tracking {
 
 
 #ifdef CONFIG_CONTEXT_TRACKING
+extern struct static_key context_tracking_enabled;
 DECLARE_PER_CPU(struct context_tracking, context_tracking);
 
 static inline bool context_tracking_in_user(void)
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 72bcb25..f07505c 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -20,15 +20,16 @@
 #include 
 #include 
 
-DEFINE_PER_CPU(struct context_tracking, context_tracking) = {
-#ifdef CONFIG_CONTEXT_TRACKING_FORCE
-   .active = true,
-#endif
-};
+struct static_key context_tracking_enabled = STATIC_KEY_INIT_FALSE;
+
+DEFINE_PER_CPU(struct context_tracking, context_tracking);
 
 void context_tracking_cpu_set(int cpu)
 {
-   per_cpu(context_tracking.active, cpu) = true;
+   if (!per_cpu(context_tracking.active, cpu)) {
+   per_cpu(context_tracking.active, cpu) = true;
+   static_key_slow_inc(&context_tracking_enabled);
+   }
 }
 
 /**
@@ -202,3 +203,16 @@ void context_tracking_task_switch(struct task_struct *prev,
clear_tsk_thread_flag(prev, TIF_NOHZ);
set_tsk_thread_flag(next, TIF_NOHZ);
 }
+
+#ifdef CONFIG_CONTEXT_TRACKING_FORCE
+static int __init context_tracking_init(void)
+{
+   int cpu;
+
+   for_each_possible_cpu(cpu)
+   context_tracking_cpu_set(cpu);
+
+   return 0;
+}
+early_initcall(context_tracking_init);
+#endif
-- 
1.7.5.4

[PATCH 12/21] vtime: Remove a few unneeded generic vtime state checks

Some generic vtime APIs check if the vtime accounting
is enabled on the local CPU before doing their work.

Some of these are not needed because all their callers already
take care of that. Let's remove the checks on these.

Signed-off-by: Frederic Weisbecker 
Cc: Steven Rostedt 
Cc: Paul E. McKenney 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Li Zhong 
Cc: Mike Galbraith 
Cc: Kevin Hilman 
---
 kernel/sched/cputime.c |   13 +
 1 files changed, 1 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index bb6b29a..5f273b47 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -664,9 +664,6 @@ static void __vtime_account_system(struct task_struct *tsk)
 
 void vtime_account_system(struct task_struct *tsk)
 {
-   if (!vtime_accounting_enabled())
-   return;
-
write_seqlock(&tsk->vtime_seqlock);
__vtime_account_system(tsk);
write_sequnlock(&tsk->vtime_seqlock);
@@ -686,12 +683,7 @@ void vtime_account_irq_exit(struct task_struct *tsk)
 
 void vtime_account_user(struct task_struct *tsk)
 {
-   cputime_t delta_cpu;
-
-   if (!vtime_accounting_enabled())
-   return;
-
-   delta_cpu = get_vtime_delta(tsk);
+   cputime_t delta_cpu = get_vtime_delta(tsk);
 
write_seqlock(&tsk->vtime_seqlock);
tsk->vtime_snap_whence = VTIME_SYS;
@@ -701,9 +693,6 @@ void vtime_account_user(struct task_struct *tsk)
 
 void vtime_user_enter(struct task_struct *tsk)
 {
-   if (!vtime_accounting_enabled())
-   return;
-
write_seqlock(&tsk->vtime_seqlock);
tsk->vtime_snap_whence = VTIME_USER;
__vtime_account_system(tsk);
-- 
1.7.5.4

[PATCH 14/21] context_tracking: Split low level state headers

We plan to use the context tracking static key on inline
vtime APIs. For this we need to include the context tracking
headers from those of vtime.

However vtime headers need to stay low level because they are
included in hardirq.h that mostly contains standalone
definitions. But context_tracking.h includes sched.h for
a few task_struct references, therefore it wouldn't be sensible
to include it from vtime.h.

To solve this, let's split the context tracking headers and move
out the pure state definitions that only require a few low level
headers. We can safely include that small part in vtime.h later.

Signed-off-by: Frederic Weisbecker 
Cc: Steven Rostedt 
Cc: Paul E. McKenney 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Li Zhong 
Cc: Mike Galbraith 
Cc: Kevin Hilman 
---
 include/linux/context_tracking.h   |   31 +
 include/linux/context_tracking_state.h |   39 
 2 files changed, 40 insertions(+), 30 deletions(-)
 create mode 100644 include/linux/context_tracking_state.h

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 66a8397..8b6eedb 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -2,40 +2,12 @@
 #define _LINUX_CONTEXT_TRACKING_H
 
 #include 
-#include 
 #include 
-#include 
+#include 
 #include 
 
-struct context_tracking {
-   /*
-* When active is false, probes are unset in order
-* to minimize overhead: TIF flags are cleared
-* and calls to user_enter/exit are ignored. This
-* may be further optimized using static keys.
-*/
-   bool active;
-   enum ctx_state {
-   IN_KERNEL = 0,
-   IN_USER,
-   } state;
-};
-
 
 #ifdef CONFIG_CONTEXT_TRACKING
-extern struct static_key context_tracking_enabled;
-DECLARE_PER_CPU(struct context_tracking, context_tracking);
-
-static inline bool context_tracking_in_user(void)
-{
-   return __this_cpu_read(context_tracking.state) == IN_USER;
-}
-
-static inline bool context_tracking_active(void)
-{
-   return __this_cpu_read(context_tracking.active);
-}
-
 extern void context_tracking_cpu_set(int cpu);
 
 extern void context_tracking_user_enter(void);
@@ -83,7 +55,6 @@ static inline void context_tracking_task_switch(struct task_struct *prev,
__context_tracking_task_switch(prev, next);
 }
 #else
-static inline bool context_tracking_in_user(void) { return false; }
 static inline void user_enter(void) { }
 static inline void user_exit(void) { }
 static inline enum ctx_state exception_enter(void) { return 0; }
diff --git a/include/linux/context_tracking_state.h b/include/linux/context_tracking_state.h
new file mode 100644
index 000..0f1979d
--- /dev/null
+++ b/include/linux/context_tracking_state.h
@@ -0,0 +1,39 @@
+#ifndef _LINUX_CONTEXT_TRACKING_STATE_H
+#define _LINUX_CONTEXT_TRACKING_STATE_H
+
+#include 
+#include 
+
+struct context_tracking {
+   /*
+* When active is false, probes are unset in order
+* to minimize overhead: TIF flags are cleared
+* and calls to user_enter/exit are ignored. This
+* may be further optimized using static keys.
+*/
+   bool active;
+   enum ctx_state {
+   IN_KERNEL = 0,
+   IN_USER,
+   } state;
+};
+
+#ifdef CONFIG_CONTEXT_TRACKING
+extern struct static_key context_tracking_enabled;
+DECLARE_PER_CPU(struct context_tracking, context_tracking);
+
+static inline bool context_tracking_in_user(void)
+{
+   return __this_cpu_read(context_tracking.state) == IN_USER;
+}
+
+static inline bool context_tracking_active(void)
+{
+   return __this_cpu_read(context_tracking.active);
+}
+#else
+static inline bool context_tracking_in_user(void) { return false; }
+static inline bool context_tracking_active(void) { return false; }
+#endif /* CONFIG_CONTEXT_TRACKING */
+
+#endif
-- 
1.7.5.4

[PATCH 16/21] vtime: Optimize full dynticks accounting off case with static keys

If no CPU is in the full dynticks range, we can avoid the full
dynticks cputime accounting through generic vtime along with its
overhead and use the traditional tick based accounting instead.

Let's do this and nop the off case with static keys.
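
The generic vtime check in this patch is a two-level test: first the global key, then the per-CPU active flag. A hedged userspace sketch follows, with a boolean in place of the static key, an array in place of per-CPU data, and an explicit cpu argument where the kernel would use the local CPU. The sim_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

static bool sim_key_enabled;		/* stand-in for the static key */
static bool sim_cpu_active[NR_CPUS];	/* stand-in for per-CPU active */

/*
 * Models the generic vtime_accounting_enabled(): the cheap global
 * test comes first so that systems without any full dynticks CPU
 * never touch the per-CPU data.
 */
static bool sim_vtime_accounting_enabled(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	if (sim_key_enabled) {
		if (sim_cpu_active[cpu])
			return true;
	}

	return false;
}
```

A CPU outside the full dynticks range reports false even when the key is on, so it keeps using tick based accounting.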

Signed-off-by: Frederic Weisbecker 
Cc: Steven Rostedt 
Cc: Paul E. McKenney 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Li Zhong 
Cc: Mike Galbraith 
Cc: Kevin Hilman 
---
 include/linux/context_tracking.h |6 +--
 include/linux/vtime.h|   70 +-
 kernel/sched/cputime.c   |   22 ++--
 3 files changed, 67 insertions(+), 31 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 8b6eedb..655356a 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -66,8 +66,7 @@ static inline void context_tracking_task_switch(struct task_struct *prev,
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
 static inline void guest_enter(void)
 {
-   if (static_key_false(&context_tracking_enabled) &&
-   vtime_accounting_enabled())
+   if (vtime_accounting_enabled())
vtime_guest_enter(current);
else
current->flags |= PF_VCPU;
@@ -75,8 +74,7 @@ static inline void guest_enter(void)
 
 static inline void guest_exit(void)
 {
-   if (static_key_false(&context_tracking_enabled) &&
-   vtime_accounting_enabled())
+   if (vtime_accounting_enabled())
vtime_guest_exit(current);
else
current->flags &= ~PF_VCPU;
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index 2ad0739..f5b72b3 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -1,22 +1,68 @@
 #ifndef _LINUX_KERNEL_VTIME_H
 #define _LINUX_KERNEL_VTIME_H
 
+#include 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 #include 
 #endif
 
+
 struct task_struct;
 
+/*
+ * vtime_accounting_enabled() definitions/declarations
+ */
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
+static inline bool vtime_accounting_enabled(void) { return true; }
+#endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
+
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+static inline bool vtime_accounting_enabled(void)
+{
+   if (static_key_false(&context_tracking_enabled)) {
+   if (context_tracking_active())
+   return true;
+   }
+
+   return false;
+}
+#endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */
+
+#ifndef CONFIG_VIRT_CPU_ACCOUNTING
+static inline bool vtime_accounting_enabled(void) { return false; }
+#endif /* !CONFIG_VIRT_CPU_ACCOUNTING */
+
+
+/*
+ * Common vtime APIs
+ */
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING
+
+#ifdef __ARCH_HAS_VTIME_TASK_SWITCH
 extern void vtime_task_switch(struct task_struct *prev);
+#else
+extern void vtime_common_task_switch(struct task_struct *prev);
+static inline void vtime_task_switch(struct task_struct *prev)
+{
+   if (vtime_accounting_enabled())
+   vtime_common_task_switch(prev);
+}
+#endif /* __ARCH_HAS_VTIME_TASK_SWITCH */
+
 extern void vtime_account_system(struct task_struct *tsk);
 extern void vtime_account_idle(struct task_struct *tsk);
 extern void vtime_account_user(struct task_struct *tsk);
-extern void vtime_account_irq_enter(struct task_struct *tsk);
 
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
-static inline bool vtime_accounting_enabled(void) { return true; }
-#endif
+#ifdef __ARCH_HAS_VTIME_ACCOUNT
+extern void vtime_account_irq_enter(struct task_struct *tsk);
+#else
+extern void vtime_common_account_irq_enter(struct task_struct *tsk);
+static inline void vtime_account_irq_enter(struct task_struct *tsk)
+{
+   if (vtime_accounting_enabled())
+   vtime_common_account_irq_enter(tsk);
+}
+#endif /* __ARCH_HAS_VTIME_ACCOUNT */
 
 #else /* !CONFIG_VIRT_CPU_ACCOUNTING */
 
@@ -24,14 +70,20 @@ static inline void vtime_task_switch(struct task_struct *prev) { }
 static inline void vtime_account_system(struct task_struct *tsk) { }
 static inline void vtime_account_user(struct task_struct *tsk) { }
 static inline void vtime_account_irq_enter(struct task_struct *tsk) { }
-static inline bool vtime_accounting_enabled(void) { return false; }
-#endif
+#endif /* !CONFIG_VIRT_CPU_ACCOUNTING */
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
 extern void arch_vtime_task_switch(struct task_struct *tsk);
-extern void vtime_account_irq_exit(struct task_struct *tsk);
-extern bool vtime_accounting_enabled(void);
+extern void vtime_gen_account_irq_exit(struct task_struct *tsk);
+
+static inline void vtime_account_irq_exit(struct task_struct *tsk)
+{
+   if (vtime_accounting_enabled())
+   vtime_gen_account_irq_exit(tsk);
+}
+
 extern void vtime_user_enter(struct task_struct *tsk);
+
 static inline void vtime_user_exit(struct task_struct *tsk)
 {
vtime_account_user(tsk);
@@ -39,7 +91,7 @@ static inline void vtime_user_exit(struct task_struct *tsk)
 extern void vtime_guest_enter(struct task_struct

[PATCH 17/21] vtime: Always scale generic vtime accounting results

The cputime accounting in full dynticks can be a subtle
mix of CPUs using tick based accounting and others using
generic vtime.

As long as the tick can have a share in producing these stats, we
want to scale the result against CFS precise accounting as the tick
can miss some tasks hiding between periodic interrupts.
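
For readers unfamiliar with the scaling, here is a simplified arithmetic sketch. It assumes the idea is to keep the sampled user/system ratio but apply it to the precise runtime (rtime) known to the scheduler. It deliberately ignores overflow handling and the monotonicity guarantees of the real cputime_adjust(), and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct sim_split {
	uint64_t utime;
	uint64_t stime;
};

/*
 * Scale sampled utime/stime so that their sum matches the precise
 * runtime. With no samples at all, everything is charged as system
 * time (an assumption of this sketch).
 */
static struct sim_split sim_scale_cputime(uint64_t utime, uint64_t stime,
					  uint64_t rtime)
{
	struct sim_split out;
	uint64_t total = utime + stime;

	if (total == 0)
		out.stime = rtime;
	else
		out.stime = rtime * stime / total;	/* may overflow for huge values */

	out.utime = rtime - out.stime;
	return out;
}
```

For example, samples of 30 user and 10 system units against a precise runtime of 80 give 60 user and 20 system: the 3:1 ratio is preserved while the total matches what the scheduler measured.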

Signed-off-by: Frederic Weisbecker 
Cc: Steven Rostedt 
Cc: Paul E. McKenney 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Li Zhong 
Cc: Mike Galbraith 
Cc: Kevin Hilman 
---
 kernel/sched/cputime.c |6 --
 1 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 0831b06..e9e742e 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -553,12 +553,6 @@ static void cputime_adjust(struct task_cputime *curr,
 {
cputime_t rtime, stime, utime, total;
 
-   if (vtime_accounting_enabled()) {
-   *ut = curr->utime;
-   *st = curr->stime;
-   return;
-   }
-
stime = curr->stime;
total = stime + curr->utime;
 
-- 
1.7.5.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 15/21] vtime: Describe overridden functions in dedicated arch headers

If the arch overrides some generic vtime APIs, let it describe
these in a dedicated standalone header. This way it becomes
convenient to include it in vtime generic headers without dragging
irrelevant stuff into such a low level header.

Signed-off-by: Frederic Weisbecker 
Cc: Steven Rostedt 
Cc: Paul E. McKenney 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Li Zhong 
Cc: Mike Galbraith 
Cc: Kevin Hilman 
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
---
 arch/ia64/include/asm/Kbuild|1 +
 arch/powerpc/include/asm/Kbuild |1 +
 arch/s390/include/asm/cputime.h |3 ---
 arch/s390/include/asm/vtime.h   |7 +++
 arch/s390/kernel/vtime.c|1 +
 include/linux/vtime.h   |4 
 6 files changed, 14 insertions(+), 3 deletions(-)
 create mode 100644 arch/s390/include/asm/vtime.h
 create mode 100644 include/asm-generic/vtime.h

diff --git a/arch/ia64/include/asm/Kbuild b/arch/ia64/include/asm/Kbuild
index 05b03ec..a3456f3 100644
--- a/arch/ia64/include/asm/Kbuild
+++ b/arch/ia64/include/asm/Kbuild
@@ -3,3 +3,4 @@ generic-y += clkdev.h
 generic-y += exec.h
 generic-y += kvm_para.h
 generic-y += trace_clock.h
+generic-y += vtime.h
\ No newline at end of file
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 650757c..704e6f1 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -2,3 +2,4 @@
 generic-y += clkdev.h
 generic-y += rwsem.h
 generic-y += trace_clock.h
+generic-y += vtime.h
\ No newline at end of file
diff --git a/arch/s390/include/asm/cputime.h b/arch/s390/include/asm/cputime.h
index d2ff4137..f65bd36 100644
--- a/arch/s390/include/asm/cputime.h
+++ b/arch/s390/include/asm/cputime.h
@@ -13,9 +13,6 @@
 #include 
 
 
-#define __ARCH_HAS_VTIME_ACCOUNT
-#define __ARCH_HAS_VTIME_TASK_SWITCH
-
 /* We want to use full resolution of the CPU timer: 2**-12 micro-seconds. */
 
 typedef unsigned long long __nocast cputime_t;
diff --git a/arch/s390/include/asm/vtime.h b/arch/s390/include/asm/vtime.h
new file mode 100644
index 000..af9896c
--- /dev/null
+++ b/arch/s390/include/asm/vtime.h
@@ -0,0 +1,7 @@
+#ifndef _S390_VTIME_H
+#define _S390_VTIME_H
+
+#define __ARCH_HAS_VTIME_ACCOUNT
+#define __ARCH_HAS_VTIME_TASK_SWITCH
+
+#endif /* _S390_VTIME_H */
diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c
index 9b9c1b7..abcfab5 100644
--- a/arch/s390/kernel/vtime.c
+++ b/arch/s390/kernel/vtime.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "entry.h"
 
diff --git a/include/asm-generic/vtime.h b/include/asm-generic/vtime.h
new file mode 100644
index 000..e69de29
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index b1dd2db..2ad0739 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -1,6 +1,10 @@
 #ifndef _LINUX_KERNEL_VTIME_H
 #define _LINUX_KERNEL_VTIME_H
 
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
+#include <asm/vtime.h>
+#endif
+
 struct task_struct;
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING
-- 
1.7.5.4


[PATCH 10/21] context_tracking: Optimize context switch off case with static keys

There is no need for the syscall slowpath if no CPU is full dynticks;
turn the context switch hook into a nop in that case.
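
The wrapper pattern this patch applies can be sketched in plain
userspace C. This is only an illustration under stated assumptions: a
plain bool combined with the GCC/Clang __builtin_expect() hint stands in
for the kernel's static key (which patches the branch itself), and the
names tracking_enabled, task_switch_slowpath() and task_switch_hook()
are hypothetical, not taken from the patch.

```c
#include <stdbool.h>

/* Stand-in for the static key: false until some CPU needs tracking. */
static bool tracking_enabled;

/* Counts slow-path calls so the effect is observable. */
static int slowpath_calls;

/* Out-of-line slow path; only worth calling when tracking is on. */
static void task_switch_slowpath(int prev, int next)
{
    (void)prev;
    (void)next;
    slowpath_calls++;
}

/* Inline wrapper: in the off case the caller pays for one unlikely branch. */
static inline void task_switch_hook(int prev, int next)
{
    if (__builtin_expect(tracking_enabled, 0))
        task_switch_slowpath(prev, next);
}
```

With tracking_enabled left false, task_switch_hook() never reaches the
out-of-line function, which is the off-case behavior the changelog asks
for.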

Signed-off-by: Frederic Weisbecker 
Cc: Steven Rostedt 
Cc: Paul E. McKenney 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Li Zhong 
Cc: Mike Galbraith 
Cc: Kevin Hilman 
---
 include/linux/context_tracking.h |   11 +--
 kernel/context_tracking.c|6 +++---
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 03a32b0..66a8397 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -40,6 +40,8 @@ extern void context_tracking_cpu_set(int cpu);
 
 extern void context_tracking_user_enter(void);
 extern void context_tracking_user_exit(void);
+extern void __context_tracking_task_switch(struct task_struct *prev,
+  struct task_struct *next);
 
 static inline void user_enter(void)
 {
@@ -74,8 +76,12 @@ static inline void exception_exit(enum ctx_state prev_ctx)
}
 }
 
-extern void context_tracking_task_switch(struct task_struct *prev,
-struct task_struct *next);
+static inline void context_tracking_task_switch(struct task_struct *prev,
+   struct task_struct *next)
+{
+   if (static_key_false(&context_tracking_enabled))
+   __context_tracking_task_switch(prev, next);
+}
 #else
 static inline bool context_tracking_in_user(void) { return false; }
 static inline void user_enter(void) { }
@@ -104,6 +110,7 @@ static inline void guest_exit(void)
else
current->flags &= ~PF_VCPU;
 }
+
 #else
 static inline void guest_enter(void)
 {
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 5afa36b..ef21e4f 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -166,7 +166,7 @@ void context_tracking_user_exit(void)
 }
 
 /**
- * context_tracking_task_switch - context switch the syscall callbacks
+ * __context_tracking_task_switch - context switch the syscall callbacks
  * @prev: the task that is being switched out
  * @next: the task that is being switched in
  *
@@ -178,8 +178,8 @@ void context_tracking_user_exit(void)
  * migrate to some CPU that doesn't do the context tracking. As such the TIF
  * flag may not be desired there.
  */
-void context_tracking_task_switch(struct task_struct *prev,
-struct task_struct *next)
+void __context_tracking_task_switch(struct task_struct *prev,
+   struct task_struct *next)
 {
clear_tsk_thread_flag(prev, TIF_NOHZ);
set_tsk_thread_flag(next, TIF_NOHZ);
-- 
1.7.5.4


[PATCH 20/21] nohz: Optimize full dynticks state checks with static keys

These APIs are frequently accessed, and priority is given
to optimizing the full dynticks off-case so that distros
can enable this feature without suffering from
significant performance regressions.

Let's inline these APIs and optimize them with static keys.

Signed-off-by: Frederic Weisbecker 
Cc: Steven Rostedt 
Cc: Paul E. McKenney 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Li Zhong 
Cc: Mike Galbraith 
Cc: Kevin Hilman 
---
 include/linux/tick.h |   25 +++--
 kernel/time/tick-sched.c |   14 ++
 2 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 9180f4b..c60b079 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -10,6 +10,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #ifdef CONFIG_GENERIC_CLOCKEVENTS
 
@@ -158,15 +160,34 @@ static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
 # endif /* !CONFIG_NO_HZ_COMMON */
 
 #ifdef CONFIG_NO_HZ_FULL
+extern bool tick_nohz_full_running;
+extern cpumask_var_t tick_nohz_full_mask;
+
+static inline bool tick_nohz_full_enabled(void)
+{
+   if (!static_key_false(&context_tracking_enabled))
+   return false;
+
+   return tick_nohz_full_running;
+}
+
+static inline bool tick_nohz_full_cpu(int cpu)
+{
+   if (!tick_nohz_full_enabled())
+   return false;
+
+   return cpumask_test_cpu(cpu, tick_nohz_full_mask);
+}
+
 extern void tick_nohz_init(void);
-extern int tick_nohz_full_cpu(int cpu);
 extern void tick_nohz_full_check(void);
 extern void tick_nohz_full_kick(void);
 extern void tick_nohz_full_kick_all(void);
 extern void tick_nohz_task_switch(struct task_struct *tsk);
 #else
 static inline void tick_nohz_init(void) { }
-static inline int tick_nohz_full_cpu(int cpu) { return 0; }
+static inline bool tick_nohz_full_enabled(void) { return false; }
+static inline bool tick_nohz_full_cpu(int cpu) { return false; }
 static inline void tick_nohz_full_check(void) { }
 static inline void tick_nohz_full_kick(void) { }
 static inline void tick_nohz_full_kick_all(void) { }
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 71735ea..6d6bd6e 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -149,7 +149,7 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
 }
 
 #ifdef CONFIG_NO_HZ_FULL
-static cpumask_var_t tick_nohz_full_mask;
+cpumask_var_t tick_nohz_full_mask;
 bool tick_nohz_full_running;
 
 static bool can_stop_full_tick(void)
@@ -269,14 +269,6 @@ out:
local_irq_restore(flags);
 }
 
-int tick_nohz_full_cpu(int cpu)
-{
-   if (!tick_nohz_full_running)
-   return 0;
-
-   return cpumask_test_cpu(cpu, tick_nohz_full_mask);
-}
-
 /* Parse the boot-time nohz CPU list from the kernel parameters. */
 static int __init tick_nohz_full_setup(char *str)
 {
@@ -358,8 +350,6 @@ void __init tick_nohz_init(void)
cpulist_scnprintf(nohz_full_buf, sizeof(nohz_full_buf), tick_nohz_full_mask);
pr_info("NO_HZ: Full dynticks CPUs: %s.\n", nohz_full_buf);
 }
-#else
-#define tick_nohz_full_running (0)
 #endif
 
 /*
@@ -737,7 +727,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
return false;
}
 
-   if (tick_nohz_full_running) {
+   if (tick_nohz_full_enabled()) {
/*
 * Keep the tick alive to guarantee timekeeping progression
 * if there are full dynticks CPUs around
-- 
1.7.5.4


[PATCH 18/21] vtime: Always debug check snapshot source _before_ updating it

The vtime delta update performed by get_vtime_delta() always checks
that the source of the snapshot is valid.

Meanwhile the snapshot updaters that rely on get_vtime_delta() also
set the new snapshot origin. But some of them do this right before
the call to get_vtime_delta(), making its debug check useless.

This is easily fixable by moving the snapshot origin update after
the call to get_vtime_delta(). The order doesn't matter there.
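
Why the ordering matters for the debug check can be shown with a small
userspace model. This is a hedged sketch, not the kernel code: the
struct, the SNAP_SLEEPING "invalid source" state, the warnings counter
(standing in for the kernel's debug warning) and both helper names are
assumptions made for illustration.

```c
enum snap_whence { SNAP_SLEEPING, SNAP_USER, SNAP_SYS };

struct vtime_state {
    enum snap_whence whence;   /* origin of the last snapshot */
    unsigned long snap;        /* time of the last snapshot */
    int warnings;              /* times the debug check fired */
};

/* Return the time elapsed since the snapshot; flag an invalid source. */
static unsigned long get_delta(struct vtime_state *v, unsigned long now)
{
    unsigned long delta = now - v->snap;

    if (v->whence == SNAP_SLEEPING)
        v->warnings++;         /* stand-in for the debug warning */
    v->snap = now;
    return delta;
}

/* Old ordering: the origin is overwritten before the check can see it. */
static unsigned long user_enter_old(struct vtime_state *v, unsigned long now)
{
    v->whence = SNAP_USER;
    return get_delta(v, now);
}

/* New ordering: check the previous origin first, then update it. */
static unsigned long user_enter_new(struct vtime_state *v, unsigned long now)
{
    unsigned long delta = get_delta(v, now);

    v->whence = SNAP_USER;
    return delta;
}
```

Both orderings return the same delta, which matches the changelog's
remark that the order does not matter for the accounting itself; only
the second one lets the check observe the previous origin.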

Signed-off-by: Frederic Weisbecker 
Cc: Steven Rostedt 
Cc: Paul E. McKenney 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Li Zhong 
Cc: Mike Galbraith 
Cc: Kevin Hilman 
---
 kernel/sched/cputime.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index e9e742e..c1d7493 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -660,9 +660,9 @@ void vtime_account_system(struct task_struct *tsk)
 void vtime_gen_account_irq_exit(struct task_struct *tsk)
 {
write_seqlock(&tsk->vtime_seqlock);
+   __vtime_account_system(tsk);
if (context_tracking_in_user())
tsk->vtime_snap_whence = VTIME_USER;
-   __vtime_account_system(tsk);
write_sequnlock(&tsk->vtime_seqlock);
 }
 
@@ -680,8 +680,8 @@ void vtime_account_user(struct task_struct *tsk)
 void vtime_user_enter(struct task_struct *tsk)
 {
write_seqlock(&tsk->vtime_seqlock);
-   tsk->vtime_snap_whence = VTIME_USER;
__vtime_account_system(tsk);
+   tsk->vtime_snap_whence = VTIME_USER;
write_sequnlock(&tsk->vtime_seqlock);
 }
 
-- 
1.7.5.4


[PATCH 13/21] vtime: Fix racy cputime delta update

get_vtime_delta() must be called under the task vtime_seqlock
with the code that does the cputime accounting flush.

Otherwise the cputime reader can be fooled and run into
a race where it sees the snapshot update but misses the
cputime flush. As a result it can report a cputime that is
way too short.

Fix vtime_account_user(), which wasn't complying with that rule.
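
The race can be modelled in a few lines of single-threaded C. This is a
simplified sketch under stated assumptions: a bare sequence counter
stands in for the seqlock, the reader only checks for an odd count (a
real seqlock reader also compares the count before and after reading),
and the struct and function names are hypothetical.

```c
#include <stdbool.h>

struct task_vtime {
    unsigned int seq;      /* odd while a writer is mid-update */
    unsigned long snap;    /* time of the last snapshot */
    unsigned long utime;   /* user time flushed so far */
};

/* Consume the time elapsed since the snapshot and move the snapshot. */
static unsigned long get_delta(struct task_vtime *t, unsigned long now)
{
    unsigned long delta = now - t->snap;

    t->snap = now;
    return delta;
}

/* Reader: flushed time plus the pending delta; false means "retry". */
static bool read_utime(const struct task_vtime *t, unsigned long now,
                       unsigned long *out)
{
    if (t->seq & 1)
        return false;
    *out = t->utime + (now - t->snap);
    return true;
}

/* Fixed writer: snapshot update and flush share one write section. */
static void account_user(struct task_vtime *t, unsigned long now)
{
    t->seq++;                       /* write_seqlock() stand-in */
    t->utime += get_delta(t, now);
    t->seq++;                       /* write_sequnlock() stand-in */
}
```

If get_delta() runs before the counter goes odd, a reader that slips in
between sees the moved snapshot with utime still unflushed and reports a
total that is short by the whole delta. With the call inside the write
section, that reader is told to retry instead.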

Signed-off-by: Frederic Weisbecker 
Cc: Steven Rostedt 
Cc: Paul E. McKenney 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Li Zhong 
Cc: Mike Galbraith 
Cc: Kevin Hilman 
---
 kernel/sched/cputime.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 5f273b47..b62d5c0 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -683,9 +683,10 @@ void vtime_account_irq_exit(struct task_struct *tsk)
 
 void vtime_account_user(struct task_struct *tsk)
 {
-   cputime_t delta_cpu = get_vtime_delta(tsk);
+   cputime_t delta_cpu;
 
write_seqlock(&tsk->vtime_seqlock);
+   delta_cpu = get_vtime_delta(tsk);
tsk->vtime_snap_whence = VTIME_SYS;
account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
write_sequnlock(&tsk->vtime_seqlock);
-- 
1.7.5.4


[PATCH 21/21] nohz: Optimize full dynticks's sched hooks with static keys

Scheduler IPIs and task context switches are serious fast paths.
Let's hide as much as we can of the off-case impact of the full
dynticks APIs that are called on these sites
through the use of static keys.

Signed-off-by: Frederic Weisbecker 
Cc: Steven Rostedt 
Cc: Paul E. McKenney 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Li Zhong 
Cc: Mike Galbraith 
Cc: Kevin Hilman 
---
 include/linux/tick.h |   20 
 kernel/time/tick-sched.c |8 
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index c60b079..a7ef1d6 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -180,20 +180,32 @@ static inline bool tick_nohz_full_cpu(int cpu)
 }
 
 extern void tick_nohz_init(void);
-extern void tick_nohz_full_check(void);
+extern void __tick_nohz_full_check(void);
 extern void tick_nohz_full_kick(void);
 extern void tick_nohz_full_kick_all(void);
-extern void tick_nohz_task_switch(struct task_struct *tsk);
+extern void __tick_nohz_task_switch(struct task_struct *tsk);
 #else
 static inline void tick_nohz_init(void) { }
 static inline bool tick_nohz_full_enabled(void) { return false; }
 static inline bool tick_nohz_full_cpu(int cpu) { return false; }
-static inline void tick_nohz_full_check(void) { }
+static inline void __tick_nohz_full_check(void) { }
 static inline void tick_nohz_full_kick(void) { }
 static inline void tick_nohz_full_kick_all(void) { }
-static inline void tick_nohz_task_switch(struct task_struct *tsk) { }
+static inline void __tick_nohz_task_switch(struct task_struct *tsk) { }
 #endif
 
+static inline void tick_nohz_full_check(void)
+{
+   if (tick_nohz_full_enabled())
+   __tick_nohz_full_check();
+}
+
+static inline void tick_nohz_task_switch(struct task_struct *tsk)
+{
+   if (tick_nohz_full_enabled())
+   __tick_nohz_task_switch(tsk);
+}
+
 
 # ifdef CONFIG_CPU_IDLE_GOV_MENU
 extern void menu_hrtimer_cancel(void);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 6d6bd6e..73997be 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -197,7 +197,7 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now);
  * Re-evaluate the need for the tick on the current CPU
  * and restart it if necessary.
  */
-void tick_nohz_full_check(void)
+void __tick_nohz_full_check(void)
 {
struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
 
@@ -211,7 +211,7 @@ void tick_nohz_full_check(void)
 
 static void nohz_full_kick_work_func(struct irq_work *work)
 {
-   tick_nohz_full_check();
+   __tick_nohz_full_check();
 }
 
 static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
@@ -230,7 +230,7 @@ void tick_nohz_full_kick(void)
 
 static void nohz_full_kick_ipi(void *info)
 {
-   tick_nohz_full_check();
+   __tick_nohz_full_check();
 }
 
 /*
@@ -253,7 +253,7 @@ void tick_nohz_full_kick_all(void)
  * It might need the tick due to per task/process properties:
  * perf events, posix cpu timers, ...
  */
-void tick_nohz_task_switch(struct task_struct *tsk)
+void __tick_nohz_task_switch(struct task_struct *tsk)
 {
unsigned long flags;
 
-- 
1.7.5.4


[PATCH 11/21] context_tracking: User/kernel broundary cross trace events

This can be useful to track all kernel/user round trips.
And it's also helpful to debug the context tracking subsystem.

Signed-off-by: Frederic Weisbecker 
Cc: Steven Rostedt 
Cc: Paul E. McKenney 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Li Zhong 
Cc: Mike Galbraith 
Cc: Kevin Hilman 
---
 include/trace/events/context_tracking.h |   58 +++
 kernel/context_tracking.c   |5 +++
 2 files changed, 63 insertions(+), 0 deletions(-)
 create mode 100644 include/trace/events/context_tracking.h

diff --git a/include/trace/events/context_tracking.h b/include/trace/events/context_tracking.h
new file mode 100644
index 000..ce8007c
--- /dev/null
+++ b/include/trace/events/context_tracking.h
@@ -0,0 +1,58 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM context_tracking
+
+#if !defined(_TRACE_CONTEXT_TRACKING_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_CONTEXT_TRACKING_H
+
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(context_tracking_user,
+
+   TP_PROTO(int dummy),
+
+   TP_ARGS(dummy),
+
+   TP_STRUCT__entry(
+   __field( int,   dummy   )
+   ),
+
+   TP_fast_assign(
+   __entry->dummy  = dummy;
+   ),
+
+   TP_printk("%s", "")
+);
+
+/**
+ * user_enter - called when the kernel resumes to userspace
+ * @dummy: dummy arg to make trace event macro happy
+ *
+ * This event occurs when the kernel resumes to userspace  after
+ * an exception or a syscall.
+ */
+DEFINE_EVENT(context_tracking_user, user_enter,
+
+   TP_PROTO(int dummy),
+
+   TP_ARGS(dummy)
+);
+
+/**
+ * user_exit - called when userspace enters the kernel
+ * @dummy: dummy arg to make trace event macro happy
+ *
+ * This event occurs when userspace enters the kernel through
+ * an exception or a syscall.
+ */
+DEFINE_EVENT(context_tracking_user, user_exit,
+
+   TP_PROTO(int dummy),
+
+   TP_ARGS(dummy)
+);
+
+
+#endif /*  _TRACE_CONTEXT_TRACKING_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index ef21e4f..688efe4 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -20,6 +20,9 @@
 #include 
 #include 
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/context_tracking.h>
+
 struct static_key context_tracking_enabled = STATIC_KEY_INIT_FALSE;
 EXPORT_SYMBOL_GPL(context_tracking_enabled);
 
@@ -64,6 +67,7 @@ void context_tracking_user_enter(void)
local_irq_save(flags);
if ( __this_cpu_read(context_tracking.state) != IN_USER) {
if (__this_cpu_read(context_tracking.active)) {
+   trace_user_enter(0);
/*
 * At this stage, only low level arch entry code remains and
 * then we'll run in userspace. We can assume there won't be
@@ -159,6 +163,7 @@ void context_tracking_user_exit(void)
 */
rcu_user_exit();
vtime_user_exit(current);
+   trace_user_exit(0);
}
__this_cpu_write(context_tracking.state, IN_KERNEL);
}
-- 
1.7.5.4


[PATCH 00/21] nohz patches for 3.12 preview v2

Hi,

This is a respin of the series that optimizes the full dynticks off-case with
static keys. It seems that some distros are interested in full dynticks, so we
need to optimize the off case such that unconcerned users are not impacted
by performance regressions.

Thanks to Steve for his reviews on the previous version! I hope the
changelogs and comments are better in this version.

---

Changes since last posting:

* Rebase against 3.11-rc2

* Dropped because merged in -tip through urgent queue: 
nohz: Do not warn about unstable tsc unless user uses nohz_full
nohz: fix compile warning in tick_nohz_init()

Reported by Steve:

* Fix confusing comments in [03/21]
* Fix confusing changelog [05/21]
* Split [05/21] with new patch to enhance CONFIG_CONTEXT_TRACKING_FORCE
  Kconfig help text, see [06/21]

Bugfixes:

* Fix missing exported symbol, clarify changes by separating guest APIs
optimization in a separate patch [09/21]

Further:

* Use static keys on full dynticks APIs [19-21/21]


You can snort from:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
timers/nohz-3.12-preview-v2

Thanks,
Frederic
---

Frederic Weisbecker (21):
  sched: Consolidate open coded preemptible() checks
  context_tracing: Fix guest accounting with native vtime
  vtime: Update a few comments
  context_tracking: Fix runtime CPU off-case
  nohz: Only enable context tracking on full dynticks CPUs
  context_tracking: Remove full dynticks' hacky dependency on wide context tracking
  context_tracking: Ground setup for static key use
  context_tracking: Optimize main APIs off case with static key
  context_tracking: Optimize guest APIs off case with static key
  context_tracking: Optimize context switch off case with static keys
  context_tracking: User/kernel broundary cross trace events
  vtime: Remove a few unneeded generic vtime state checks
  vtime: Fix racy cputime delta update
  context_tracking: Split low level state headers
  vtime: Describe overriden functions in dedicated arch headers
  vtime: Optimize full dynticks accounting off case with static keys
  vtime: Always scale generic vtime accounting results
  vtime: Always debug check snapshot source _before_ updating it
  nohz: Rename a few state variables
  nohz: Optimize full dynticks state checks with static keys
  nohz: Optimize full dynticks's sched hooks with static keys


 arch/ia64/include/asm/Kbuild|1 +
 arch/powerpc/include/asm/Kbuild |1 +
 arch/s390/include/asm/cputime.h |3 -
 arch/s390/include/asm/vtime.h   |7 ++
 arch/s390/kernel/vtime.c|1 +
 include/linux/context_tracking.h|  120 +++--
 include/linux/context_tracking_state.h  |   39 +
 include/linux/tick.h|   45 +--
 include/linux/vtime.h   |   74 --
 include/trace/events/context_tracking.h |   58 ++
 init/Kconfig|   28 +--
 kernel/context_tracking.c   |  128 ++-
 kernel/sched/core.c |4 +-
 kernel/sched/cputime.c  |   53 -
 kernel/time/Kconfig |1 -
 kernel/time/tick-sched.c|   56 ++
 16 files changed, 410 insertions(+), 209 deletions(-)

[PATCH 03/21] vtime: Update a few comments

Update a stale comment from the old vtime era and document some
locking that might be non obvious.

Signed-off-by: Frederic Weisbecker 
Cc: Steven Rostedt 
Cc: Paul E. McKenney 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Li Zhong 
Cc: Mike Galbraith 
Cc: Kevin Hilman 
---
 include/linux/context_tracking.h |   10 --
 kernel/sched/cputime.c   |7 +++
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 5984f25..d883ff0 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -72,8 +72,9 @@ extern void guest_exit(void);
 static inline void guest_enter(void)
 {
/*
-* This is running in ioctl context so we can avoid
-* the call to vtime_account() with its unnecessary idle check.
+* This is running in ioctl context so its safe
+* to assume that it's the stime pending cputime
+* to flush.
 */
vtime_account_system(current);
current->flags |= PF_VCPU;
@@ -81,10 +82,7 @@ static inline void guest_enter(void)
 
 static inline void guest_exit(void)
 {
-   /*
-* This is running in ioctl context so we can avoid
-* the call to vtime_account() with its unnecessary idle check.
-*/
+   /* Flush the guest cputime we spent on the guest */
vtime_account_system(current);
current->flags &= ~PF_VCPU;
 }
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index a7959e0..223a35e 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -712,6 +712,13 @@ void vtime_user_enter(struct task_struct *tsk)
 
 void vtime_guest_enter(struct task_struct *tsk)
 {
+   /*
+* The flags must be updated under the lock with
+* the vtime_snap flush and update.
+* That enforces a right ordering and update sequence
+* synchronization against the reader (task_gtime())
+* that can thus safely catch up with a tickless delta.
+*/
write_seqlock(&tsk->vtime_seqlock);
__vtime_account_system(tsk);
current->flags |= PF_VCPU;
-- 
1.7.5.4


[PATCH 19/21] nohz: Rename a few state variables

Rename the full dynticks's cpumask and cpumask state variables
to some more exportable names.

These will be used later from global headers to optimize
the main full dynticks APIs in conjunction with static keys.

Signed-off-by: Frederic Weisbecker 
Cc: Steven Rostedt 
Cc: Paul E. McKenney 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Li Zhong 
Cc: Mike Galbraith 
Cc: Kevin Hilman 
---
 kernel/time/tick-sched.c |   42 +-
 1 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 6d604fd..71735ea 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -149,8 +149,8 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
 }
 
 #ifdef CONFIG_NO_HZ_FULL
-static cpumask_var_t nohz_full_mask;
-bool have_nohz_full_mask;
+static cpumask_var_t tick_nohz_full_mask;
+bool tick_nohz_full_running;
 
 static bool can_stop_full_tick(void)
 {
@@ -239,11 +239,11 @@ static void nohz_full_kick_ipi(void *info)
  */
 void tick_nohz_full_kick_all(void)
 {
-   if (!have_nohz_full_mask)
+   if (!tick_nohz_full_running)
return;
 
preempt_disable();
-   smp_call_function_many(nohz_full_mask,
+   smp_call_function_many(tick_nohz_full_mask,
   nohz_full_kick_ipi, NULL, false);
preempt_enable();
 }
@@ -271,10 +271,10 @@ out:
 
 int tick_nohz_full_cpu(int cpu)
 {
-   if (!have_nohz_full_mask)
+   if (!tick_nohz_full_running)
return 0;
 
-   return cpumask_test_cpu(cpu, nohz_full_mask);
+   return cpumask_test_cpu(cpu, tick_nohz_full_mask);
 }
 
 /* Parse the boot-time nohz CPU list from the kernel parameters. */
@@ -282,18 +282,18 @@ static int __init tick_nohz_full_setup(char *str)
 {
int cpu;
 
-   alloc_bootmem_cpumask_var(&nohz_full_mask);
-   if (cpulist_parse(str, nohz_full_mask) < 0) {
+   alloc_bootmem_cpumask_var(&tick_nohz_full_mask);
+   if (cpulist_parse(str, tick_nohz_full_mask) < 0) {
pr_warning("NOHZ: Incorrect nohz_full cpumask\n");
return 1;
}
 
cpu = smp_processor_id();
-   if (cpumask_test_cpu(cpu, nohz_full_mask)) {
+   if (cpumask_test_cpu(cpu, tick_nohz_full_mask)) {
pr_warning("NO_HZ: Clearing %d from nohz_full range for timekeeping\n", cpu);
-   cpumask_clear_cpu(cpu, nohz_full_mask);
+   cpumask_clear_cpu(cpu, tick_nohz_full_mask);
}
-   have_nohz_full_mask = true;
+   tick_nohz_full_running = true;
 
return 1;
 }
@@ -311,7 +311,7 @@ static int tick_nohz_cpu_down_callback(struct notifier_block *nfb,
 * If we handle the timekeeping duty for full dynticks CPUs,
 * we can't safely shutdown that CPU.
 */
-   if (have_nohz_full_mask && tick_do_timer_cpu == cpu)
+   if (tick_nohz_full_running && tick_do_timer_cpu == cpu)
return NOTIFY_BAD;
break;
}
@@ -330,14 +330,14 @@ static int tick_nohz_init_all(void)
int err = -1;
 
 #ifdef CONFIG_NO_HZ_FULL_ALL
-   if (!alloc_cpumask_var(&nohz_full_mask, GFP_KERNEL)) {
+   if (!alloc_cpumask_var(&tick_nohz_full_mask, GFP_KERNEL)) {
pr_err("NO_HZ: Can't allocate full dynticks cpumask\n");
return err;
}
err = 0;
-   cpumask_setall(nohz_full_mask);
-   cpumask_clear_cpu(smp_processor_id(), nohz_full_mask);
-   have_nohz_full_mask = true;
+   cpumask_setall(tick_nohz_full_mask);
+   cpumask_clear_cpu(smp_processor_id(), tick_nohz_full_mask);
+   tick_nohz_full_running = true;
 #endif
return err;
 }
@@ -346,20 +346,20 @@ void __init tick_nohz_init(void)
 {
int cpu;
 
-   if (!have_nohz_full_mask) {
+   if (!tick_nohz_full_running) {
if (tick_nohz_init_all() < 0)
return;
}
 
-   for_each_cpu(cpu, nohz_full_mask)
+   for_each_cpu(cpu, tick_nohz_full_mask)
context_tracking_cpu_set(cpu);
 
cpu_notifier(tick_nohz_cpu_down_callback, 0);
-   cpulist_scnprintf(nohz_full_buf, sizeof(nohz_full_buf), nohz_full_mask);
+   cpulist_scnprintf(nohz_full_buf, sizeof(nohz_full_buf), tick_nohz_full_mask);
pr_info("NO_HZ: Full dynticks CPUs: %s.\n", nohz_full_buf);
 }
 #else
-#define have_nohz_full_mask (0)
+#define tick_nohz_full_running (0)
 #endif
 
 /*
@@ -737,7 +737,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
return false;
}
 
-   if (have_nohz_full_mask) {
+   if (tick_nohz_full_running) {
/*
 * Keep the tick alive to guarantee timekeeping progression
 * if there are full dynticks CPUs around
-- 
1.7.5.4


[PATCH 09/21] context_tracking: Optimize guest APIs off case with static key

Optimize guest entry/exit APIs with static keys. This minimizes
the overhead for those who enable CONFIG_NO_HZ_FULL without
always using it. Passing no range to nohz_full= should
keep the overhead of the probes minimal.

Signed-off-by: Frederic Weisbecker 
Cc: Steven Rostedt 
Cc: Paul E. McKenney 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Borislav Petkov 
Cc: Li Zhong 
Cc: Mike Galbraith 
Cc: Kevin Hilman 
---
 include/linux/context_tracking.h |   19 +--
 kernel/context_tracking.c|   23 ++-
 kernel/sched/cputime.c   |2 ++
 3 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index e5ec0c9..03a32b0 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -87,8 +87,23 @@ static inline void context_tracking_task_switch(struct task_struct *prev,
 #endif /* !CONFIG_CONTEXT_TRACKING */
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
-extern void guest_enter(void);
-extern void guest_exit(void);
+static inline void guest_enter(void)
+{
+   if (static_key_false(&context_tracking_enabled) &&
+   vtime_accounting_enabled())
+   vtime_guest_enter(current);
+   else
+   current->flags |= PF_VCPU;
+}
+
+static inline void guest_exit(void)
+{
+   if (static_key_false(&context_tracking_enabled) &&
+   vtime_accounting_enabled())
+   vtime_guest_exit(current);
+   else
+   current->flags &= ~PF_VCPU;
+}
 #else
 static inline void guest_enter(void)
 {
diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 657f668..5afa36b 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -21,8 +21,10 @@
 #include 
 
 struct static_key context_tracking_enabled = STATIC_KEY_INIT_FALSE;
+EXPORT_SYMBOL_GPL(context_tracking_enabled);
 
 DEFINE_PER_CPU(struct context_tracking, context_tracking);
+EXPORT_SYMBOL_GPL(context_tracking);
 
 void context_tracking_cpu_set(int cpu)
 {
@@ -163,27 +165,6 @@ void context_tracking_user_exit(void)
local_irq_restore(flags);
 }
 
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
-void guest_enter(void)
-{
-   if (vtime_accounting_enabled())
-   vtime_guest_enter(current);
-   else
-   current->flags |= PF_VCPU;
-}
-EXPORT_SYMBOL_GPL(guest_enter);
-
-void guest_exit(void)
-{
-   if (vtime_accounting_enabled())
-   vtime_guest_exit(current);
-   else
-   current->flags &= ~PF_VCPU;
-}
-EXPORT_SYMBOL_GPL(guest_exit);
-#endif /* CONFIG_VIRT_CPU_ACCOUNTING_GEN */
-
-
 /**
  * context_tracking_task_switch - context switch the syscall callbacks
  * @prev: the task that is being switched out
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 223a35e..bb6b29a 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -724,6 +724,7 @@ void vtime_guest_enter(struct task_struct *tsk)
current->flags |= PF_VCPU;
write_sequnlock(&tsk->vtime_seqlock);
 }
+EXPORT_SYMBOL_GPL(vtime_guest_enter);
 
 void vtime_guest_exit(struct task_struct *tsk)
 {
@@ -732,6 +733,7 @@ void vtime_guest_exit(struct task_struct *tsk)
current->flags &= ~PF_VCPU;
write_sequnlock(&tsk->vtime_seqlock);
 }
+EXPORT_SYMBOL_GPL(vtime_guest_exit);
 
 void vtime_account_idle(struct task_struct *tsk)
 {
-- 
1.7.5.4


[PATCH RFC fs] v2 Make sync() satisfy many requests with one invocation

Dave Jones reported RCU stalls, overly long hrtimer interrupts, and
amazingly long NMI handlers from a trinity-induced workload involving
lots of concurrent sync() calls (https://lkml.org/lkml/2013/7/23/369).
There are any number of things that one might do to make sync() behave
better under high levels of contention, but it is also the case that
multiple concurrent sync() system calls can be satisfied by a single
sys_sync() invocation.

Given that this situation is reminiscent of rcu_barrier(), this commit
applies the rcu_barrier() approach to sys_sync().  This approach uses
a global mutex and a sequence counter.  The mutex is held across the
sync() operation, which eliminates contention between concurrent sync()
operations.  The counter is incremented at the beginning and end of
each sync() operation, so that it is odd while a sync() operation is in
progress and even otherwise, just like sequence locks.

The code that used to be in sys_sync() is now in do_sync(), and sys_sync()
now handles the concurrency.  The sys_sync() function first takes a
snapshot of the counter, then acquires the mutex, and then takes another
snapshot of the counter.  If the values of the two snapshots indicate that
a full do_sync() executed during the mutex acquisition, the sys_sync()
function releases the mutex and returns ("Our work is done!").  Otherwise,
sys_sync() increments the counter, invokes do_sync(), and increments
the counter again.

This approach allows a single call to do_sync() to satisfy an arbitrarily
large number of sync() system calls, which should eliminate issues due
to large numbers of concurrent invocations of the sync() system call.
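
For readers who want to check the snapshot arithmetic, here is a minimal
userspace sketch in C.  It is not part of the patch: ulong_cmp_ge() is
assumed to mirror the kernel's ULONG_CMP_GE() macro, and sync_already_done()
is a hypothetical name for the test that sys_sync() applies to its two
snapshots.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Wraparound-tolerant "a >= b" for free-running unsigned long counters. */
bool ulong_cmp_ge(unsigned long a, unsigned long b)
{
	return ULONG_MAX / 2 >= a - b;
}

/*
 * Hypothetical helper: true if a complete do_sync() both started and
 * finished after "snap" was taken.  Rounding snap up to the next even
 * value skips any sync that was already in flight (odd counter); the
 * "+ 2" then demands one further full start/end pair.
 */
bool sync_already_done(unsigned long snap, unsigned long snap_done)
{
	return ulong_cmp_ge(snap_done, ((snap + 1) & ~0x1UL) + 2);
}
```

With snap == 0 the counter must reach 2 before the caller can return early.
With snap == 1 (a sync was already running) it must reach 4, because the
in-flight do_sync() may have started before the caller's data was dirtied.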

Changes since v1 (https://lkml.org/lkml/2013/7/24/683):

o   Add a pair of memory barriers to keep the increments from
bleeding into the do_sync() code.  (The failure probability
is insanely low, but when you have several hundred million
devices running Linux, you can expect several hundred instances
of one-in-a-million failures.)

o   Actually CC some people who have experience in this area.

Reported-by: Dave Jones 
Signed-off-by: Paul E. McKenney 
Cc: Alexander Viro 
Cc: Christoph Hellwig 
Cc: Jan Kara 
Cc: Curt Wohlgemuth 
Cc: Jens Axboe 
Cc: linux-fsde...@vger.kernel.org

 b/fs/sync.c |   45 -
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/fs/sync.c b/fs/sync.c
index 905f3f6..6e851db 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -99,7 +99,7 @@ static void fdatawait_one_bdev(struct block_device *bdev, void *arg)
  * just write metadata (such as inodes or bitmaps) to block device page cache
  * and do not sync it on their own in ->sync_fs().
  */
-SYSCALL_DEFINE0(sync)
+static void do_sync(void)
 {
int nowait = 0, wait = 1;
 
@@ -111,6 +111,49 @@ SYSCALL_DEFINE0(sync)
iterate_bdevs(fdatawait_one_bdev, NULL);
if (unlikely(laptop_mode))
laptop_sync_completion();
+   return;
+}
+
+static DEFINE_MUTEX(sync_mutex);   /* One do_sync() at a time. */
+static unsigned long sync_seq; /* Many sync()s from one do_sync(). */
+   /*  Overflow harmless, extra wait. */
+
+/*
+ * Only allow one task to do sync() at a time, and further allow
+ * concurrent sync() calls to be satisfied by a single do_sync()
+ * invocation.
+ */
+SYSCALL_DEFINE0(sync)
+{
+   unsigned long snap;
+   unsigned long snap_done;
+
+   snap = ACCESS_ONCE(sync_seq);
+   smp_mb();  /* Prevent above from bleeding into critical section. */
+   mutex_lock(&sync_mutex);
+   snap_done = ACCESS_ONCE(sync_seq);
+   if (ULONG_CMP_GE(snap_done, ((snap + 1) & ~0x1) + 2)) {
+   /*
+* A full do_sync() executed between our two fetches from
+* sync_seq, so our work is done!
+*/
+   smp_mb(); /* Order test with caller's subsequent code. */
+   mutex_unlock(&sync_mutex);
+   return 0;
+   }
+
+   /* Record the start of do_sync(). */
+   ACCESS_ONCE(sync_seq)++;
+   WARN_ON_ONCE((sync_seq & 0x1) != 1);
+   smp_mb(); /* Keep prior increment out of do_sync(). */
+
+   do_sync();
+
+   /* Record the end of do_sync(). */
+   smp_mb(); /* Keep subsequent increment out of do_sync(). */
+   ACCESS_ONCE(sync_seq)++;
+   WARN_ON_ONCE((sync_seq & 0x1) != 0);
+   mutex_unlock(&sync_mutex);
return 0;
 }
 


Re: linux-next: build failure after merge of the staging tree

2013-07-26 Thread Greg KH

On Sat, Jul 27, 2013 at 01:54:44AM +0300, Eli Billauer wrote:
> On 27/07/13 00:56, Greg KH wrote:
> >No, I need you to do that.  Can you do a kernel build with:
> > make M=drivers/staging/xillybus C=1
> >and fix up the errors that sparse reports and send a patch for that?
> >
> I'm not sure it's related to me. I get the same errors whether I
> compile my own modules or something in e.g. drivers/tty/ . This is
> what I get after make allmodconfig on the current staging git repo:
> 
> $ make M=drivers/staging/xillybus C=1
> /home/eli/xillybus/submission/staging/arch/x86/Makefile:107:
> CONFIG_X86_X32 enabled but no binutils support
>   CHECK   drivers/staging/xillybus/xillybus_core.c
> /home/eli/xillybus/submission/staging/arch/x86/include/asm/jump_label.h:16:13:
> error: Expected ( after asm
> /home/eli/xillybus/submission/staging/arch/x86/include/asm/jump_label.h:16:13:
> error: got goto
>   CC [M]  drivers/staging/xillybus/xillybus_core.o
>   CHECK   drivers/staging/xillybus/xillybus_pcie.c
> /home/eli/xillybus/submission/staging/arch/x86/include/asm/jump_label.h:16:13:
> error: Expected ( after asm
> /home/eli/xillybus/submission/staging/arch/x86/include/asm/jump_label.h:16:13:
> error: got goto
>   CC [M]  drivers/staging/xillybus/xillybus_pcie.o
> 
> I'll spare you the output from modules in drivers/tty. But it's
> exactly the same messages on each of these modules.
> 
> Am I doing something wrong?

Odd, you might need to upgrade the version of sparse you have.  My
output looks like:

$ make M=drivers/staging/xillybus/ C=1
  LD  drivers/staging/xillybus//built-in.o
  CHECK   drivers/staging/xillybus//xillybus_core.c
drivers/staging/xillybus//xillybus_core.c:76:25: warning: symbol 'xillybus_wq' was not declared. Should it be static?
drivers/staging/xillybus//xillybus_core.c:175:57: warning: incorrect type in argument 2 (different address spaces)
drivers/staging/xillybus//xillybus_core.c:175:57:    expected void [noderef] *
drivers/staging/xillybus//xillybus_core.c:175:57:    got unsigned int [usertype] *
drivers/staging/xillybus//xillybus_core.c:309:39: warning: incorrect type in argument 2 (different address spaces)
drivers/staging/xillybus//xillybus_core.c:309:39:    expected void [noderef] *
drivers/staging/xillybus//xillybus_core.c:309:39:    got unsigned int [usertype] *
drivers/staging/xillybus//xillybus_core.c:606:55: warning: incorrect type in argument 2 (different address spaces)
drivers/staging/xillybus//xillybus_core.c:606:55:    expected void [noderef] *
drivers/staging/xillybus//xillybus_core.c:606:55:    got unsigned int [usertype] *

and goes on for a few screens.

$ sparse --version
0.4.4

Try a newer version and see if that fixes things.

thanks,

greg k-h

Re: [PATCH 1/3] pinctrl: rollback check for !dev->pins in pinctrl_pm_select*() APIs

2013-07-26 Thread Linus Walleij

On Wed, Jul 17, 2013 at 5:40 PM, Tony Lindgren  wrote:
> * Grygorii Strashko  [130717 04:49]:
>> The pinctrl support in the device core is assumed to be optional, so it
>> is a valid case for there to be no definitions of a device's default
>> pinctrl states in DT at all ("default", "active", "idle", "sleep").
>> In this case dev->pins == NULL, and the pinctrl_pm_select*() APIs
>> should always return 0.
>>
>> Now the checks for !dev->pins have been removed from
>> pinctrl_pm_select*() API mistakenly by the patch
>> pinctrl: Remove duplicate code in pinctrl_pm_select_state functions
>> http://www.spinics.net/lists/arm-kernel/msg258829.html
>>
>> Hence, roll back these checks in the pinctrl_pm_select*() APIs.
>
> Thanks, it's best that I fold this fix into my patch as it has not
> been committed yet.

I think I've applied the correct v2 version,
please check that linux-next is in good shape...

Yours,
Linus Walleij

[PATCH RFC nohz_full 3/7] nohz_full: Add per-CPU idle-state tracking

From: "Paul E. McKenney" 

This commit adds the code that updates the rcu_dyntick structure's
new fields to track the per-CPU idle state based on interrupts and
transitions into and out of the idle loop (NMIs are ignored because NMI
handlers cannot cleanly read out the time anyway).  This code is similar
to the code that maintains RCU's idea of per-CPU idleness, but differs
in that RCU treats CPUs running in user mode as idle, where this new
code does not.
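
The nesting bookkeeping can be modeled outside the kernel with a small C
sketch.  This is a simplification under stated assumptions, not the code in
the patch: TASK_NEST, struct cpu_idle, idle_enter(), idle_exit() and
cpu_is_idle() are hypothetical names, the irq-misnesting handling is left
out, and plain integers stand in for the atomic counter and memory barriers.

```c
#include <assert.h>
#include <stdbool.h>

#define TASK_NEST 1000000LL	/* hypothetical weight of "a task is running" */

struct cpu_idle {
	long long nesting;		/* 0 means fully idle */
	unsigned int idle_ctr;		/* even = idle, odd = non-idle */
	unsigned long idle_since;	/* time of last transition to idle */
};

void cpu_idle_init(struct cpu_idle *c)
{
	c->nesting = TASK_NEST;		/* CPUs start out running a task */
	c->idle_ctr = 1;
	c->idle_since = 0;
}

/* Leaving an irq handler (irq == true) or a task entering the idle loop. */
void idle_enter(struct cpu_idle *c, bool irq, unsigned long now)
{
	c->nesting -= irq ? 1 : TASK_NEST;
	assert(c->nesting >= 0);
	if (c->nesting != 0)
		return;			/* something is still running */
	c->idle_since = now;		/* record start of fully idle period */
	c->idle_ctr++;
	assert((c->idle_ctr & 0x1) == 0);
}

/* Entering an irq handler (irq == true) or a task leaving the idle loop. */
void idle_exit(struct cpu_idle *c, bool irq)
{
	bool was_idle = (c->nesting == 0);

	c->nesting += irq ? 1 : TASK_NEST;
	if (!was_idle)
		return;			/* already non-idle */
	c->idle_ctr++;
	assert((c->idle_ctr & 0x1) == 1);
}

bool cpu_is_idle(const struct cpu_idle *c)
{
	return (c->idle_ctr & 0x1) == 0;
}
```

In this model only the outermost transition flips the counter, so a nested
interrupt taken from idle leaves the CPU marked non-idle until the last
handler returns.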

Signed-off-by: Paul E. McKenney 
Cc: Frederic Weisbecker 
Cc: Steven Rostedt 
---
 kernel/rcutree.c|  4 +++
 kernel/rcutree.h|  2 ++
 kernel/rcutree_plugin.h | 79 +
 3 files changed, 85 insertions(+)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 9412726..c1f7cf8 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -416,6 +416,7 @@ void rcu_idle_enter(void)
 
local_irq_save(flags);
rcu_eqs_enter(false);
+   rcu_sysidle_enter(&__get_cpu_var(rcu_dynticks), 0);
local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(rcu_idle_enter);
@@ -466,6 +467,7 @@ void rcu_irq_exit(void)
trace_rcu_dyntick("--=", oldval, rdtp->dynticks_nesting);
else
rcu_eqs_enter_common(rdtp, oldval, true);
+   rcu_sysidle_enter(rdtp, 1);
local_irq_restore(flags);
 }
 
@@ -534,6 +536,7 @@ void rcu_idle_exit(void)
 
local_irq_save(flags);
rcu_eqs_exit(false);
+   rcu_sysidle_exit(&__get_cpu_var(rcu_dynticks), 0);
local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(rcu_idle_exit);
@@ -585,6 +588,7 @@ void rcu_irq_enter(void)
trace_rcu_dyntick("++=", oldval, rdtp->dynticks_nesting);
else
rcu_eqs_exit_common(rdtp, oldval, true);
+   rcu_sysidle_exit(rdtp, 1);
local_irq_restore(flags);
 }
 
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index bd99d59..1895043 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -553,6 +553,8 @@ static void rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp);
 static void rcu_spawn_nocb_kthreads(struct rcu_state *rsp);
 static void rcu_kick_nohz_cpu(int cpu);
 static bool init_nocb_callback_list(struct rcu_data *rdp);
+static void rcu_sysidle_enter(struct rcu_dynticks *rdtp, int irq);
+static void rcu_sysidle_exit(struct rcu_dynticks *rdtp, int irq);
 static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp);
 
 #endif /* #ifndef RCU_TREE_NONCORE */
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 6937eb6..814ff47 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -2380,6 +2380,77 @@ static void rcu_kick_nohz_cpu(int cpu)
 #ifdef CONFIG_NO_HZ_FULL_SYSIDLE
 
 /*
+ * Invoked to note exit from irq or task transition to idle.  Note that
+ * usermode execution does -not- count as idle here!  After all, we want
+ * to detect full-system idle states, not RCU quiescent states and grace
+ * periods.  The caller must have disabled interrupts.
+ */
+static void rcu_sysidle_enter(struct rcu_dynticks *rdtp, int irq)
+{
+   unsigned long j;
+
+   /* Adjust nesting, check for fully idle. */
+   if (irq) {
+   rdtp->dynticks_idle_nesting--;
+   WARN_ON_ONCE(rdtp->dynticks_idle_nesting < 0);
+   if (rdtp->dynticks_idle_nesting != 0)
+   return;  /* Still not fully idle. */
+   } else {
+   if ((rdtp->dynticks_idle_nesting & DYNTICK_TASK_NEST_MASK) ==
+   DYNTICK_TASK_NEST_VALUE) {
+   rdtp->dynticks_idle_nesting = 0;
+   } else {
+   rdtp->dynticks_idle_nesting -= DYNTICK_TASK_NEST_VALUE;
+   WARN_ON_ONCE(rdtp->dynticks_idle_nesting < 0);
+   return;  /* Still not fully idle. */
+   }
+   }
+
+   /* Record start of fully idle period. */
+   j = jiffies;
+   ACCESS_ONCE(rdtp->dynticks_idle_jiffies) = j;
+   smp_mb__before_atomic_inc();
+   atomic_inc(&rdtp->dynticks_idle);
+   smp_mb__after_atomic_inc();
+   WARN_ON_ONCE(atomic_read(&rdtp->dynticks_idle) & 0x1);
+}
+
+/*
+ * Invoked to note entry to irq or task transition from idle.  Note that
+ * usermode execution does -not- count as idle here!  The caller must
+ * have disabled interrupts.
+ */
+static void rcu_sysidle_exit(struct rcu_dynticks *rdtp, int irq)
+{
+   /* Adjust nesting, check for already non-idle. */
+   if (irq) {
+   rdtp->dynticks_idle_nesting++;
+   WARN_ON_ONCE(rdtp->dynticks_idle_nesting <= 0);
+   if (rdtp->dynticks_idle_nesting != 1)
+   return; /* Already non-idle. */
+   } else {
+   /*
+* Allow for irq misnesting.  Yes, it really is possible
+* to enter an irq handler then never leave it, and maybe
+* also vice versa.  Handle both possibilities.
+*/
+

[PATCH RFC nohz_full 2/7] nohz_full: Add rcu_dyntick data for scalable detection of all-idle state

From: "Paul E. McKenney" 

This commit adds fields to the rcu_dyntick structure that are used to
detect idle CPUs.  These new fields differ from the existing ones in
that the existing ones consider a CPU executing in user mode to be idle,
where the new ones consider CPUs executing in user mode to be busy.
The handling of these new fields is otherwise quite similar to that for
the existing fields.  This commit also adds the initialization required
for these fields.

So, why is usermode execution treated differently, with RCU considering
it a quiescent state equivalent to idle, while in contrast the new
full-system idle state detection considers usermode execution to be
non-idle?

It turns out that although one of RCU's quiescent states is usermode
execution, it is not a full-system idle state.  This is because the
purpose of the full-system idle state is not RCU, but rather determining
when accurate timekeeping can safely be disabled.  Whenever accurate
timekeeping is required in a CONFIG_NO_HZ_FULL kernel, at least one
CPU must keep the scheduling-clock tick going.  If even one CPU is
executing in user mode, accurate timekeeping is required, particularly for
architectures where gettimeofday() and friends do not enter the kernel.
Only when all CPUs are really and truly idle can accurate timekeeping be
disabled, allowing all CPUs to turn off the scheduling clock interrupt,
thus greatly improving energy efficiency.

This naturally raises the question "Why is this code in RCU rather than in
timekeeping?", and the answer is that RCU has the data and infrastructure
to efficiently make this determination.

Signed-off-by: Paul E. McKenney 
Cc: Frederic Weisbecker 
Cc: Steven Rostedt 
---
 kernel/rcutree.c|  5 +
 kernel/rcutree.h|  9 +
 kernel/rcutree_plugin.h | 19 +++
 3 files changed, 33 insertions(+)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 928cb45..9412726 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -209,6 +209,10 @@ EXPORT_SYMBOL_GPL(rcu_note_context_switch);
 DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = {
.dynticks_nesting = DYNTICK_TASK_EXIT_IDLE,
.dynticks = ATOMIC_INIT(1),
+#ifdef CONFIG_NO_HZ_FULL_SYSIDLE
+   .dynticks_idle_nesting = DYNTICK_TASK_NEST_VALUE,
+   .dynticks_idle = ATOMIC_INIT(1),
+#endif /* #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
 };
 
 static long blimit = 10;   /* Maximum callbacks per rcu_do_batch. */
@@ -2902,6 +2906,7 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptible)
rdp->blimit = blimit;
init_callback_list(rdp);  /* Re-enable callbacks on this CPU. */
rdp->dynticks->dynticks_nesting = DYNTICK_TASK_EXIT_IDLE;
+   rcu_sysidle_init_percpu_data(rdp->dynticks);
atomic_set(&rdp->dynticks->dynticks,
   (atomic_read(&rdp->dynticks->dynticks) & ~0x1) + 1);
raw_spin_unlock(&rnp->lock);    /* irqs remain disabled. */
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index b383258..bd99d59 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -88,6 +88,14 @@ struct rcu_dynticks {
/* Process level is worth LLONG_MAX/2. */
int dynticks_nmi_nesting;   /* Track NMI nesting level. */
atomic_t dynticks;  /* Even value for idle, else odd. */
+#ifdef CONFIG_NO_HZ_FULL_SYSIDLE
+   long long dynticks_idle_nesting;
+   /* irq/process nesting level from idle. */
+   atomic_t dynticks_idle; /* Even value for idle, else odd. */
+   /*  "Idle" excludes userspace execution. */
+   unsigned long dynticks_idle_jiffies;
+   /* End of last non-NMI non-idle period. */
+#endif /* #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
 #ifdef CONFIG_RCU_FAST_NO_HZ
bool all_lazy;  /* Are all CPU's CBs lazy? */
unsigned long nonlazy_posted;
@@ -545,6 +553,7 @@ static void rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp);
 static void rcu_spawn_nocb_kthreads(struct rcu_state *rsp);
 static void rcu_kick_nohz_cpu(int cpu);
 static bool init_nocb_callback_list(struct rcu_data *rdp);
+static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp);
 
 #endif /* #ifndef RCU_TREE_NONCORE */
 
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 769e12e..6937eb6 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -2375,3 +2375,22 @@ static void rcu_kick_nohz_cpu(int cpu)
smp_send_reschedule(cpu);
 #endif /* #ifdef CONFIG_NO_HZ_FULL */
 }
+
+
+#ifdef CONFIG_NO_HZ_FULL_SYSIDLE
+
+/*
+ * Initialize dynticks sysidle state for CPUs coming online.
+ */
+static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp)
+{
+   rdtp->dynticks_idle_nesting = DYNTICK_TASK_NEST_VALUE;
+}
+
+#else /* #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
+
+static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp)
+{
+}
+
+#endif

[PATCH RFC nohz_full 5/7] nohz_full: Add full-system-idle arguments to API

From: "Paul E. McKenney" 

This commit adds an isidle and jiffies argument to force_qs_rnp(),
dyntick_save_progress_counter(), and rcu_implicit_dynticks_qs() to enable
RCU's force-quiescent-state process to check for full-system idle.
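
The shape of the new calling convention can be sketched in standalone C.
Everything here is illustrative and assumed rather than taken from the
patch: NR_CPUS, idle_ctr[], idle_jiffies[], check_cpu() and scan_cpus() are
hypothetical names, and the plain ">" timestamp comparison ignores counter
wraparound, which real code would have to handle.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical fixed CPU count for the sketch */

/* Per-CPU check; may clear *isidle and may advance *maxj. */
typedef int (*qs_check_fn)(int cpu, bool *isidle, unsigned long *maxj);

/* Hypothetical per-CPU state: counter parity and last idle-entry time. */
unsigned int idle_ctr[NR_CPUS];		/* even = idle */
unsigned long idle_jiffies[NR_CPUS];

/* Example callback reporting idleness through the two out-parameters. */
int check_cpu(int cpu, bool *isidle, unsigned long *maxj)
{
	if (idle_ctr[cpu] & 0x1) {
		*isidle = false;	/* one busy CPU rules out full idle */
		return 0;
	}
	if (idle_jiffies[cpu] > *maxj)	/* simplification: no wraparound */
		*maxj = idle_jiffies[cpu];
	return 1;
}

/* Walk all CPUs, threading the same two pointers through each call. */
unsigned long scan_cpus(qs_check_fn f, bool *isidle, unsigned long *maxj)
{
	unsigned long mask = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (f(cpu, isidle, maxj))
			mask |= 1UL << cpu;
	return mask;
}
```

The caller sets isidle to true and maxj to an old timestamp before the scan,
then inspects both afterwards; the callbacks never need to know who is
collecting the result.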

Signed-off-by: Paul E. McKenney 
Cc: Frederic Weisbecker 
Cc: Steven Rostedt 
---
 kernel/rcutree.c | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index c1f7cf8..725524e 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -231,7 +231,9 @@ module_param(jiffies_till_next_fqs, ulong, 0644);
 
 static void rcu_start_gp_advanced(struct rcu_state *rsp, struct rcu_node *rnp,
  struct rcu_data *rdp);
-static void force_qs_rnp(struct rcu_state *rsp, int (*f)(struct rcu_data *));
+static void force_qs_rnp(struct rcu_state *rsp,
+int (*f)(struct rcu_data *, bool *, unsigned long *),
+bool *isidle, unsigned long *maxj);
 static void force_quiescent_state(struct rcu_state *rsp);
 static int rcu_pending(int cpu);
 
@@ -712,7 +714,8 @@ static int rcu_is_cpu_rrupt_from_idle(void)
  * credit them with an implicit quiescent state.  Return 1 if this CPU
  * is in dynticks idle mode, which is an extended quiescent state.
  */
-static int dyntick_save_progress_counter(struct rcu_data *rdp)
+static int dyntick_save_progress_counter(struct rcu_data *rdp,
+bool *isidle, unsigned long *maxj)
 {
rdp->dynticks_snap = atomic_add_return(0, &rdp->dynticks->dynticks);
return (rdp->dynticks_snap & 0x1) == 0;
@@ -724,7 +727,8 @@ static int dyntick_save_progress_counter(struct rcu_data *rdp)
  * idle state since the last call to dyntick_save_progress_counter()
  * for this same CPU, or by virtue of having been offline.
  */
-static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
+static int rcu_implicit_dynticks_qs(struct rcu_data *rdp,
+   bool *isidle, unsigned long *maxj)
 {
unsigned int curr;
unsigned int snap;
@@ -1345,16 +1349,19 @@ static int rcu_gp_init(struct rcu_state *rsp)
 int rcu_gp_fqs(struct rcu_state *rsp, int fqs_state_in)
 {
int fqs_state = fqs_state_in;
+   bool isidle = 0;
+   unsigned long maxj;
struct rcu_node *rnp = rcu_get_root(rsp);
 
rsp->n_force_qs++;
if (fqs_state == RCU_SAVE_DYNTICK) {
/* Collect dyntick-idle snapshots. */
-   force_qs_rnp(rsp, dyntick_save_progress_counter);
+   force_qs_rnp(rsp, dyntick_save_progress_counter,
+            &isidle, &maxj);
fqs_state = RCU_FORCE_QS;
} else {
/* Handle dyntick-idle and offline CPUs. */
-   force_qs_rnp(rsp, rcu_implicit_dynticks_qs);
+   force_qs_rnp(rsp, rcu_implicit_dynticks_qs, &isidle, &maxj);
}
/* Clear flag to prevent immediate re-entry. */
if (ACCESS_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) {
@@ -2055,7 +2062,9 @@ void rcu_check_callbacks(int cpu, int user)
  *
  * The caller must have suppressed start of new grace periods.
  */
-static void force_qs_rnp(struct rcu_state *rsp, int (*f)(struct rcu_data *))
+static void force_qs_rnp(struct rcu_state *rsp,
+int (*f)(struct rcu_data *, bool *, unsigned long *),
+bool *isidle, unsigned long *maxj)
 {
unsigned long bit;
int cpu;
@@ -2079,7 +2088,7 @@ static void force_qs_rnp(struct rcu_state *rsp, int (*f)(struct rcu_data *))
bit = 1;
for (; cpu <= rnp->grphi; cpu++, bit <<= 1) {
if ((rnp->qsmask & bit) != 0 &&
-   f(per_cpu_ptr(rsp->rda, cpu)))
+   f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj))
mask |= bit;
}
if (mask != 0) {
-- 
1.8.1.5


[PATCH RFC nohz_full 6/7] nohz_full: Add full-system-idle state machine

From: "Paul E. McKenney" 

This commit adds the state machine that takes the per-CPU idle data
as input and produces a full-system-idle indication as output.  This
state machine is driven out of RCU's quiescent-state-forcing
mechanism, which invokes rcu_sysidle_check_cpu() to collect per-CPU
idle state and then rcu_sysidle_report() to drive the state machine.

The full-system-idle state is sampled using rcu_sys_is_idle(), which
also drives the state machine if RCU is idle (and does so by forcing
RCU to become non-idle).  This function returns true if all but the
timekeeping CPU (tick_do_timer_cpu) are idle and have been idle long
enough to avoid memory contention on the full_sysidle_state state
variable.  The rcu_sysidle_force_exit() may be called externally
to reset the state machine back into non-idle state.
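
The "idle long enough" test depends on comparing free-running tick counters
safely across wraparound.  The following C sketch shows one way to express
that; it is an assumption-laden illustration rather than the patch's code.
ticks_ge() is modeled on the kernel's ULONG_CMP_GE() macro, while far_past()
and idle_long_enough() are hypothetical helpers, with far_past() echoing the
"jiffies - ULONG_MAX / 4" starting value used for maxj.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Wraparound-tolerant "a >= b" on free-running tick counters. */
bool ticks_ge(unsigned long a, unsigned long b)
{
	return ULONG_MAX / 2 >= a - b;
}

/* A starting value for maxj that is older than any recent idle entry. */
unsigned long far_past(unsigned long now)
{
	return now - ULONG_MAX / 4;
}

/*
 * Hypothetical helper: every scanned CPU is idle, and the most recent
 * idle entry (maxj) is at least "delay" ticks old.
 */
bool idle_long_enough(bool isidle, unsigned long maxj,
		      unsigned long now, unsigned long delay)
{
	return isidle && ticks_ge(now, maxj + delay);
}
```

Because the comparison works on the difference between the two counters, it
stays correct when maxj + delay wraps past ULONG_MAX, provided the two times
are less than half the counter range apart.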

Signed-off-by: Paul E. McKenney 
Cc: Frederic Weisbecker 
Cc: Steven Rostedt 
---
 include/linux/rcupdate.h |  18 +++
 kernel/rcutree.c |  16 ++-
 kernel/rcutree.h |   5 +
 kernel/rcutree_plugin.h  | 284 ++-
 4 files changed, 316 insertions(+), 7 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 48f1ef9..1aa8d8c 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -1011,4 +1011,22 @@ static inline bool rcu_is_nocb_cpu(int cpu) { return false; }
 #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
 
 
+/* Only for use by adaptive-ticks code. */
+#ifdef CONFIG_NO_HZ_FULL_SYSIDLE
+extern bool rcu_sys_is_idle(void);
+extern void rcu_sysidle_force_exit(void);
+#else /* #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
+
+static inline bool rcu_sys_is_idle(void)
+{
+   return false;
+}
+
+static inline void rcu_sysidle_force_exit(void)
+{
+}
+
+#endif /* #else #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
+
+
 #endif /* __LINUX_RCUPDATE_H */
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 725524e..aa6d96e 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -718,6 +718,7 @@ static int dyntick_save_progress_counter(struct rcu_data *rdp,
 bool *isidle, unsigned long *maxj)
 {
rdp->dynticks_snap = atomic_add_return(0, &rdp->dynticks->dynticks);
+   rcu_sysidle_check_cpu(rdp, isidle, maxj);
return (rdp->dynticks_snap & 0x1) == 0;
 }
 
@@ -1356,11 +1357,17 @@ int rcu_gp_fqs(struct rcu_state *rsp, int fqs_state_in)
rsp->n_force_qs++;
if (fqs_state == RCU_SAVE_DYNTICK) {
/* Collect dyntick-idle snapshots. */
+   if (is_sysidle_rcu_state(rsp)) {
+   isidle = 1;
+   maxj = jiffies - ULONG_MAX / 4;
+   }
force_qs_rnp(rsp, dyntick_save_progress_counter,
 &isidle, &maxj);
+   rcu_sysidle_report_gp(rsp, isidle, maxj);
fqs_state = RCU_FORCE_QS;
} else {
/* Handle dyntick-idle and offline CPUs. */
+   isidle = 0;
force_qs_rnp(rsp, rcu_implicit_dynticks_qs, &isidle, &maxj);
}
/* Clear flag to prevent immediate re-entry. */
@@ -2087,9 +2094,12 @@ static void force_qs_rnp(struct rcu_state *rsp,
cpu = rnp->grplo;
bit = 1;
for (; cpu <= rnp->grphi; cpu++, bit <<= 1) {
-   if ((rnp->qsmask & bit) != 0 &&
-   f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj))
-   mask |= bit;
+   if ((rnp->qsmask & bit) != 0) {
+   if ((rnp->qsmaskinit & bit) != 0)
+   *isidle = 0;
+   if (f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj))
+   mask |= bit;
+   }
}
if (mask != 0) {
 
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 1895043..e0de5dc 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -555,6 +555,11 @@ static void rcu_kick_nohz_cpu(int cpu);
 static bool init_nocb_callback_list(struct rcu_data *rdp);
 static void rcu_sysidle_enter(struct rcu_dynticks *rdtp, int irq);
 static void rcu_sysidle_exit(struct rcu_dynticks *rdtp, int irq);
+static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle,
+ unsigned long *maxj);
+static bool is_sysidle_rcu_state(struct rcu_state *rsp);
+static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
+ unsigned long maxj);
 static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp);
 
 #endif /* #ifndef RCU_TREE_NONCORE */
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 3edae39..ff84bed 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -28,7 +28,7 @@
 #include 
 #include 
 #include 
-#include 
+#include "time/tick-internal.h"
 
 #define RCU_KTHREAD_PRIO 1
 
@@ -2395,12 +2395,12 @@ static void

[PATCH RFC nohz_full 1/7] nohz_full: Add Kconfig parameter for scalable detection of all-idle state

From: "Paul E. McKenney" 

At least one CPU must keep the scheduling-clock tick running for
timekeeping purposes whenever there is a non-idle CPU.  However, with
the new nohz_full adaptive-idle machinery, it is difficult to distinguish
between all CPUs really being idle as opposed to all non-idle CPUs being
in adaptive-ticks mode.  This commit therefore adds a Kconfig parameter
as a first step towards enabling a scalable detection of full-system
idle state.

Signed-off-by: Paul E. McKenney 
Cc: Frederic Weisbecker 
Cc: Steven Rostedt 
---
 kernel/time/Kconfig | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index 70f27e8..a613c2a 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -134,6 +134,29 @@ config NO_HZ_FULL_ALL
 Note the boot CPU will still be kept outside the range to
 handle the timekeeping duty.
 
+config NO_HZ_FULL_SYSIDLE
+   bool "Detect full-system idle state for full dynticks system"
+   depends on NO_HZ_FULL
+   default n
+   help
+At least one CPU must keep the scheduling-clock tick running
+for timekeeping purposes whenever there is a non-idle CPU,
+where "non-idle" includes CPUs with a single runnable task
+in adaptive-idle mode.  Because the underlying adaptive-tick
+support cannot distinguish between all CPUs being idle and
+all CPUs each running a single task in adaptive-idle mode,
+the underlying support simply ensures that there is always
+a CPU handling the scheduling-clock tick, whether or not all
+CPUs are idle.  This Kconfig option enables scalable detection
+of the all-CPUs-idle state, thus allowing the scheduling-clock
+tick to be disabled when all CPUs are idle.  Note that scalable
+detection of the all-CPUs-idle state means that larger systems
+will be slower to declare the all-CPUs-idle state.
+
+Say Y if you would like to help debug all-CPUs-idle detection.
+
+Say N if you are unsure.
+
 config NO_HZ
bool "Old Idle dynticks config"
depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS
-- 
1.8.1.5


[PATCH RFC nohz_full 7/7] nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU

From: "Paul E. McKenney" 

Because RCU's quiescent-state-forcing mechanism is used to drive the
full-system-idle state machine, and because this mechanism is executed
by RCU's grace-period kthreads, this commit forces these kthreads to
run on the timekeeping CPU (tick_do_timer_cpu).  To do otherwise would
mean that the RCU grace-period kthreads would force the system into
non-idle state every time they drove the state machine, which would
be just a bit on the futile side.

Signed-off-by: Paul E. McKenney 
Cc: Frederic Weisbecker 
Cc: Steven Rostedt 
---
 kernel/rcutree.c|  1 +
 kernel/rcutree.h|  1 +
 kernel/rcutree_plugin.h | 20 +++-
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index aa6d96e..fe83085 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1286,6 +1286,7 @@ static int rcu_gp_init(struct rcu_state *rsp)
struct rcu_data *rdp;
struct rcu_node *rnp = rcu_get_root(rsp);
 
+   rcu_bind_gp_kthread();
raw_spin_lock_irq(&rnp->lock);
rsp->gp_flags = 0; /* Clear all flags: New grace period. */
 
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index e0de5dc..49dac99 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -560,6 +560,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle,
 static bool is_sysidle_rcu_state(struct rcu_state *rsp);
 static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
  unsigned long maxj);
+static void rcu_bind_gp_kthread(void);
 static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp);
 
 #endif /* #ifndef RCU_TREE_NONCORE */
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index ff84bed..f65d9c2 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -2544,7 +2544,7 @@ static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle,
if (!*isidle || rdp->rsp != rcu_sysidle_state ||
cpu_is_offline(rdp->cpu) || rdp->cpu == tick_do_timer_cpu)
return;
-   /* WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); */
+   WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu);
 
/* Pick up current idle and NMI-nesting counter and check. */
cur = atomic_read(&rdtp->dynticks_idle);
@@ -2570,6 +2570,20 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp)
 }
 
 /*
+ * Bind the grace-period kthread for the sysidle flavor of RCU to the
+ * timekeeping CPU.
+ */
+static void rcu_bind_gp_kthread(void)
+{
+   int cpu = ACCESS_ONCE(tick_do_timer_cpu);
+
+   if (cpu < 0 || cpu >= nr_cpu_ids)
+   return;
+   if (raw_smp_processor_id() != cpu)
+   set_cpus_allowed_ptr(current, cpumask_of(cpu));
+}
+
+/*
  * Return a delay in jiffies based on the number of CPUs, rcu_node
  * leaf fanout, and jiffies tick rate.  The idea is to allow larger
  * systems more time to transition to full-idle state in order to
@@ -2767,6 +2781,10 @@ static bool is_sysidle_rcu_state(struct rcu_state *rsp)
return false;
 }
 
+static void rcu_bind_gp_kthread(void)
+{
+}
+
 static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
  unsigned long maxj)
 {
-- 
1.8.1.5


[PATCH RFC nohz_full 4/7] nohz_full: Add full-system idle states and variables

From: "Paul E. McKenney" 

This commit adds control variables and states for full-system idle.
The system will progress through the states in numerical order when
the system is fully idle (other than the timekeeping CPU), and reset
down to the initial state if any non-timekeeping CPU goes non-idle.
The current state is kept in full_sysidle_state.

A RCU_SYSIDLE_SMALL macro is defined, and systems with this number
of CPUs or fewer move through the states more aggressively.  The idea
is that the resulting memory contention is less of a problem on small
systems.  Architectures can adjust this value (which defaults to 8)
using CONFIG_ARCH_RCU_SYSIDLE_SMALL.

One flavor of RCU will be in charge of driving the state machine,
defined by rcu_sysidle_state.  This should be the busiest flavor of RCU.

Signed-off-by: Paul E. McKenney 
Cc: Frederic Weisbecker 
Cc: Steven Rostedt 
---
 kernel/rcutree_plugin.h | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 814ff47..3edae39 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -2380,6 +2380,34 @@ static void rcu_kick_nohz_cpu(int cpu)
 #ifdef CONFIG_NO_HZ_FULL_SYSIDLE
 
 /*
+ * Handle small systems specially, accelerating their transition into
+ * full idle state.  Allow arches to override this code's idea of
+ * what constitutes a "small" system.
+ */
+#ifdef CONFIG_ARCH_RCU_SYSIDLE_SMALL
+#define RCU_SYSIDLE_SMALL CONFIG_ARCH_RCU_SYSIDLE_SMALL
+#else /* #ifdef CONFIG_ARCH_RCU_SYSIDLE_SMALL */
+#define RCU_SYSIDLE_SMALL 8
+#endif
+
+/*
+ * Define RCU flavor that holds sysidle state.  This needs to be the
+ * most active flavor of RCU.
+ */
+#ifdef CONFIG_PREEMPT_RCU
+static struct rcu_state __maybe_unused *rcu_sysidle_state = &rcu_preempt_state;
+#else /* #ifdef CONFIG_PREEMPT_RCU */
+static struct rcu_state __maybe_unused *rcu_sysidle_state = &rcu_sched_state;
+#endif /* #else #ifdef CONFIG_PREEMPT_RCU */
+
+static int __maybe_unused full_sysidle_state; /* Current system-idle state. */
+#define RCU_SYSIDLE_NOT        0   /* Some CPU is not idle. */
+#define RCU_SYSIDLE_SHORT      1   /* All CPUs idle for brief period. */
+#define RCU_SYSIDLE_LONG       2   /* All CPUs idle for long enough. */
+#define RCU_SYSIDLE_FULL       3   /* All CPUs idle, ready for sysidle. */
+#define RCU_SYSIDLE_FULL_NOTED 4   /* Actually entered sysidle state. */
+
+/*
  * Invoked to note exit from irq or task transition to idle.  Note that
  * usermode execution does -not- count as idle here!  After all, we want
  * to detect full-system idle states, not RCU quiescent states and grace
-- 
1.8.1.5
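
The state progression described in the commit log above can be sketched in
plain userspace C. This is only an illustration, not the kernel's code: the
helpers sysidle_advance(), sysidle_note() and sysidle_reset() are hypothetical
names, and the sketch leaves out the atomic transitions and hysteresis delays
that a real implementation would need.

```c
/* State values as defined in the patch above. */
#define RCU_SYSIDLE_NOT        0  /* Some CPU is not idle. */
#define RCU_SYSIDLE_SHORT      1  /* All CPUs idle for brief period. */
#define RCU_SYSIDLE_LONG       2  /* All CPUs idle for long enough. */
#define RCU_SYSIDLE_FULL       3  /* All CPUs idle, ready for sysidle. */
#define RCU_SYSIDLE_FULL_NOTED 4  /* Actually entered sysidle state. */

/*
 * Hypothetical helper: a scan found every non-timekeeping CPU idle,
 * so move one step up in numerical order, stopping at FULL.
 */
static int sysidle_advance(int state)
{
	if (state < RCU_SYSIDLE_FULL)
		return state + 1;
	return state;
}

/*
 * Hypothetical helper: the timekeeping CPU acknowledges the FULL state.
 * Any other state is left unchanged.
 */
static int sysidle_note(int state)
{
	if (state == RCU_SYSIDLE_FULL)
		return RCU_SYSIDLE_FULL_NOTED;
	return state;
}

/*
 * Hypothetical helper: some non-timekeeping CPU went non-idle, so
 * drop back to the initial state regardless of where we were.
 */
static int sysidle_reset(int state)
{
	(void)state;
	return RCU_SYSIDLE_NOT;
}
```

Three consecutive all-idle scans take the sketch from RCU_SYSIDLE_NOT to
RCU_SYSIDLE_FULL, and a single sysidle_reset() call returns it to
RCU_SYSIDLE_NOT from any state.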


[PATCH RFC nohz_full 0/7] v4 Provide infrastructure for full-system idle

Whenever there is at least one non-idle CPU, it is necessary to
periodically update timekeeping information. Before NO_HZ_FULL, this
updating was carried out by the scheduling-clock tick, which ran on
every non-idle CPU. With the advent of NO_HZ_FULL, it is possible
to have non-idle CPUs that are not receiving scheduling-clock ticks.
This possibility is handled by assigning a timekeeping CPU that continues
taking scheduling-clock ticks.

Unfortunately, the timekeeping CPU continues taking scheduling-clock
interrupts even when all other CPUs are completely idle, which is
not so good for energy efficiency and battery lifetime. Clearly, it
would be good to turn off the timekeeping CPU's scheduling-clock tick
when all CPUs are completely idle. This is conceptually simple, but
we also need good performance and scalability on large systems, which
rules out implementations based on frequently updated global counts of
non-idle CPUs as well as implementations that frequently scan all CPUs.
Nevertheless, we need a single global indicator in order to keep the
overhead of checking acceptably low.

The chosen approach is to enforce hysteresis on the non-idle to
full-system-idle transition, with the amount of hysteresis increasing
linearly with the number of CPUs, thus keeping contention acceptably low.
This approach piggybacks on RCU's existing force-quiescent-state scanning
of idle CPUs, which has the advantage of avoiding the scan entirely on
busy systems that have high levels of multiprogramming. This scan
takes per-CPU idleness information and feeds it into a state machine
that applies the level of hysteresis required to arrive at a single
full-system-idle indicator.
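
To make "hysteresis increasing linearly with the number of CPUs" concrete,
here is a minimal userspace sketch. The function name sysidle_delay_sketch(),
its parameters, and the exact formula are assumptions made for this example;
this cover letter does not spell out the formula. The only inputs borrowed
from the series are the small-system threshold of 8 CPUs and the idea that
the delay depends on the CPU count, the rcu_node leaf fanout, and the tick
rate.

```c
/* Assumed threshold; the series defaults RCU_SYSIDLE_SMALL to 8. */
#define SYSIDLE_SMALL_SKETCH 8

/*
 * Hypothetical delay, in jiffies, to wait between state transitions.
 * Small systems get no delay; larger ones get a delay that grows
 * linearly with the CPU count.  Assumes leaf_fanout > 0.
 */
static unsigned long sysidle_delay_sketch(unsigned long ncpus,
					  unsigned long leaf_fanout,
					  unsigned long hz)
{
	unsigned long denom = leaf_fanout * 1000;

	if (ncpus <= SYSIDLE_SMALL_SKETCH)
		return 0;
	/* Integer division, rounded up. */
	return (ncpus * hz + denom - 1) / denom;
}
```

With a leaf fanout of 16 and a 1000 Hz tick, this sketch gives 4 jiffies for
64 CPUs and 256 jiffies for 4096 CPUs, so the waiting time scales with
system size while small systems transition immediately.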

The individual patches are as follows:

1. Add a CONFIG_NO_HZ_FULL_SYSIDLE Kconfig parameter to enable
this feature. Kernels built with CONFIG_NO_HZ_FULL_SYSIDLE=n
act exactly as they do today.

2. Add new fields to the rcu_dynticks structure that track CPU-idle
information. These fields consider CPUs running usermode to be
non-idle, in contrast with the existing fields in that structure.

3. Track per-CPU idle states.

4. Add full-system idle states and state variables.

5. Expand force_qs_rnp(), dyntick_save_progress_counter(), and
rcu_implicit_dynticks_qs() APIs to enable passing full-system
idle state information.

6. Add full-system-idle state machine.

7. Force RCU's grace-period kthreads onto the timekeeping CPU.

Changes since v3 (https://lkml.org/lkml/2013/7/8/441):

o Fix an embarrassing bug that allowed multiple kthreads to be
executing the state machine concurrently.

Changes since v2 (https://lkml.org/lkml/2013/6/28/610):

o Completed removing NMI support (thanks to Frederic for spotting
the remaining cruft).

o Fix a state-machine bug, again spotted by Frederic. See
http://lists-archives.com/linux-kernel/27865835-nohz_full-add-full-system-idle-state-machine.html
for the full details of the bug.

o Updated commit log and comment as suggested by Josh Triplett.

Changes since v1 (https://lkml.org/lkml/2013/6/25/664):

o Removed NMI support because NMI handlers cannot safely read
the time anyway (thanks to Thomas Gleixner and Peter Zijlstra).

Thanx, Paul


Re: [PATCH v14 6/6] LSM: Multiple LSM Documentation and cleanup

2013-07-26 Thread Randy Dunlap

On 07/25/13 11:32, Casey Schaufler wrote:
> Subject: [PATCH v14 6/6] LSM: Multiple LSM Documentation and cleanup
> 
> Add documentation and remove the obsolete capability LSM.
> Clean up some comments in security.h
> 
> Signed-off-by: Casey Schaufler 
> 
> ---
>  Documentation/security/LSM.txt |   56 +-
>  include/linux/security.h       |   48 +-
>  security/Makefile              |    1 -
>  security/capability.c          | 1106
>  4 files changed, 77 insertions(+), 1134 deletions(-)
> 
> diff --git a/Documentation/security/LSM.txt b/Documentation/security/LSM.txt
> index c335a76..69cf466 100644
> --- a/Documentation/security/LSM.txt
> +++ b/Documentation/security/LSM.txt
> @@ -7,20 +7,56 @@ various security checks to be hooked by new kernel extensions. The name
>  loadable kernel modules. Instead, they are selectable at build-time via
>  CONFIG_DEFAULT_SECURITY and can be overridden at boot-time via the
>  "security=..." kernel command line argument, in the case where multiple
> -LSMs were built into a given kernel.
> +LSMs were built into a given kernel. The names of the active LSMs
> +can be read from /sys/kernel/security/lsm.
> +
> +Both CONFIG_DEFAULT_SECURITY and the "security=" option take a comma
> +separated list of LSM names. The LSM hooks are invoked in the order
> +specified. All hooks provided are invoked regardless of the outcome
> +of preceding hooks. Hooks that return success or failure results
> +return success if all of the LSM provided hooks succeed and the error
> +code of the last failing hook on error.
> +
> +Information from an LSM can come in one of two forms. The raw data
> +used by the LSM is typically the preferred form. SELinux contexts and
> +Smack labels are examples of raw LSM data. If the data from multiple
> +LSMs is presented together it will be in the form:
> +
> + lsmname='value'[lsmname='value']...

no commas? just (made up example):

smack='label'selinux='notstrict'
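
For reference, a consumer of the concatenated form as documented (no
separator between pairs) could be sketched as below. This is a userspace
illustration only: lsm_next_pair() is a hypothetical helper, not part of the
patch set, and it assumes that a value never contains a single quote, which
the documentation does not state either way.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the next lsmname='value' pair out of
 * *cursor and advance *cursor past it.
 * Returns 1 if a pair was parsed, 0 at end of string, and -1 if the
 * input is malformed or a field does not fit its buffer.
 */
static int lsm_next_pair(const char **cursor, char *name, size_t name_len,
			 char *value, size_t value_len)
{
	const char *s = *cursor;
	const char *eq, *end;
	size_t n;

	if (*s == '\0')
		return 0;
	eq = strstr(s, "='");		/* end of the LSM name */
	if (!eq || eq == s)
		return -1;
	end = strchr(eq + 2, '\'');	/* closing quote of the value */
	if (!end)
		return -1;

	n = (size_t)(eq - s);
	if (n >= name_len)
		return -1;
	memcpy(name, s, n);
	name[n] = '\0';

	n = (size_t)(end - (eq + 2));
	if (n >= value_len)
		return -1;
	memcpy(value, eq + 2, n);
	value[n] = '\0';

	*cursor = end + 1;		/* next pair starts right here */
	return 1;
}
```

Because the closing quote doubles as the pair terminator, the made-up string
above parses unambiguously as two pairs without commas, provided the
no-embedded-quote assumption holds.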

> +
> +Interfaces that accept LSM data as input accept this format as well,
> +passing only the relevant portion of the data to each LSM.
> +
> +The /proc filesystem attribute interface supports files from a time
> +when only one LSM could be used at a time. CONFIG_PRESENT_SECURITY
> +defines which LSM uses these interfaces. The name of this LSM can be
> +read from /sys/kernel/security/present. There are also LSM identified
> +interfaces which should be used in preference to the undifferentiated
> +interfaces. The attribute interface "context" always provides the
> +data from all LSMs that maintain it in the lsmname='value' format.
> +
> +The three networking mechanisms supporting LSM attributes are
> +restricted to providing those attributes for a single LSM each.
> +CONFIG_SECMARK_LSM specifies which LSM will provide hooks for the
> +secmark mechanism. CONFIG_NETLABEL_LSM specifies which LSM hooks
> +are used by NetLabel to provide IPv4 CIPSO headers. CONFIG_XFRM_LSM
> +specifies the LSM providing xfrm hooks. CONFIG_PEERSEC_LSM allows
> +for either a specific LSM to provide data with SO_PEERSEC or for
> +all LSMs that provide data to do so.
> +
> +The Linux capabilities system is used in conjunction with any LSMs.
> +LSM hooks are called after the capability checks in most cases,
^
> +but after in a small number of cases. All LSM hooks need to be aware
   ^

   one of these 'after's should be 'before' ??

> +of the potential interactions with the capability system. For more
> +details on capabilities, see capabilities(7) in the Linux man-pages
> +project.
>  
>  The primary users of the LSM interface are Mandatory Access Control
>  (MAC) extensions which provide a comprehensive security policy. Examples
>  include SELinux, Smack, Tomoyo, and AppArmor. In addition to the larger
> -MAC extensions, other extensions can be built using the LSM to provide
> -specific changes to system operation when these tweaks are not available
> -in the core functionality of Linux itself.
> -
> -Without a specific LSM built into the kernel, the default LSM will be the
> -Linux capabilities system. Most LSMs choose to extend the capabilities
> -system, building their checks on top of the defined capability hooks.
> -For more details on capabilities, see capabilities(7) in the Linux
> -man-pages project.
> +MAC extensions, other extensions such as Yama can be built using the LSM
> +to provide specific changes to system operation when these tweaks are not
> +available in the core functionality of Linux itself.
>  
>  Based on http://kerneltrap.org/Linux/Documenting_Security_Module_Intent,
>  a new LSM is accepted into the kernel when its intent (a description of


-- 
~Randy

Re: [PATCH] w1: replace strict_strtol() with kstrtol()

2013-07-26 Thread GregKH

On Tue, Jul 23, 2013 at 12:00:44AM +0400, Evgeniy Polyakov wrote:
> Hi everyone
> 
> 19.07.2013, 11:16, "Jingoo Han" :
> > The usage of strict_strtol() is not preferred, because
> > strict_strtol() is obsolete. Thus, kstrtol() should be
> > used.
> 
> Looks good to me, although I do not really see the difference
> Greg, please pull into your tree or suggest appropriate.
> 
> Acked-by: Evgeniy Polyakov 

Can someone resend this, I don't seem to be able to find it...

thanks,

greg k-h

Re: [PATCH 06/18] MAINTAINERS: ARM: plat-nomadik: Update patterns

2013-07-26 Thread Linus Walleij

On Mon, Jul 22, 2013 at 2:15 AM, Joe Perches  wrote:

> commit 694e33a7f4 ("ARM: plat-nomadik: move MTU, kill plat-nomadik")
> moved the files, update the patterns.
>
> Signed-off-by: Joe Perches 
> cc: Linus Walleij 

Reviewed-by: Linus Walleij 

Sorry for missing this :-/

Yours,
Linus Walleij

Re: [PATCH] ti-st: fix NULL dereference on protocol type check

On Thu, Jul 25, 2013 at 07:16:28PM +0100, Gustavo Padovan wrote:
> * Andrew Morton  [2013-07-24 16:12:22 -0700]:
> 
> > > On Tue, 23 Jul 2013 15:29:31 +0100 Gustavo Padovan wrote:
> > 
> > > From: Gustavo Padovan 
> > > 
> > > > If the type we receive is greater than ST_MAX_CHANNELS we can't rely on
> > > > type as vector index since we would be accessing unknown memory when we
> > > > use the type as index.
> > > > 
> > > >  Unable to handle kernel NULL pointer dereference at virtual address 001b
> > > >  pgd = c0004000
> > > >  [001b] *pgd=
> > > >  Internal error: Oops: 17 [#1] PREEMPT SMP ARM
> > > >  Modules linked in: btwilink wl12xx wlcore mac80211 cfg80211 rfcomm bnep bluo
> > >  CPU: 0Tainted: GW (3.4.0+ #15)
> > >  PC is at st_int_recv+0x278/0x344
> > >  LR is at get_parent_ip+0x14/0x30
> > >  pc : []lr : []psr: 200f0193
> > >  sp : dc631ed0  ip : e3e21c24  fp : dc631f04
> > >  r10:   r9 : 600f0113  r8 : 003f
> > >  r7 : e3e21b14  r6 : 0067  r5 : e2e49c1c  r4 : e3e21a80
> > >  r3 : 0001  r2 : 0001  r1 : 0001  r0 : 600f0113
> > >  Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
> > >  Control: 10c5387d  Table: 9c50004a  DAC: 0015
> > > 
> > > Signed-off-by: Gustavo Padovan 
> > > ---
> > >  drivers/misc/ti-st/st_core.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/misc/ti-st/st_core.c b/drivers/misc/ti-st/st_core.c
> > > index 0a14280..8e64eb1 100644
> > > --- a/drivers/misc/ti-st/st_core.c
> > > +++ b/drivers/misc/ti-st/st_core.c
> > > @@ -343,7 +343,7 @@ void st_int_recv(void *disc_data,
> > >   /* Unknow packet? */
> > >   default:
> > >   type = *ptr;
> > > - if (st_gdata->list[type] == NULL) {
> > > + if (type >= ST_MAX_CHANNELS || st_gdata->list[type] == 
> > > NULL) {
> > >   pr_err("chip/interface misbehavior dropping"
> > >   " frame starting with 0x%02x", type);
> > >   goto done;
> > 
> > This would be a bug in the calling code, would it not?
> 
> It is possible and it seems 39f610e40 could be a fix for this. I would need to
> test. I was testing it on old kernel without this patch. In any case my patch
> is still needed.

Why?  Shouldn't you just prevent this from ever happening in the first
place?

thanks,

greg k-h
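
The check under discussion follows the usual pattern of validating an
untrusted index before using it, relying on || to short-circuit. A minimal
standalone sketch follows; st_lookup_sketch() and struct st_proto_sketch are
hypothetical stand-ins for the driver's structures, and the value 16 for
ST_MAX_CHANNELS is assumed here for illustration.

```c
#include <stddef.h>

#define ST_MAX_CHANNELS 16	/* value assumed for this sketch */

/* Hypothetical stand-in for a registered protocol entry. */
struct st_proto_sketch {
	int chnl_id;
};

/*
 * Hypothetical lookup: return the entry registered for 'type', or NULL
 * when 'type' is out of range or nothing is registered.  The range test
 * comes first, so list[type] is never evaluated for a bad index.
 */
static struct st_proto_sketch *
st_lookup_sketch(struct st_proto_sketch *const *list, unsigned char type)
{
	if (type >= ST_MAX_CHANNELS || list[type] == NULL)
		return NULL;
	return list[type];
}
```

A byte taken from the wire can be any value from 0 to 255, so in this sketch
a value such as 0x67 is rejected by the range test rather than being used to
read past the end of the 16-entry table.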

Re: [patch 0/3] mm: improve page aging fairness between zones/nodes

2013-07-26 Thread Johannes Weiner

On Fri, Jul 26, 2013 at 03:45:33PM -0700, Andrew Morton wrote:
> On Fri, 19 Jul 2013 16:55:22 -0400 Johannes Weiner  wrote:
> 
> > The way the page allocator interacts with kswapd creates aging
> > imbalances, where the amount of time a userspace page gets in memory
> > under reclaim pressure is dependent on which zone, which node the
> > allocator took the page frame from.
> > 
> > > #1 fixes missed kswapd wakeups on NUMA systems, which lead to some
> > >    nodes falling behind for a full reclaim cycle relative to the other
> > >    nodes in the system
> > > 
> > > #3 fixes an interaction where kswapd and a continuous stream of page
> > >    allocations keep the preferred zone of a task between the high and
> > >    low watermark (allocations succeed + kswapd does not go to sleep)
> > >    indefinitely, completely underutilizing the lower zones and
> > >    thrashing on the preferred zone
> > 
> > These patches are the aging fairness part of the thrash-detection
> > based file LRU balancing.  Andrea recommended to submit them
> > separately as they are bugfixes in their own right.
> > 
> > The following test ran a foreground workload (memcachetest) with
> > background IO of various sizes on a 4 node 8G system (similar results
> > were observed with single-node 4G systems):
> > 
> > > parallelio
> > >                                BASE             FAIRALLOC
> > > Ops memcachetest-0M         5170.00 (  0.00%)     5283.00 (  2.19%)
> > > Ops memcachetest-791M       4740.00 (  0.00%)     5293.00 ( 11.67%)
> > > Ops memcachetest-2639M      2551.00 (  0.00%)     4950.00 ( 94.04%)
> > > Ops memcachetest-4487M      2606.00 (  0.00%)     3922.00 ( 50.50%)
> > > Ops io-duration-0M             0.00 (  0.00%)        0.00 (  0.00%)
> > > Ops io-duration-791M          55.00 (  0.00%)       18.00 ( 67.27%)
> > > Ops io-duration-2639M        235.00 (  0.00%)      103.00 ( 56.17%)
> > > Ops io-duration-4487M        278.00 (  0.00%)      173.00 ( 37.77%)
> > > Ops swaptotal-0M               0.00 (  0.00%)        0.00 (  0.00%)
> > > Ops swaptotal-791M        245184.00 (  0.00%)        0.00 (  0.00%)
> > > Ops swaptotal-2639M       468069.00 (  0.00%)   108778.00 ( 76.76%)
> > > Ops swaptotal-4487M       452529.00 (  0.00%)    76623.00 ( 83.07%)
> > > Ops swapin-0M                  0.00 (  0.00%)        0.00 (  0.00%)
> > > Ops swapin-791M           108297.00 (  0.00%)        0.00 (  0.00%)
> > > Ops swapin-2639M          169537.00 (  0.00%)    50031.00 ( 70.49%)
> > > Ops swapin-4487M          167435.00 (  0.00%)    34178.00 ( 79.59%)
> > > Ops minorfaults-0M       1518666.00 (  0.00%)  1503993.00 (  0.97%)
> > > Ops minorfaults-791M     1676963.00 (  0.00%)  1520115.00 (  9.35%)
> > > Ops minorfaults-2639M    1606035.00 (  0.00%)  1799717.00 (-12.06%)
> > > Ops minorfaults-4487M    1612118.00 (  0.00%)  1583825.00 (  1.76%)
> > > Ops majorfaults-0M             6.00 (  0.00%)        0.00 (  0.00%)
> > > Ops majorfaults-791M       13836.00 (  0.00%)       10.00 ( 99.93%)
> > > Ops majorfaults-2639M      22307.00 (  0.00%)     6490.00 ( 70.91%)
> > > Ops majorfaults-4487M      21631.00 (  0.00%)     4380.00 ( 79.75%)
> 
> A reminder whether positive numbers are good or bad would be useful ;)

It depends on the datapoint, but a positive percentage number is an
improvement, a negative one a regression.
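
To spell out that sign convention: the percentages in the table can be
reproduced with the small sketch below. The formula is inferred from the
quoted figures rather than taken from the benchmark tool itself, and
gain_pct_sketch() is a hypothetical name. It treats "higher is better" rows
(memcachetest operations) and "lower is better" rows (durations, swap,
faults) differently so that an improvement always comes out positive.

```c
/*
 * Hypothetical helper: percentage change of 'val' relative to 'base',
 * signed so that an improvement is positive.
 * Returns 0.0 when base is zero, where no ratio can be computed.
 */
static double gain_pct_sketch(double base, double val, int higher_is_better)
{
	double delta;

	if (base == 0.0)
		return 0.0;
	delta = higher_is_better ? val - base : base - val;
	return 100.0 * delta / base;
}
```

For example, memcachetest-2639M gives (4950 - 2551) / 2551, about 94.04%,
and io-duration-791M gives (55 - 18) / 55, about 67.27%, while
minorfaults-2639M gives (1606035 - 1799717) / 1606035, about -12.06%, the
one regression in the table. Rows whose result is zero (such as
swaptotal-791M) are printed as 0.00% in the table and do not follow this
formula.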

> >                BASE   FAIRALLOC
> > User         287.78      460.97
> > System      2151.67     3142.51
> > Elapsed     9737.00     8879.34
> 
> Confused.  Why would the amount of user time increase so much?
> 
> And that's a tremendous increase in system time.  Am I interpreting
> this correctly?

It is because each memcachetest is running for a fixed duration (only
the background IO is fixed in size).  The time memcachetest previously
spent waiting on major faults is now spent doing actual work (more
user time, more syscalls).  The number of operations memcachetest
could actually perform increased.

Re: [PATCH 1/4] ALSA: Added jack detection kcontrol support

2013-07-26 Thread Felipe Tonello

Mark,

On Fri, Jul 26, 2013 at 3:48 PM, Mark Brown  wrote:
> On Fri, Jul 26, 2013 at 12:10:27PM -0700, Felipe Tonello wrote:
>> On Fri, Jul 26, 2013 at 11:54 AM, Mark Brown  wrote:
>
>> > This isn't ideal for multi-function jacks like headsets - it will report
>> > a single boolean value for the jack regardless of what's plugged in
>> > meaning userspace can't do things like figure out if a headset or
>> > headphone is present.  It's probably OK for any realistic input button
>> > since you're not going to get an input button without other things being
>> > present.
>
>> The KControl for Jack is boolean anyway. You can check it with "amixer
>> contents". user-space can figure out based on the control name. At
>> least PulseAudio does that way.
>
> No, it can't do that for headset jacks - these will be created with a
> single jack reporting multiple states, there's a state for headphone and
> a state for microphone.  The system can generally distinguish between
> having a headset or just plain headphones inserted and act accordingly
> (for example, recording from the built in microphone on a phone when
> used with normal headphones).
>
>> > What I'd expect to happen here is that for multi function jacks we
>> > create a control per function if the controls are valid.

Ok, so the idea is just to change the control to type integer instead
of boolean, right?
Because as you say, the user will be able to check the type of jack
based on the status value, right?

>
>> Do you mean based on snd_jack_types?
>
> Yes.  If there's only one function supported the current code is fine
> but for multiple functions it's going to discard useful information.

So, what do you suggest to do that? I'm not sure if I understand what
you are saying.
When you say function, do you mean the SND_JACK_BTN_n or the jack
types, such as SND_JACK_HEADPHONE, and so on?

If a codec creates a jack of type SND_JACK_HEADSET (= SND_JACK_HEADPHONE
| SND_JACK_MICROPHONE), should two controls be created, name +
"Headphone Jack" and name + "Microphone Jack"? If so, what about the
status to report? How do we know which control to report to?

Felipe Tonello
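
One possible reading of the "control per function" suggestion is sketched
below in plain C. It is not ALSA code: struct jack_kctl_sketch and
jack_report_sketch() are hypothetical, and the SND_JACK_* values are copied
in so the example stands alone. Each boolean control remembers the single
jack-type bit it represents, and a status report updates every control by
masking the status with that bit.

```c
#include <stdbool.h>

/* Jack type bits, copied here so the sketch is self-contained. */
#define SND_JACK_HEADPHONE  0x0001
#define SND_JACK_MICROPHONE 0x0002
#define SND_JACK_HEADSET    (SND_JACK_HEADPHONE | SND_JACK_MICROPHONE)

/* Hypothetical per-function boolean control. */
struct jack_kctl_sketch {
	int  function_bit;	/* the one SND_JACK_* bit this control reports */
	bool plugged;		/* current boolean value of the control */
};

/*
 * Hypothetical report path: every control picks its own bit out of the
 * combined status, so no information from a multi-function jack is lost.
 */
static void jack_report_sketch(struct jack_kctl_sketch *ctls, int nctls,
			       int status)
{
	for (int i = 0; i < nctls; i++)
		ctls[i].plugged = (status & ctls[i].function_bit) != 0;
}
```

Under this scheme a SND_JACK_HEADSET jack would get two controls. Reporting
SND_JACK_HEADPHONE sets only the headphone control, reporting
SND_JACK_HEADSET sets both, and reporting 0 clears both, which lets
userspace tell plain headphones from a headset.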

Re: [PATCH 0/4] pci_ids, 8250_pci: remove PCI_VENDOR_ID_ADDIDATA_OLD

On Fri, Jul 19, 2013 at 03:37:26PM -0600, Bjorn Helgaas wrote:
> On Tue, Jul 16, 2013 at 9:14 AM, Ian Abbott  wrote:
> > The 8250_pci driver uses PCI_VENDOR_ID_ADDIDATA_OLD (0x10e8),
> > PCI_DEVICE_ID_ADDIDATA_APCI7800 (0x818e) to recognize the original
> > ADDI-DATA APCI-7800 PCI serial card.  However vendor ID 0x10e8 was
> > assigned by PCI-SIG to Applied Micro Circuits Corporation (AMCC) and the
> > associated device ID 0x818e was assigned by AMCC to ADDI-DATA.
> >
> > Comedi already defines PCI_VENDOR_ID_AMCC as 0x10e8 in one of its header
> > files, so that definition can be moved into pci_ids.h and the 8250_pci
> > driver changed to use it.  The PCI_DEVICE_ID_ADDIDATA_APCI7800 define
> > seems out of place in pci_ids.h since it isn't associated with
> > ADDI-DATA's vendor ID but with AMCC's vendor ID.  It's only used in
> > 8250_pci.c so it can be moved there and renamed to something more
> > sensible.
> >
> > 1) pci_ids.h: move PCI_VENDOR_ID_AMCC here
> > 2) serial: 8250_pci: replace PCI_VENDOR_ID_ADDIDATA_OLD
> > 3) serial: 8250_pci: use local device ID for ADDI-DATA APCI-7800
> > 4) pci_ids.h: remove PCI_VENDOR_ID_ADDIDATA_OLD and
> >PCI_DEVICE_ID_ADDIDATA_APCI7800
> >
> >  drivers/staging/comedi/comedidev.h | 1 -
> >  drivers/tty/serial/8250/8250_pci.c | 9 +
> >  include/linux/pci_ids.h| 4 ++--
> >  3 files changed, 7 insertions(+), 7 deletions(-)
> 
> For patches 1 & 4 (the ones that touch pci_ids.h):
> 
> Acked-by: Bjorn Helgaas 
> 
> Please merge them along with the 8250 changes.

Thanks, will do.

greg k-h

Re: [microblaze-linux] [RESEND PATCH] microblaze: Fix clone syscall

2013-07-26 Thread Andrew Morton

On Wed, 24 Jul 2013 08:48:27 +0200 Michal Simek  wrote:

> On 07/24/2013 07:55 AM, Rich Felker wrote:
> > On Wed, Jul 24, 2013 at 07:34:07AM +0200, Michal Simek wrote:
> >> Microblaze was assigned to CLONE_BACKWARDS type where
> >> parent tid was passed via 3rd argument.
> >> Microblaze glibc is using 4th argument for it.
> >>
> >> Create new CLONE_BACKWARDS3 type where stack_size is passed
> >> via 3rd argument, parent thread id pointer via 4th,
> >> child thread id pointer via 5th and tls value as 6th
> >> argument
> > 
> > I believe this also affects us in musl. What is the motivation for
> > making a configure option that results in there being two incompatible
> > syscall ABIs for the same arch?
> > This sounds like a really bad idea...
> 
> This patch fixes bug which was introduced by Al's patch where he moved
> clone implementation from microblaze folder to generic location.

That's important information which was omitted from the changelog. 

Please identify the patch which caused this regression (SHA hash and
title), thanks.


Re: [PATCH 2/2] serial: omap: fix wrong context restoration on init

On Fri, Jul 12, 2013 at 03:11:46PM +0300, Felipe Balbi wrote:
> hi,
> 
> On Fri, Jul 12, 2013 at 02:55:42PM +0300, Grygorii Strashko wrote:
> > > Since commit a630fbf "serial: omap: Fix device tree based PM runtime"
> > > the OMAP serial driver will always try to restore its context in
> > > serial_omap_runtime_resume(). But the problem is that during driver
> > > initialization the UART context is not ready yet and, as a result, the
> > > first call to pm_runtime_get*() will overwrite the UART registers with
> > > all zeros. This causes a kernel boot hang, at least when the
> > > "earlyprintk" feature is enabled [1].
> > > 
> > > Unfortunately, there is currently no exact place in the driver where we
> > > can determine that the UART context is ready: most of the registers are
> > > configured in serial_omap_set_termios(), but some of them in other places.
> > > Moreover, even if PM runtime is disabled (blocked) during OMAP
> > > serial driver probe() execution [2],[3], that fixes only the console
> > > UART; the context of the other UARTs will still be overwritten with all
> > > zeros on first access to the corresponding UART.
> > > 
> > > To fix this issue:
> > > - introduce an additional "initialized" flag and update the PM runtime
> > > callback to do nothing if it's not set. Set "initialized" at the end of
> > > probe().
> > > - read the current UART register configuration in probe() and use it by
> > > default.
> > 
> > [1] http://www.spinics.net/lists/arm-kernel/msg256828.html
> > [2] http://www.spinics.net/lists/arm-kernel/msg258062.html
> > [3] http://www.spinics.net/lists/arm-kernel/msg258040.html
> > 
> > CC: Tony Lindgren 
> > CC: Rajendra Nayak 
> > CC: Felipe Balbi 
> > CC: Kevin Hilman 
> > 
> > Signed-off-by: Grygorii Strashko 
> > ---
> > tested on OMAP4 SDP with and without earlyprintk enabled.
> >  drivers/tty/serial/omap-serial.c |   27 ++-
> >  1 file changed, 26 insertions(+), 1 deletion(-)
> > 
> > > diff --git a/drivers/tty/serial/omap-serial.c b/drivers/tty/serial/omap-serial.c
> > index f39bf0c..e1e9667 100644
> > --- a/drivers/tty/serial/omap-serial.c
> > +++ b/drivers/tty/serial/omap-serial.c
> > @@ -162,6 +162,7 @@ struct uart_omap_port {
> > >     struct work_struct      qos_work;
> > >     struct pinctrl          *pins;
> > >     bool                    is_suspending;
> > > +   bool                    initialized;
> 
> you really think adding this sort of bool flag is the best thing we can
> do ? Something which will, quite likely, spread through every single
> driver ?

I agree, that's not ok, please fix this up "properly" somehow.

thanks,

greg k-h

Re: linux-next: build failure after merge of the staging tree

2013-07-26 Thread Eli Billauer


On 27/07/13 00:56, Greg KH wrote:

> No, I need you to do that.  Can you do a kernel build with:
>     make M=drivers/staging/xillybus C=1
> and fix up the errors that sparse reports and send a patch for that?

   
I'm not sure it's related to me. I get the same errors whether I compile 
my own modules or something in e.g. drivers/tty/ . This is what I get 
after make allmodconfig on the current staging git repo:


$ make M=drivers/staging/xillybus C=1
/home/eli/xillybus/submission/staging/arch/x86/Makefile:107: CONFIG_X86_X32 enabled but no binutils support

  CHECK   drivers/staging/xillybus/xillybus_core.c
/home/eli/xillybus/submission/staging/arch/x86/include/asm/jump_label.h:16:13: error: Expected ( after asm
/home/eli/xillybus/submission/staging/arch/x86/include/asm/jump_label.h:16:13: error: got goto

  CC [M]  drivers/staging/xillybus/xillybus_core.o
  CHECK   drivers/staging/xillybus/xillybus_pcie.c
/home/eli/xillybus/submission/staging/arch/x86/include/asm/jump_label.h:16:13: error: Expected ( after asm
/home/eli/xillybus/submission/staging/arch/x86/include/asm/jump_label.h:16:13: error: got goto

  CC [M]  drivers/staging/xillybus/xillybus_pcie.o

I'll spare you the output from modules in drivers/tty. But it's exactly 
the same messages on each of these modules.


Am I doing something wrong?

Regards,
   Eli

P.S. Regarding the missing Reported-By header, I learned something new 
today. Thanks. :)



Re: [patch 1/3] mm: vmscan: fix numa reclaim balance problem in kswapd

2013-07-26 Thread Andrew Morton

On Fri, 19 Jul 2013 16:55:23 -0400 Johannes Weiner  wrote:

> When the page allocator fails to get a page from all zones in its
> given zonelist, it wakes up the per-node kswapds for all zones that
> are at their low watermark.
> 
> However, with a system under load and the free page counters being
> per-cpu approximations, the observed counter value in a zone can
> fluctuate enough that the allocation fails but the kswapd wakeup is
> also skipped while the zone is still really close to the low
> watermark.
> 
> When one node misses a wakeup like this, it won't be aged before all
> the other node's zones are down to their low watermarks again.  And
> skipping a full aging cycle is an obvious fairness problem.
> 
> Kswapd runs until the high watermarks are restored, so it should also
> be woken when the high watermarks are not met.  This ages nodes more
> equally and creates a safety margin for the page counter fluctuation.

Well yes, but what guarantee is there that the per-cpu counter error
problem is reliably fixed?  AFAICT this patch "fixes" it because the
gap between the low and high watermarks happens to be larger than the
per-cpu counter fluctuation, yes?  If so, there are surely all sorts of
situations where it will break again.

To fix this reliably, we should be looking at constraining counter
batch sizes or performing a counter summation to get the more accurate
estimate?
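
The trade-off between a cheap approximate read and an exact summation can be
illustrated with a toy model of a batched per-cpu counter. Everything below
is an assumption made for illustration: the structure, the function names,
the CPU count of 4, the batch size of 32, and the folding rule are not taken
from the kernel's vmstat code.

```c
#define NR_CPUS_SKETCH 4
#define BATCH_SKETCH   32

/* Hypothetical batched counter: one global value plus per-cpu deltas. */
struct pcp_counter_sketch {
	long global;
	long delta[NR_CPUS_SKETCH];
};

/* Accumulate locally; fold into the global value once a batch is full. */
static void pcp_add(struct pcp_counter_sketch *c, int cpu, long n)
{
	c->delta[cpu] += n;
	if (c->delta[cpu] >= BATCH_SKETCH || c->delta[cpu] <= -BATCH_SKETCH) {
		c->global += c->delta[cpu];
		c->delta[cpu] = 0;
	}
}

/* Cheap read: ignores deltas that have not been folded yet. */
static long pcp_read_fast(const struct pcp_counter_sketch *c)
{
	return c->global;
}

/* Accurate read: pays for a walk over every CPU's pending delta. */
static long pcp_read_exact(const struct pcp_counter_sketch *c)
{
	long sum = c->global;

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += c->delta[cpu];
	return sum;
}
```

In this model the fast read can be off by almost NR_CPUS_SKETCH *
BATCH_SKETCH. If each of the 4 CPUs has consumed 31 pages from a counter
that started at 1000, the fast read still reports 1000 while the exact read
reports 876. Shrinking the batch narrows that error bound; summing removes
it at the cost of touching every CPU's data.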


Re: [PATCH RFC nohz_full 6/7] nohz_full: Add full-system-idle state machine

On Thu, Jul 25, 2013 at 01:26:44AM +0200, Frederic Weisbecker wrote:
> On Wed, Jul 24, 2013 at 03:09:02PM -0700, Paul E. McKenney wrote:
> > On Wed, Jul 24, 2013 at 08:09:04PM +0200, Frederic Weisbecker wrote:
> > > On Thu, Jul 18, 2013 at 10:06:25PM -0700, Paul E. McKenney wrote:
> > > > > Lets summarize the last sequence, the following happens ordered by 
> > > > > time:
> > > > > 
> > > > > > CPU 0                                  CPU 1
> > > > > > 
> > > > > > cmpxchg(&full_sysidle_state,
> > > > > >         RCU_SYSIDLE_SHORT,
> > > > > >         RCU_SYSIDLE_LONG);
> > > > > > 
> > > > > > smp_mb() //cmpxchg
> > > > > > 
> > > > > > atomic_read(rdtp(1)->dynticks_idle)
> > > > > > 
> > > > > > //CPU 0 goes to sleep
> > > > > >                                        //CPU 1 wakes up
> > > > > > 
> > > > > >                                        atomic_inc(rdtp(1)->dynticks_idle)
> > > > > > 
> > > > > >                                        smp_mb()
> > > > > > 
> > > > > >                                        ACCESS_ONCE(full_sysidle_state)
> > > > > 
> > > > > 
> > > > > > Are you suggesting that because CPU 1 executes its atomic_inc()
> > > > > > _after_ (in terms of absolute time) the atomic_read of CPU 0, the
> > > > > > ordering settled in both sides guarantees that the value read from
> > > > > > CPU 1 is the one from the cmpxchg that precedes the atomic_read,
> > > > > > or FULL or FULL_NOTED that happen later?
> > > > > 
> > > > > If so that's a big lesson for me. 
> > > > 
> > > > It is not absolute time that matters.  Instead, it is the fact that
> > > > CPU 0, when reading from ->dynticks_idle, read the old value before the
> > > > atomic_inc().  Therefore, anything CPU 0 did before that memory barrier
> > > > preceding CPU 0's read must come before anything CPU 1 did after that
> > > > memory barrier following the atomic_inc().  For this to work, there
> > > > must be some access to the same variable on each CPU.
> > > 
> > > Aren't we in the following situation?
> > > 
> > > > CPU 0       CPU 1
> > > > 
> > > > STORE A     STORE B
> > > > LOAD B      LOAD A
> > > 
> > > 
> > > > If so and referring to your perfbook, this is an "ears to mouth"
> > > > situation, and it seems to say there is no strong guarantee in
> > > > that situation.
> > 
> > "Yes" to the first, but on modern hardware, "no" to the second.  The key
> > paragraph is Section 12.2.4.5:
> > 
> > The following pairings from Table 12.1 can be used on modern
> > hardware, but might fail on some systems that were produced in
> > the 1990s. However, these can safely be used on all mainstream
> > hardware introduced since the year 2000.
> 
> Right I missed that!

Nor are you alone.  ;-)
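
For anyone who wants to poke at the STORE A / LOAD B versus STORE B /
LOAD A pattern from userspace, here is a small litmus-style sketch.  It
rests on assumptions: C11 sequentially consistent fences stand in for
smp_mb() (an analogy, not an exact equivalent), pthreads stand in for the
two CPUs, and all names are illustrative.  With a full fence between the
store and the load on both sides, the outcome in which both loads return
zero is forbidden.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int ex_a, ex_b;   /* stand-ins for the two shared variables */
static int ex_r0, ex_r1;        /* what each "CPU" observed */

static void *ex_cpu0(void *unused)
{
	(void)unused;
	atomic_store_explicit(&ex_a, 1, memory_order_relaxed);     /* STORE A */
	atomic_thread_fence(memory_order_seq_cst);                 /* full barrier */
	ex_r0 = atomic_load_explicit(&ex_b, memory_order_relaxed); /* LOAD B */
	return NULL;
}

static void *ex_cpu1(void *unused)
{
	(void)unused;
	atomic_store_explicit(&ex_b, 1, memory_order_relaxed);     /* STORE B */
	atomic_thread_fence(memory_order_seq_cst);                 /* full barrier */
	ex_r1 = atomic_load_explicit(&ex_a, memory_order_relaxed); /* LOAD A */
	return NULL;
}

/* Run the pattern once; return 1 if both loads missed the other side's store. */
static int ex_sb_run_once(void)
{
	pthread_t t0, t1;
	int rc;

	atomic_store(&ex_a, 0);
	atomic_store(&ex_b, 0);
	rc = pthread_create(&t0, NULL, ex_cpu0, NULL);
	assert(rc == 0);
	rc = pthread_create(&t1, NULL, ex_cpu1, NULL);
	assert(rc == 0);
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	return ex_r0 == 0 && ex_r1 == 0;
}

/* Count how often the forbidden outcome shows up over many runs. */
static int ex_sb_count_forbidden(int iterations)
{
	int bad = 0;

	for (int i = 0; i < iterations; i++)
		bad += ex_sb_run_once();
	return bad;
}
```

Dropping either fence makes the both-zero outcome legal again, and on
most current hardware it can then actually be observed.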

> > That said, you are not the first to be confused by this, so I might need
> > to rework this section to make it clear that each can in fact be used on
> > modern hardware.
> > 
> > If you happen to have an old Sequent NUMA-Q or Symmetry box lying around,
> > things are a bit different.  On the other hand, I don't believe that any
> > of these old boxes are still running Linux.  (Hey, I am as sentimental as
> > the next guy, but there are limits!)
> > 
> > I updated this section and pushed it, please let me know if this helps!
> 
> I don't know, because I ran into some trouble building it. I'm seeing
> thousands of lines like this:
> 
> Name "main::opt_help" used only once: possible typo at /usr/bin/a2ping line 534.
> /usr/bin/a2ping: not a GS output from gs -dSAFER
> ./cartoons/whippersnapper300.eps -> ./cartoons/whippersnapper300.pdf
> Name "main::opt_extra" used only once: possible typo at /usr/bin/a2ping line 546.
> Name "main::opt_help" used only once: possible typo at /usr/bin/a2ping line 534.
> /usr/bin/a2ping: not a GS output from gs -dSAFER
> make: *** [embedfonts] Error 1

Strange.  My version of a2ping is Ubuntu 12.04's:

a2ping.pl 2.77p, 2006-11-15 -- Written by  from April 2003.

You have something more recent?

> Anyway I looked at the diff and it looks indeed clearer, thanks!

Glad it helped!

> So back to the issue, I think we made nice progress with my rusty brain ;-)
> But just to be clear, I'm pasting that again to pin down a few details:
> 
> CPU 0                                    CPU 1
> 
> cmpxchg(&full_sysidle_state,             //CPU 1 wakes up
>         RCU_SYSIDLE_SHORT,               atomic_inc(rdtp(1)->dynticks_idle)
>         RCU_SYSIDLE_LONG);
> 
> smp_mb() //cmpxchg                       smp_mb()
> atomic_read(rdtp(1)->dynticks_idle)      ACCESS_ONCE(full_sysidle_state)
> //CPU 0 goes to sleep
> 
> 
> 
> 1) If CPU 0 sets RCU_SYSIDLE_LONG and sees dynticks_idle as even, do we have
> the _guarantee_ that later CPU 1 sees full_sysidle_state == RCU_SYSIDLE_LONG
> (or any later full_sysidle_state value)
> due to the connection between atomic_read /

Re: [PATCH 4/4] ALSA: oxygen: Updating jack implementation according new ALSA Jacks

2013-07-26 Thread Felipe Tonello

Mark,

On Fri, Jul 26, 2013 at 3:45 PM, Mark Brown  wrote:
> On Fri, Jul 26, 2013 at 12:02:51PM -0700, Felipe Tonello wrote:
>> On Fri, Jul 26, 2013 at 11:56 AM, Mark Brown  wrote:
>
>> >>   snd_jack_new(chip->card, "Headphone",
>> >> -  SND_JACK_HEADPHONE, >hp_jack);
>> >> +  SND_JACK_HEADPHONE, 0, >hp_jack);
>> >>   xonar_ds_handle_hp_jack(chip);
>
>> > ...this really ought to be done as part of the commit that adds the
>> > parameter since it breaks the build until this patch is applied.
>
>> But that's why it is a patch series. But, as you say, are you suggesting
>> that I propose these changes in one patch only?
>
> This one should be squashed in, as should the part of the ASoC change
> that adjusts for the call into the core API.  The general idea with a
> patch series is to split things into smaller chunks so they're easier to
> understand and review but still keep things working with each change so
> that things like git bisect continue to be usable.

Sure.

>
> So I guess something like one patch that changes the core jack API to
> add the index and support jack creation, one to remove the HDA custom
> implementation and one to add support for specifying the index to the
> ASoC API and adjust all its users.

I squashed into the HDA one. Perhaps I should send it again?

I don't know why, but my gmail is messing with the thread. If you
think it's better I can send a v3 with no --in-reply-to option.

Felipe Tonello

Re: [PATCH 1/4] ALSA: Added jack detection kcontrol support

2013-07-26 Thread Mark Brown

On Fri, Jul 26, 2013 at 12:10:27PM -0700, Felipe Tonello wrote:
> On Fri, Jul 26, 2013 at 11:54 AM, Mark Brown  wrote:

> > This isn't ideal for multi-function jacks like headsets - it will report
> > a single boolean value for the jack regardless of what's plugged in
> > meaning userspace can't do things like figure out if a headset or
> > headphone is present.  It's probably OK for any realistic input button
> > since you're not going to get an input button without other things being
> > present.

> The KControl for Jack is boolean anyway. You can check it with "amixer
> contents". User space can figure it out based on the control name. At
> least PulseAudio does it that way.

No, it can't do that for headset jacks - these will be created with a
single jack reporting multiple states, there's a state for headphone and
a state for microphone.  The system can generally distinguish between
having a headset or just plain headphones inserted and act accordingly
(for example, recording from the built in microphone on a phone when
used with normal headphones).

> > What I'd expect to happen here is that for multi function jacks we
> > create a control per function if the controls are valid.

> Do you mean based on snd_jack_types?

Yes.  If there's only one function supported the current code is fine
but for multiple functions it's going to discard useful information.
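
As a rough userspace sketch of what "a control per function" could look
like: the bit values, the naming scheme and every ex_* identifier below
are illustrative assumptions, not the ALSA API.  A jack with a single
function keeps one "<id> Jack" control, while a multi-function jack gets
one control per function bit, so the headphone and microphone states can
be reported separately.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative stand-ins for a few enum snd_jack_types bits. */
#define EX_JACK_HEADPHONE  0x0001
#define EX_JACK_MICROPHONE 0x0002
#define EX_JACK_LINEOUT    0x0004
#define EX_NAME_LEN        64

struct ex_jack_func {
	int bit;
	const char *label;
};

static const struct ex_jack_func ex_funcs[] = {
	{ EX_JACK_HEADPHONE,  "Headphone"  },
	{ EX_JACK_MICROPHONE, "Microphone" },
	{ EX_JACK_LINEOUT,    "Line Out"   },
};
#define EX_NFUNCS (sizeof(ex_funcs) / sizeof(ex_funcs[0]))

/* Number of known function bits set in a jack type mask. */
static int ex_count_funcs(int type)
{
	int n = 0;

	for (size_t i = 0; i < EX_NFUNCS; i++)
		if (type & ex_funcs[i].bit)
			n++;
	return n;
}

/* Fill names[] with one control name per function; return how many were written. */
static int ex_jack_ctl_names(const char *id, int type,
			     char names[][EX_NAME_LEN], int max)
{
	int n = 0;

	assert(id != NULL && names != NULL);
	if (ex_count_funcs(type) <= 1) {
		/* Single function: one boolean control is enough. */
		if (max < 1)
			return 0;
		snprintf(names[0], EX_NAME_LEN, "%s Jack", id);
		return 1;
	}
	for (size_t i = 0; i < EX_NFUNCS && n < max; i++) {
		if (type & ex_funcs[i].bit) {
			snprintf(names[n], EX_NAME_LEN, "%s %s Jack",
				 id, ex_funcs[i].label);
			n++;
		}
	}
	return n;
}

/* Boolean state of one function's control, given the reported status mask. */
static int ex_jack_ctl_state(int status, int bit)
{
	return (status & bit) != 0;
}
```

With a headset jack this yields two controls, and plugging in plain
headphones would set only the headphone one.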


[PATCH v2 1/3] ALSA: Added jack detection kcontrol support

From: "Felipe F. Tonello" 

This patch adds jack support for alsa kcontrol.

This support is necessary since the new kcontrol is used by user-space
daemons, such as PulseAudio (>= 2.0), to do jack detection.

Signed-off-by: Felipe F. Tonello 
---
 include/sound/jack.h |  6 --
 sound/core/Kconfig   |  1 +
 sound/core/ctljack.c |  3 ++-
 sound/core/jack.c| 29 +++--
 4 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/include/sound/jack.h b/include/sound/jack.h
index 5891657..dc62b74 100644
--- a/include/sound/jack.h
+++ b/include/sound/jack.h
@@ -26,6 +26,7 @@
 #include 
 
 struct input_dev;
+struct snd_kcontrol;
 
 /**
  * Jack types which can be reported.  These values are used as a
@@ -58,6 +59,7 @@ enum snd_jack_types {
 
 struct snd_jack {
struct input_dev *input_dev;
+   struct snd_kcontrol *kctl;
int registered;
int type;
const char *id;
@@ -70,7 +72,7 @@ struct snd_jack {
 #ifdef CONFIG_SND_JACK
 
 int snd_jack_new(struct snd_card *card, const char *id, int type,
-struct snd_jack **jack);
+ int idx, struct snd_jack **jack);
 void snd_jack_set_parent(struct snd_jack *jack, struct device *parent);
 int snd_jack_set_key(struct snd_jack *jack, enum snd_jack_types type,
 int keytype);
@@ -80,7 +82,7 @@ void snd_jack_report(struct snd_jack *jack, int status);
 #else
 
 static inline int snd_jack_new(struct snd_card *card, const char *id, int type,
-  struct snd_jack **jack)
+   int idx, struct snd_jack **jack)
 {
return 0;
 }
diff --git a/sound/core/Kconfig b/sound/core/Kconfig
index c0c2f57..8167615 100644
--- a/sound/core/Kconfig
+++ b/sound/core/Kconfig
@@ -20,6 +20,7 @@ config SND_COMPRESS_OFFLOAD
 # to avoid having to force INPUT on.
 config SND_JACK
bool
+   select SND_KCTL_JACK
 
 config SND_SEQUENCER
tristate "Sequencer support"
diff --git a/sound/core/ctljack.c b/sound/core/ctljack.c
index e4b38fb..59aa6d0 100644
--- a/sound/core/ctljack.c
+++ b/sound/core/ctljack.c
@@ -38,7 +38,8 @@ snd_kctl_jack_new(const char *name, int idx, void *private_data)
kctl = snd_ctl_new1(_detect_kctl, private_data);
if (!kctl)
return NULL;
-   snprintf(kctl->id.name, sizeof(kctl->id.name), "%s Jack", name);
+
+   strlcpy(kctl->id.name, name, sizeof(kctl->id.name));
kctl->id.index = idx;
kctl->private_value = 0;
return kctl;
diff --git a/sound/core/jack.c b/sound/core/jack.c
index b35fe73..b2757b1 100644
--- a/sound/core/jack.c
+++ b/sound/core/jack.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int jack_switch_types[SND_JACK_SWITCH_TYPES] = {
SW_HEADPHONE_INSERT,
@@ -48,6 +49,7 @@ static int snd_jack_dev_free(struct snd_device *device)
else
input_free_device(jack->input_dev);
 
+   snd_ctl_remove(device->card, jack->kctl);
kfree(jack->id);
kfree(jack);
 
@@ -85,26 +87,36 @@ static int snd_jack_dev_register(struct snd_device *device)
if (err == 0)
jack->registered = 1;
 
+   /* We don't need to free the control, it's freed by snd_ctl_add itself
+  if an error occur */
+   err = snd_ctl_add(card, jack->kctl);
+
return err;
 }
 
 /**
  * snd_jack_new - Create a new jack
  * @card:  the card instance
- * @id:an identifying string for this jack
+ * @id:an identifying string for this jack, " Jack" is appended to the
+ * string
  * @type:  a bitmask of enum snd_jack_type values that can be detected by
  * this jack
+ * @idx:   index of this control item
  * @jjack: Used to provide the allocated jack object to the caller.
  *
  * Creates a new jack object.
  *
+ * This function creates a Jack Kcontrol, which is exported to user space via
+ * ALSA Controls.
+ *
  * Return: Zero if successful, or a negative error code on failure.
  * On success @jjack will be initialised.
  */
 int snd_jack_new(struct snd_card *card, const char *id, int type,
-struct snd_jack **jjack)
+ int idx, struct snd_jack **jjack)
 {
struct snd_jack *jack;
+   struct snd_kcontrol *kctl;
int err;
int i;
static struct snd_device_ops ops = {
@@ -117,6 +129,7 @@ int snd_jack_new(struct snd_card *card, const char *id, int type,
return -ENOMEM;
 
jack->id = kstrdup(id, GFP_KERNEL);
+   sprintf((char *)jack->id, "%s Jack", jack->id);
 
jack->input_dev = input_allocate_device();
if (jack->input_dev == NULL) {
@@ -137,6 +150,15 @@ int snd_jack_new(struct snd_card *card, const char *id, int type,
if (err < 0)
goto fail_input;
 
+   /* card is the private_data */
+   kctl = snd_kctl_jack_new(jack->id, idx, card);
+   if (!kctl) {
+   err = -ENOMEM;
+   goto fail_input;
+
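
A note on the hunk above that appends " Jack" to jack->id: kstrdup()
returns a buffer sized for the original string only, so an sprintf() that
uses that same buffer as both source and destination writes past the end
of the allocation.  A minimal userspace sketch of building the suffixed
name in a fresh, correctly sized buffer follows.  ex_make_jack_name() is a
hypothetical helper and malloc() stands in for kernel allocation; in the
kernel, kasprintf(GFP_KERNEL, "%s Jack", id) would presumably do the same
job in one call.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly allocated "<id> Jack" string, or NULL if allocation fails.
 * The caller owns the result and must free() it.
 */
static char *ex_make_jack_name(const char *id)
{
	static const char suffix[] = " Jack";
	size_t len;
	char *name;

	assert(id != NULL);
	len = strlen(id) + sizeof(suffix);   /* sizeof() counts the trailing NUL */
	name = malloc(len);
	if (!name)
		return NULL;
	snprintf(name, len, "%s%s", id, suffix);
	return name;
}
```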

[PATCH v2 2/3] ALSA: pci: HDA/oxygen: Updating jack implementation according new ALSA Jacks

From: "Felipe F. Tonello" 

ALSA standard jacks are already implemented using ALSA KControl, so there is
no need to implement that separately or to use snd_jack for input events only.

Also update the oxygen codec jack implementation to support the new jack API.

Signed-off-by: Felipe F. Tonello 
---
 sound/pci/hda/Kconfig   |  8 
 sound/pci/hda/hda_codec.h   |  2 --
 sound/pci/hda/hda_jack.c| 38 +-
 sound/pci/hda/hda_jack.h|  4 +---
 sound/pci/oxygen/xonar_wm87x6.c |  2 +-
 5 files changed, 19 insertions(+), 35 deletions(-)

diff --git a/sound/pci/hda/Kconfig b/sound/pci/hda/Kconfig
index 59c5e9c..561abc7 100644
--- a/sound/pci/hda/Kconfig
+++ b/sound/pci/hda/Kconfig
@@ -65,14 +65,6 @@ config SND_HDA_INPUT_BEEP_MODE
  Set 1 to always enable the digital beep interface for HD-audio by
  default.
 
-config SND_HDA_INPUT_JACK
-   bool "Support jack plugging notification via input layer"
-   depends on INPUT=y || INPUT=SND
-   select SND_JACK
-   help
- Say Y here to enable the jack plugging notification via
- input layer.
-
 config SND_HDA_PATCH_LOADER
bool "Support initialization patch loading for HD-audio"
select FW_LOADER
diff --git a/sound/pci/hda/hda_codec.h b/sound/pci/hda/hda_codec.h
index 701c2e0..ca7be59 100644
--- a/sound/pci/hda/hda_codec.h
+++ b/sound/pci/hda/hda_codec.h
@@ -912,10 +912,8 @@ struct hda_codec {
unsigned long jackpoll_interval; /* In jiffies. Zero means no poll, rely on unsol events */
struct delayed_work jackpoll_work;
 
-#ifdef CONFIG_SND_HDA_INPUT_JACK
/* jack detection */
struct snd_array jacks;
-#endif
 
/* fix-up list */
int fixup_id;
diff --git a/sound/pci/hda/hda_jack.c b/sound/pci/hda/hda_jack.c
index 3fd2973..6be1a0c 100644
--- a/sound/pci/hda/hda_jack.c
+++ b/sound/pci/hda/hda_jack.c
@@ -112,7 +112,6 @@ EXPORT_SYMBOL_HDA(snd_hda_jack_tbl_new);
 
 void snd_hda_jack_tbl_clear(struct hda_codec *codec)
 {
-#ifdef CONFIG_SND_HDA_INPUT_JACK
/* free jack instances manually when clearing/reconfiguring */
if (!codec->bus->shutdown && codec->jacktbl.list) {
struct hda_jack_tbl *jack = codec->jacktbl.list;
@@ -122,7 +121,6 @@ void snd_hda_jack_tbl_clear(struct hda_codec *codec)
snd_device_free(codec->bus->card, jack->jack);
}
}
-#endif
snd_array_free(&codec->jacktbl);
 }
 
@@ -283,17 +281,15 @@ void snd_hda_jack_report_sync(struct hda_codec *codec)
if (!jack->kctl)
continue;
state = get_jack_plug_state(jack->pin_sense);
-   snd_kctl_jack_report(codec->bus->card, jack->kctl, state);
-#ifdef CONFIG_SND_HDA_INPUT_JACK
-   if (jack->jack)
+   if (jack->phantom_jack)
+   snd_kctl_jack_report(codec->bus->card, jack->kctl, state);
+   else if (jack->jack)
snd_jack_report(jack->jack,
state ? jack->type : 0);
-#endif
}
 }
 EXPORT_SYMBOL_HDA(snd_hda_jack_report_sync);
 
-#ifdef CONFIG_SND_HDA_INPUT_JACK
 /* guess the jack type from the pin-config */
 static int get_input_jack_type(struct hda_codec *codec, hda_nid_t nid)
 {
@@ -320,7 +316,6 @@ static void hda_free_jack_priv(struct snd_jack *jack)
jacks->nid = 0;
jacks->jack = NULL;
 }
-#endif
 
 /**
  * snd_hda_jack_add_kctl - Add a kctl for the given pin
@@ -340,29 +335,30 @@ static int __snd_hda_jack_add_kctl(struct hda_codec *codec, hda_nid_t nid,
return 0;
if (jack->kctl)
return 0; /* already created */
-   kctl = snd_kctl_jack_new(name, idx, codec);
-   if (!kctl)
-   return -ENOMEM;
-   err = snd_hda_ctl_add(codec, nid, kctl);
-   if (err < 0)
-   return err;
-   jack->kctl = kctl;
+
jack->phantom_jack = !!phantom_jack;
 
-   state = snd_hda_jack_detect(codec, nid);
-   snd_kctl_jack_report(codec->bus->card, kctl, state);
-#ifdef CONFIG_SND_HDA_INPUT_JACK
-   if (!phantom_jack) {
+   /* If it's phantom jack only creates kcontrol jack elem */
+   if (jack->phantom_jack) {
+   kctl = snd_kctl_jack_new(name, idx, codec);
+   if (!kctl)
+   return -ENOMEM;
+   err = snd_hda_ctl_add(codec, nid, kctl);
+   if (err < 0)
+   return err;
+   jack->kctl = kctl;
+   } else {
+   state = snd_hda_jack_detect(codec, nid);
jack->type = get_input_jack_type(codec, nid);
err = snd_jack_new(codec->bus->card, name, jack->type,
-  &jack->jack);
+  idx, &jack->jack);
if (err < 0)

[PATCH v2 3/3] ALSA: SoC: Updating jack implementation according new ALSA Jacks

From: "Felipe F. Tonello" 

Updating the ASoC jack support to add the ability to specify a
jack index when creating it.

Signed-off-by: Felipe F. Tonello 
---
 include/sound/soc.h|  2 +-
 sound/soc/fsl/wm1133-ev1.c |  4 ++--
 sound/soc/mid-x86/mfld_machine.c   |  6 +++---
 sound/soc/omap/ams-delta.c |  2 +-
 sound/soc/omap/omap-abe-twl6040.c  |  4 ++--
 sound/soc/omap/omap-twl4030.c  |  4 ++--
 sound/soc/omap/rx51.c  |  6 +++---
 sound/soc/pxa/hx4700.c |  4 ++--
 sound/soc/pxa/palm27x.c|  4 ++--
 sound/soc/pxa/ttc-dkb.c|  8 
 sound/soc/pxa/z2.c |  4 ++--
 sound/soc/samsung/goni_wm8994.c|  4 ++--
 sound/soc/samsung/h1940_uda1380.c  |  4 ++--
 sound/soc/samsung/littlemill.c | 10 +-
 sound/soc/samsung/lowland.c|  6 +++---
 sound/soc/samsung/rx1950_uda1380.c |  4 ++--
 sound/soc/samsung/smartq_wm8987.c  |  4 ++--
 sound/soc/samsung/speyside.c   |  6 +++---
 sound/soc/samsung/tobermory.c  |  4 ++--
 sound/soc/soc-jack.c   |  4 ++--
 sound/soc/tegra/tegra_alc5632.c|  4 ++--
 sound/soc/tegra/tegra_rt5640.c |  2 +-
 sound/soc/tegra/tegra_wm8903.c |  8 
 23 files changed, 54 insertions(+), 54 deletions(-)

diff --git a/include/sound/soc.h b/include/sound/soc.h
index 6eabee7..31bea52 100644
--- a/include/sound/soc.h
+++ b/include/sound/soc.h
@@ -436,7 +436,7 @@ int snd_soc_platform_trigger(struct snd_pcm_substream *substream,
 
 /* Jack reporting */
 int snd_soc_jack_new(struct snd_soc_codec *codec, const char *id, int type,
-struct snd_soc_jack *jack);
+ int idx, struct snd_soc_jack *jack);
 void snd_soc_jack_report(struct snd_soc_jack *jack, int status, int mask);
 int snd_soc_jack_add_pins(struct snd_soc_jack *jack, int count,
  struct snd_soc_jack_pin *pins);
diff --git a/sound/soc/fsl/wm1133-ev1.c b/sound/soc/fsl/wm1133-ev1.c
index fce6325..50f96d4 100644
--- a/sound/soc/fsl/wm1133-ev1.c
+++ b/sound/soc/fsl/wm1133-ev1.c
@@ -221,14 +221,14 @@ static int wm1133_ev1_init(struct snd_soc_pcm_runtime *rtd)
ARRAY_SIZE(wm1133_ev1_map));
 
/* Headphone jack detection */
-   snd_soc_jack_new(codec, "Headphone", SND_JACK_HEADPHONE, &hp_jack);
+   snd_soc_jack_new(codec, "Headphone", SND_JACK_HEADPHONE, 0, &hp_jack);
snd_soc_jack_add_pins(&hp_jack, ARRAY_SIZE(hp_jack_pins),
  hp_jack_pins);
wm8350_hp_jack_detect(codec, WM8350_JDR, &hp_jack, SND_JACK_HEADPHONE);
 
/* Microphone jack detection */
snd_soc_jack_new(codec, "Microphone",
-SND_JACK_MICROPHONE | SND_JACK_BTN_0, &mic_jack);
+SND_JACK_MICROPHONE | SND_JACK_BTN_0, 0, &mic_jack);
snd_soc_jack_add_pins(&mic_jack, ARRAY_SIZE(mic_jack_pins),
  mic_jack_pins);
wm8350_mic_jack_detect(codec, &mic_jack, SND_JACK_MICROPHONE,
diff --git a/sound/soc/mid-x86/mfld_machine.c b/sound/soc/mid-x86/mfld_machine.c
index ee36384..e2c7978 100644
--- a/sound/soc/mid-x86/mfld_machine.c
+++ b/sound/soc/mid-x86/mfld_machine.c
@@ -253,9 +253,9 @@ static int mfld_init(struct snd_soc_pcm_runtime *runtime)
snd_soc_dapm_disable_pin(dapm, "LINEINR");
 
/* Headset and button jack detection */
-   ret_val = snd_soc_jack_new(codec, "Intel(R) MID Audio Jack",
-   SND_JACK_HEADSET | SND_JACK_BTN_0 |
-   SND_JACK_BTN_1, _jack);
+   ret_val = snd_soc_jack_new(codec, "Intel(R) MID Audio",
+  SND_JACK_HEADSET | SND_JACK_BTN_0 |
+  SND_JACK_BTN_1, 0, _jack);
if (ret_val) {
pr_err("jack creation failed\n");
return ret_val;
diff --git a/sound/soc/omap/ams-delta.c b/sound/soc/omap/ams-delta.c
index 6294464..4ffa38e 100644
--- a/sound/soc/omap/ams-delta.c
+++ b/sound/soc/omap/ams-delta.c
@@ -491,7 +491,7 @@ static int ams_delta_cx20442_init(struct snd_soc_pcm_runtime *rtd)
/* Add hook switch - can be used to control the codec from userspace
 * even if line discipline fails */
ret = snd_soc_jack_new(rtd->codec, "hook_switch",
-   SND_JACK_HEADSET, _delta_hook_switch);
+  SND_JACK_HEADSET, 0, _delta_hook_switch);
if (ret)
dev_warn(card->dev,
"Failed to allocate resources for hook switch, "
diff --git a/sound/soc/omap/omap-abe-twl6040.c b/sound/soc/omap/omap-abe-twl6040.c
index 70cd5c7..45ff3bc 100644
--- a/sound/soc/omap/omap-abe-twl6040.c
+++ b/sound/soc/omap/omap-abe-twl6040.c
@@ -193,8 +193,8 @@ static int omap_abe_twl6040_init(struct snd_soc_pcm_runtime *rtd)
 
/* Headset jack detection only if it is supported */
if (priv->jack_detection) {
-   ret = snd_soc_jack_new(codec,

Re: [patch 0/3] mm: improve page aging fairness between zones/nodes

2013-07-26 Thread Andrew Morton

On Fri, 19 Jul 2013 16:55:22 -0400 Johannes Weiner  wrote:

> The way the page allocator interacts with kswapd creates aging
> imbalances, where the amount of time a userspace page gets in memory
> under reclaim pressure is dependent on which zone, which node the
> allocator took the page frame from.
> 
> #1 fixes missed kswapd wakeups on NUMA systems, which lead to some
>    nodes falling behind for a full reclaim cycle relative to the other
>    nodes in the system
> 
> #3 fixes an interaction where kswapd and a continuous stream of page
>    allocations keep the preferred zone of a task between the high and
>    low watermark (allocations succeed + kswapd does not go to sleep)
>    indefinitely, completely underutilizing the lower zones and
>    thrashing on the preferred zone
> 
> These patches are the aging fairness part of the thrash-detection
> based file LRU balancing.  Andrea recommended to submit them
> separately as they are bugfixes in their own right.
> 
> The following test ran a foreground workload (memcachetest) with
> background IO of various sizes on a 4 node 8G system (similar results
> were observed with single-node 4G systems):
> 
> parallelio
>                                BAS              FAIRALLO
>                               BASE             FAIRALLOC
> Ops memcachetest-0M        5170.00 (  0.00%)     5283.00 (  2.19%)
> Ops memcachetest-791M      4740.00 (  0.00%)     5293.00 ( 11.67%)
> Ops memcachetest-2639M     2551.00 (  0.00%)     4950.00 ( 94.04%)
> Ops memcachetest-4487M     2606.00 (  0.00%)     3922.00 ( 50.50%)
> Ops io-duration-0M            0.00 (  0.00%)        0.00 (  0.00%)
> Ops io-duration-791M         55.00 (  0.00%)       18.00 ( 67.27%)
> Ops io-duration-2639M       235.00 (  0.00%)      103.00 ( 56.17%)
> Ops io-duration-4487M       278.00 (  0.00%)      173.00 ( 37.77%)
> Ops swaptotal-0M              0.00 (  0.00%)        0.00 (  0.00%)
> Ops swaptotal-791M       245184.00 (  0.00%)        0.00 (  0.00%)
> Ops swaptotal-2639M      468069.00 (  0.00%)   108778.00 ( 76.76%)
> Ops swaptotal-4487M      452529.00 (  0.00%)    76623.00 ( 83.07%)
> Ops swapin-0M                 0.00 (  0.00%)        0.00 (  0.00%)
> Ops swapin-791M          108297.00 (  0.00%)        0.00 (  0.00%)
> Ops swapin-2639M         169537.00 (  0.00%)    50031.00 ( 70.49%)
> Ops swapin-4487M         167435.00 (  0.00%)    34178.00 ( 79.59%)
> Ops minorfaults-0M      1518666.00 (  0.00%)  1503993.00 (  0.97%)
> Ops minorfaults-791M    1676963.00 (  0.00%)  1520115.00 (  9.35%)
> Ops minorfaults-2639M   1606035.00 (  0.00%)  1799717.00 (-12.06%)
> Ops minorfaults-4487M   1612118.00 (  0.00%)  1583825.00 (  1.76%)
> Ops majorfaults-0M            6.00 (  0.00%)        0.00 (  0.00%)
> Ops majorfaults-791M      13836.00 (  0.00%)       10.00 ( 99.93%)
> Ops majorfaults-2639M     22307.00 (  0.00%)     6490.00 ( 70.91%)
> Ops majorfaults-4487M     21631.00 (  0.00%)     4380.00 ( 79.75%)

A reminder whether positive numbers are good or bad would be useful ;)
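
The report does not state the convention, but the quoted figures are
consistent with "positive means improvement": throughput rows appear to
use (new - base) / base, while the duration, swap and fault rows appear
to use (base - new) / base.  A small sketch of that arithmetic follows;
the function names are made up, and the convention itself is inferred
from the numbers rather than taken from the tool's documentation.

```c
#include <assert.h>

/* Percentage gain for a metric where higher is better (e.g. memcachetest ops). */
static double ex_gain_higher_better(double base, double patched)
{
	assert(base != 0.0);
	return (patched - base) / base * 100.0;
}

/* Percentage gain for a metric where lower is better (e.g. io-duration). */
static double ex_gain_lower_better(double base, double patched)
{
	assert(base != 0.0);
	return (base - patched) / base * 100.0;
}
```

For memcachetest-2639M this gives (4950 - 2551) / 2551, about 94.04%, and
for io-duration-791M (55 - 18) / 55, about 67.27%, both matching the
table.  The minorfaults-2639M row comes out negative, that is, a
regression.  Rows where the patched value is 0.00 show 0.00% instead of
the 100% this formula would give, so those look like a special case in
the reporting.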

>               BAS    FAIRALLO
>              BASE   FAIRALLOC
> User       287.78      460.97
> System    2151.67     3142.51
> Elapsed   9737.00     8879.34

Confused.  Why would the amount of user time increase so much?

And that's a tremendous increase in system time.  Am I interpreting
this correctly?
 
>BASFAIRALLO
>   BASE   FAIRALLOC
> Minor Faults  5372192557188551
> Major Faults392195   15157
> Swap Ins   2994854  112770
> Swap Outs  4907092  134982
> Direct pages scanned 0   41824
> Kswapd pages scanned  32975063 8128269
> Kswapd pages reclaimed 6323069 7093495
> Direct pages reclaimed   0   41824
> Kswapd efficiency  19% 87%
> Kswapd velocity   3386.573 915.414
> Direct efficiency 100%100%
> Direct velocity  0.000   4.710
> Percentage direct scans 0%  0%
> Zone normal velocity  2011.338 550.661
> Zone dma32 velocity   1365.623 369.221
> Zone dma velocity9.612   0.242
> Page writes by reclaim18732404.000  614807.000
> Page writes file  13825312  479825
> Page writes anon   4907092  134982
> Page reclaim immediate   854905647
> Sector Reads  12080532  483244
> Sector Writes 8874050865438876
> Page rescued immediate

[PATCH v2 0/3] ALSA: Implement core jack support for kcontrol