Re: 32-bit bug in iovec iterator changes

2014-06-20 Thread Al Viro
On Fri, Jun 20, 2014 at 11:51:44PM -0400, Theodore Ts'o wrote:
> On Fri, Jun 20, 2014 at 08:38:20AM +1000, Dave Chinner wrote:
> > 
> > Short reads are more likely a bug in all the iovec iterator stuff
> > that got merged in from the vfs tree. ISTR a 32 bit-only bug in that
> > stuff go past in to do with not being able to partition a 32GB block
> > dev on a 32 bit system due to a 32 bit size_t overflow somewhere
> 
> Dave Chinner called it.  
> 
> Al, I'm seeing a regression which shows up using a 32-bit x86 kernel.
> The symptoms of the bug is when run under KVM, with a 5 GB /dev/vdc
> virtual block device, a read at offset 2 ** 30 fails with a short
> read:
> 
> # dd if=/dev/vdc of=/dev/null bs=4k skip=262144 count=1
> 0+0 records in
> 0+0 records out
> 0 bytes (0 B) copied, 0.0164144 s, 0.0 kB/s

Argh...

ed include/linux/uio.h 

[PATCH][BUGFIX] x86/reboot: Disable scheduler before disabling IO APIC

2014-06-20 Thread Fenghua Yu
From: Fenghua Yu 

During reboot, in the middle of disabling IO APIC, the scheduler may be
triggered by the per-cpu timer to do load balance. But since the kernel is
already in the process of shutting down and cannot execute the scheduler's
load balance at this point, it triggers an invalid TSS exception and hangs
during reboot.

This happens on some boards (e.g. AsRock ZT87 Extreme4 BIOS 2.70) in 32-bit
kernel reported in Bugzilla 76661 at
https://bugzilla.kernel.org/show_bug.cgi?id=76661

To fix the issue, we disable local irq including per cpu timer before
disabling IO APIC. By doing this, the scheduler will not disturb
disable_IO_APIC().

Signed-off-by: Fenghua Yu 
Tested-by: berndku...@hotmail.com
---
 arch/x86/kernel/reboot.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 52b1157..16111c6 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -574,6 +574,15 @@ static void native_machine_emergency_restart(void)
 void native_machine_shutdown(void)
 {
 	/* Stop the cpus and apics */
+
+#ifdef CONFIG_SMP
+	/*
+	 * Disable the local irq to not receive the per-cpu timer interrupt
+	 * which may trigger scheduler's load balance.
+	 */
+	local_irq_disable();
+#endif
+
 #ifdef CONFIG_X86_IO_APIC
 	/*
 	 * Disabling IO APIC before local APIC is a workaround for
@@ -591,11 +600,8 @@ void native_machine_shutdown(void)
 
 #ifdef CONFIG_SMP
 	/*
-	 * Stop all of the others. Also disable the local irq to
-	 * not receive the per-cpu timer interrupt which may trigger
-	 * scheduler's load balance.
+	 * Stop all of the others.
 	 */
-	local_irq_disable();
 	stop_other_cpus();
 #endif
 
-- 
1.8.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] dma: imx-sdma: Add a new DMATYPE for Shared Peripheral ASRC

2014-06-20 Thread Shawn Guo
On Mon, Jun 16, 2014 at 11:31:05AM +0800, Nicolin Chen wrote:
> Shared Peripheral ASRC, running on SPBA, needs to use shp scripts for
> DMA transfer. So this patch just adds a new DMATYPE for it.
> 
> Signed-off-by: Nicolin Chen 

Acked-by: Shawn Guo 


Re: [PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU

2014-06-20 Thread Josh Triplett
On Fri, Jun 20, 2014 at 07:59:58PM -0700, Paul E. McKenney wrote:
> Commit ac1bea85781e (Make cond_resched() report RCU quiescent states)
> fixed a problem where a CPU looping in the kernel with but one runnable
> task would give RCU CPU stall warnings, even if the in-kernel loop
> contained cond_resched() calls.  Unfortunately, in so doing, it introduced
> performance regressions in Anton Blanchard's will-it-scale "open1" test.
> The problem appears to be not so much the increased cond_resched() path
> length as an increase in the rate at which grace periods complete, which
> increased per-update grace-period overhead.
> 
> This commit takes a different approach to fixing this bug, mainly by
> moving the RCU-visible quiescent state from cond_resched() to
> rcu_note_context_switch(), and by further reducing the check to a
> simple non-zero test of a single per-CPU variable.  However, this
> approach requires that the force-quiescent-state processing send
> resched IPIs to the offending CPUs.  These will be sent only once
> the grace period has reached an age specified by the boot/sysfs
> parameter rcutree.jiffies_till_sched_qs, or once the grace period
> reaches an age halfway to the point at which RCU CPU stall warnings
> will be emitted, whichever comes first.
> 
> Reported-by: Dave Hansen 
> Signed-off-by: Paul E. McKenney 
> Cc: Josh Triplett 
> Cc: Andi Kleen 
> Cc: Christoph Lameter 
> Cc: Mike Galbraith 
> Cc: Eric Dumazet 

I like this approach *far* better.  This is the kind of thing I had in
mind when I suggested using the fqs machinery: remove the poll entirely
and just thwack a CPU if it takes too long without a quiescent state.
Reviewed-by: Josh Triplett 

> ---
> 
>  b/Documentation/kernel-parameters.txt |6 +
>  b/include/linux/rcupdate.h|   36 
>  b/kernel/rcu/tree.c   |  140 
> +++---
>  b/kernel/rcu/tree.h   |6 +
>  b/kernel/rcu/tree_plugin.h|2 
>  b/kernel/rcu/update.c |   18 
>  b/kernel/sched/core.c |7 -
>  7 files changed, 125 insertions(+), 90 deletions(-)
> 
> diff --git a/Documentation/kernel-parameters.txt 
> b/Documentation/kernel-parameters.txt
> index 6eaa9cdb7094..910c3829f81d 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2785,6 +2785,12 @@ bytes respectively. Such letter suffixes can also be 
> entirely omitted.
>   leaf rcu_node structure.  Useful for very large
>   systems.
>  
> + rcutree.jiffies_till_sched_qs= [KNL]
> + Set required age in jiffies for a
> + given grace period before RCU starts
> + soliciting quiescent-state help from
> + rcu_note_context_switch().
> +
>   rcutree.jiffies_till_first_fqs= [KNL]
>   Set delay from grace-period initialization to
>   first attempt to force quiescent states.
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 5a75d19aa661..243aa4656cb7 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -44,7 +44,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  
>  extern int rcu_expedited; /* for sysctl */
> @@ -300,41 +299,6 @@ bool __rcu_is_watching(void);
>  #endif /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) 
> || defined(CONFIG_SMP) */
>  
>  /*
> - * Hooks for cond_resched() and friends to avoid RCU CPU stall warnings.
> - */
> -
> -#define RCU_COND_RESCHED_LIM 256 /* ms vs. 100s of ms. */
> -DECLARE_PER_CPU(int, rcu_cond_resched_count);
> -void rcu_resched(void);
> -
> -/*
> - * Is it time to report RCU quiescent states?
> - *
> - * Note unsynchronized access to rcu_cond_resched_count.  Yes, we might
> - * increment some random CPU's count, and possibly also load the result from
> - * yet another CPU's count.  We might even clobber some other CPU's attempt
> - * to zero its counter.  This is all OK because the goal is not precision,
> - * but rather reasonable amortization of rcu_note_context_switch() overhead
> - * and extremely high probability of avoiding RCU CPU stall warnings.
> - * Note that this function has to be preempted in just the wrong place,
> - * many thousands of times in a row, for anything bad to happen.
> - */
> -static inline bool rcu_should_resched(void)
> -{
> - return raw_cpu_inc_return(rcu_cond_resched_count) >=
> -RCU_COND_RESCHED_LIM;
> -}
> -
> -/*
> - * Report quiscent states to RCU if it is time to do so.
> - */
> -static inline void rcu_cond_resched(void)
> -{
> - if (unlikely(rcu_should_resched()))
> - rcu_resched();
> -}
> -
> -/*
>   * Infrastructure to implement the synchronize_() primitives in
>   * TREE_RCU and rcu_barrier_() primitives in TINY_RCU.
>   */
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> 

Re: [PATCH] regulator: palmas: Fix SMPS enable/disable/is_enabled

2014-06-20 Thread Alexandre Courbot
On Sat, Jun 21, 2014 at 2:26 AM, Nishanth Menon  wrote:
> We use regmap regulator ops to enable/disable and check if regulator
> is enabled for various SMPS. However, these depend on valid
> enable_reg, enable_mask and enable_value in regulator descriptor.
>
> Currently we do not populate these for SMPS other than SMPS10, this
> results in spurious results as regmap assumes that the values are
> valid and ends up reading register 0x0 RTC:SECONDS_REG on Palmas
> variants that do have RTC! To fix this, we update proper parameters
> for the descriptor fields.
>
> Further, we want to ensure the behavior consistent with logic
> prior to commit dbabd624d4eec50b6, where, once you do a set_mode,
> enable/disable ensure the logic remains consistent and configures
> Palmas to the configuration that we set with set_mode (since the
> configuration register is common). To do this, we can rely on the
> regulator core's regulator_register behavior where the regulator
> descriptor pointer provided by the regulator driver is stored. (no
> reallocation and copy is done). This lets us update the enable_value
> post registration, to remain consistent with the mode we configure as
> part of set_mode.

Tested-by: Alexandre Courbot 

Thanks!


[PATCH] direct-io: squelch maybe-uninitialized warning in do_direct_IO()

2014-06-20 Thread Jason Cooper
The following warnings:

  fs/direct-io.c: In function ‘__blockdev_direct_IO’:
  fs/direct-io.c:1011:12: warning: ‘to’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  fs/direct-io.c:913:16: note: ‘to’ was declared here
  fs/direct-io.c:1011:12: warning: ‘from’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  fs/direct-io.c:913:10: note: ‘from’ was declared here

are not necessary because dio_get_page() either fails, or sets both
'from' and 'to'.

Make the compiler happy so we can more easily detect legitimate
warnings.

Signed-off-by: Jason Cooper 
---
 fs/direct-io.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 98040ba388ac..c0a9854d2bc7 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -910,7 +910,8 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 
 	while (sdio->block_in_file < sdio->final_block_in_request) {
 		struct page *page;
-		size_t from, to;
+		size_t from = 0;
+		size_t to = 0;
 		page = dio_get_page(dio, sdio, &from, &to);
 		if (IS_ERR(page)) {
 			ret = PTR_ERR(page);
-- 
2.0.0
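
The reasoning in the commit message can be illustrated outside the kernel. The sketch below is not the fs/direct-io.c code: get_range() is a hypothetical stand-in for dio_get_page(), and the values it writes are made up. It shows a callee that either fails or sets both out-parameters, so the initialisers in the caller exist only to let the compiler see that no uninitialised read is possible.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for dio_get_page(): on success it writes both
 * *from and *to and returns 0; on failure it returns -1 and writes neither.
 */
int get_range(int fail, size_t *from, size_t *to)
{
	if (fail)
		return -1;
	*from = 16;
	*to = 4096;
	return 0;
}

size_t range_len(int fail)
{
	/*
	 * Redundant for correctness, because the error path returns before
	 * either value is read, but it removes the maybe-uninitialized
	 * diagnostic when the compiler cannot see through the callee.
	 */
	size_t from = 0;
	size_t to = 0;

	if (get_range(fail, &from, &to) < 0)
		return 0;
	return to - from;
}
```

The cost of this approach is two extra stores per loop iteration, which is why such initialisers are usually added only when the warning is known to be a false positive.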



[v3.10-rt / v3.12-rt] scheduling while atomic in cgroup code

2014-06-20 Thread Nikita Yushchenko

Hi.

Call Trace:
[e22d5a90] [c0007ea8] show_stack+0x4c/0x168 (unreliable)
[e22d5ad0] [c0618c04] __schedule_bug+0x94/0xb0
[e22d5ae0] [c060b9ec] __schedule+0x530/0x550
[e22d5bf0] [c060bacc] schedule+0x30/0xbc
[e22d5c00] [c060ca24] rt_spin_lock_slowlock+0x180/0x27c
[e22d5c70] [c00b39dc] res_counter_uncharge_until+0x40/0xc4
[e22d5ca0] [c013ca88] drain_stock.isra.20+0x54/0x98
[e22d5cc0] [c01402ac] __mem_cgroup_try_charge+0x2e8/0xbac
[e22d5d70] [c01410d4] mem_cgroup_charge_common+0x3c/0x70
[e22d5d90] [c0117284] __do_fault+0x38c/0x510
[e22d5df0] [c011a5f4] handle_pte_fault+0x98/0x858
[e22d5e50] [c060ed08] do_page_fault+0x42c/0x6fc
[e22d5f40] [c000f5b4] handle_page_fault+0xc/0x80

What happens:

- refill_stock() calls get_cpu_var() and thus disables preemption until
matching put_cpu_var() is called,

- then it calls drain_stock() -> res_counter_uncharge() ->
res_counter_uncharge_until()

- and here we have spin_lock(), which under RT can sleep. Thus we have 
sleeping with preemption disabled.



Any ideas how to fix?
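
The sequence described above can be modeled in plain C. This is only a userspace sketch: every model_* name is hypothetical, the counter stands in for the preemption count, and no real locking or scheduling takes place. It only shows why taking a lock that may sleep between get_cpu_var() and put_cpu_var() trips the "scheduling while atomic" check.

```c
#include <stdbool.h>

static int preempt_count;	/* models the preemption-disable depth */
static bool atomic_sleep_bug;	/* set when a sleeping lock is taken atomically */

void model_get_cpu_var(void) { preempt_count++; }
void model_put_cpu_var(void) { preempt_count--; }

/* Under RT a spinlock may sleep, so it must not be taken while atomic. */
void model_rt_spin_lock(void)
{
	if (preempt_count > 0)
		atomic_sleep_bug = true;
}

/* Mirrors refill_stock(): lock taken inside the get/put_cpu_var() section. */
bool model_refill_stock(void)
{
	atomic_sleep_bug = false;
	model_get_cpu_var();
	model_rt_spin_lock();	/* drain_stock() -> res_counter_uncharge_until() */
	model_put_cpu_var();
	return atomic_sleep_bug;
}

/* The same lock taken with preemption enabled does not trip the check. */
bool model_lock_outside(void)
{
	atomic_sleep_bug = false;
	model_rt_spin_lock();
	return atomic_sleep_bug;
}
```

In this model, model_refill_stock() reports the bug and model_lock_outside() does not. That is only a restatement of the problem, not a proposed fix.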


32-bit bug in iovec iterator changes

2014-06-20 Thread Theodore Ts'o
On Fri, Jun 20, 2014 at 08:38:20AM +1000, Dave Chinner wrote:
> 
> Short reads are more likely a bug in all the iovec iterator stuff
> that got merged in from the vfs tree. ISTR a 32 bit-only bug in that
> stuff go past in to do with not being able to partition a 32GB block
> dev on a 32 bit system due to a 32 bit size_t overflow somewhere

Dave Chinner called it.  

Al, I'm seeing a regression which shows up using a 32-bit x86 kernel.
The symptom is that, when run under KVM with a 5 GB /dev/vdc virtual
block device, a read at offset 2 ** 30 fails with a short read:

# dd if=/dev/vdc of=/dev/null bs=4k skip=262144 count=1
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.0164144 s, 0.0 kB/s

On a 3.15 kernel, this command works:

# dd if=/dev/vdc of=/dev/null bs=4k skip=262144 count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.0457984 s, 89.4 kB/s

I tried bisecting it, but unfortunately the iovec iterator changes are
not cleanly bisectable, since copy_page_from_iter() gets introduced
some two dozen patches before it gets defined.  :-(

However, the bisect leads quite squarely to the iovec iterator
patches.

Al, I'd appreciate it if you could take a look?

Thanks!!

- Ted
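
For readers following the thread, here is a minimal sketch of how a 32-bit size_t can turn a valid read into a zero-length one. It is not the kernel code. Both helper names are hypothetical, uint32_t stands in for size_t on 32-bit x86, and the device is assumed to be exactly 5 GiB. Under those assumptions, a read at offset 2^30 leaves exactly 2^32 bytes, and that value truncates to 0 when stored in 32 bits.

```c
#include <stdint.h>

/*
 * Hypothetical helper: clamp a request to the bytes left on the device,
 * but pass the limit through a 32-bit type, as a 32-bit size_t would.
 */
uint32_t clamp_count32(uint32_t count, uint64_t remaining)
{
	uint32_t limit = (uint32_t)remaining;	/* high 32 bits are lost here */

	return count > limit ? limit : count;
}

/* The same clamp with the limit kept at 64 bits until after the compare. */
uint32_t clamp_count64(uint32_t count, uint64_t remaining)
{
	return (uint64_t)count > remaining ? (uint32_t)remaining : count;
}
```

With count = 4096 and remaining = (5ULL << 30) - (1ULL << 30), clamp_count32() returns 0, which would match the "0+0 records" output above, while clamp_count64() returns 4096.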


% git bisect start
# good: [1860e379875dfe7271c649058aeddffe5afd9d0d] Linux 3.15
git bisect good 1860e379875dfe7271c649058aeddffe5afd9d0d
# bad: [7171511eaec5bf23fb06078f59784a3a0626b38f] Linux 3.16-rc1
git bisect bad 7171511eaec5bf23fb06078f59784a3a0626b38f
# good: [aaeb2554337217dfa4eac2fcc90da7be540b9a73] Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media into next
git bisect good aaeb2554337217dfa4eac2fcc90da7be540b9a73
# bad: [16b9057804c02e2d351e9c8f606e909b43cbd9e7] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
git bisect bad 16b9057804c02e2d351e9c8f606e909b43cbd9e7
# good: [82abb273d838318424644d8f02825db0fbbd400a] Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
git bisect good 82abb273d838318424644d8f02825db0fbbd400a
# good: [d1e1cda862c16252087374ac75949b0e89a5717e] Merge tag 'nfs-for-3.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
git bisect good d1e1cda862c16252087374ac75949b0e89a5717e
# good: [23d4ed53b7342bf5999b3ea227d9f69e75e5a625] Merge branch 'for-linus' of git://git.kernel.dk/linux-block
git bisect good 23d4ed53b7342bf5999b3ea227d9f69e75e5a625
# good: [2840c566e95599cd60c7143762ca8b49d9395050] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
git bisect good 2840c566e95599cd60c7143762ca8b49d9395050
# good: [4251c2a67011801caecd63671f26dd8c9aedb24c] Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
git bisect good 4251c2a67011801caecd63671f26dd8c9aedb24c
# skip: [3dae8750c368f8ac11c3c8c2a28f56dcee865c01] cifs: switch to ->write_iter()
git bisect skip 3dae8750c368f8ac11c3c8c2a28f56dcee865c01
# good: [5c02c392cd2320e8d612376d6b72b6548a680923] Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
git bisect good 5c02c392cd2320e8d612376d6b72b6548a680923
# bad: [5f073850602084fbcbb987948ff3e70ae273f7d2] kill generic_file_splice_write()
git bisect bad 5f073850602084fbcbb987948ff3e70ae273f7d2
# good: [38583f095c5a8138ae2a1c9173d0fd8a9f10e8aa] Merge branch 'akpm' (incoming from Andrew)
git bisect good 38583f095c5a8138ae2a1c9173d0fd8a9f10e8aa
# good: [f5ccfe1ddbaf9d923a3ebdadcb1e5e32d83e9c28] ext4: fix locking for O_APPEND writes
git bisect good f5ccfe1ddbaf9d923a3ebdadcb1e5e32d83e9c28
# bad: [f0d1bec9d58d4c038d0ac958c9af82be6eb18045] new helper: copy_page_from_iter()
git bisect bad f0d1bec9d58d4c038d0ac958c9af82be6eb18045


% (unset DISPLAY; git bisect visualize)
commit f0d1bec9d58d4c038d0ac958c9af82be6eb18045
Author: Al Viro 
Date:   Thu Apr 3 15:05:18 2014 -0400

new helper: copy_page_from_iter()

parallel to copy_page_to_iter().  pipe_write() switched to it (and became
->write_iter()).

Signed-off-by: Al Viro 

commit 84c3d55cc474f9c234c023c92e2769f940d5548c
Author: Al Viro 
Date:   Thu Apr 3 14:33:23 2014 -0400

fuse: switch to ->write_iter()

Signed-off-by: Al Viro 

commit b30ac0fc4109701fc122d41ee085c65b52dc44a3
Author: Al Viro 
Date:   Thu Apr 3 14:29:04 2014 -0400

btrfs: switch to ->write_iter()

Signed-off-by: Al Viro 

commit 3ef045c3d8ae8550abbfd44074efce6ff642cc86
Author: Al Viro 
Date:   Thu Apr 3 14:25:22 2014 -0400

ocfs2: switch to ->write_iter()

Signed-off-by: Al Viro 

commit bf97f3bc0c32140c43fe5ca53d23514ea46a54ca
Author: Al Viro 
Date:   Thu Apr 3 14:20:23 2014 -0400

xfs: switch to ->write_iter()

Signed-off-by: Al Viro 

commit 50b5551d1719c8bce60c6d4027b814cfc72c2307
Author: Al Viro 
Date:   Thu Apr 3 14:13:46 2014 -0400

afs: switch to ->write_iter()

Signed-off-by: Al Viro 


Re: [PATCH] include/trace/syscall.h: Use HAVE_SYSCALL_TRACEPOINTS instead of TRACEPOINTS

2014-06-20 Thread Steven Rostedt
On Sat, 21 Jun 2014 10:32:37 +0800
Chen Gang  wrote:
 
> diff --git a/include/trace/syscall.h b/include/trace/syscall.h
> index 291c282..a709cbd 100644
> --- a/include/trace/syscall.h
> +++ b/include/trace/syscall.h
> @@ -33,7 +33,7 @@ struct syscall_metadata {
>   struct ftrace_event_call *exit_event;
>  };
>  
> -#ifdef CONFIG_TRACEPOINTS
> +#ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS
>  static inline void syscall_tracepoint_update(struct task_struct *p)
>  {
>   if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))

This has already been fixed and is in my for-next branch getting ready
to be pushed.

-- Steve


Re: [PATCH] Check for Null return of function of affs_bread in function affs_truncate

2014-06-20 Thread Nick Krause
I don't think that's a good idea. In that case I
would recommend either leaving this bug open
or closing it, as there doesn't seem to be a good
way of testing this.
Cheers Nick

On Fri, Jun 20, 2014 at 11:09 PM, Andrew Morton
 wrote:
> On Fri, 20 Jun 2014 22:55:07 -0400 Nick Krause  wrote:
>
>> On Fri, Jun 20, 2014 at 10:38 PM, Andrew Morton
>>  wrote:
>> > On Fri, 20 Jun 2014 22:25:47 -0400 Nick Krause  wrote:
>> >
>> >> If you have any ideas about what is better
>> >> please let me known.
>> >
>> > I think the proposed patch was not a good one - it will cause truncate
>> > to silently return, probably leaving the fs in an inconsistent state.
>> > Neither the user nor the running application know this happened so they
>> > will just keep on modifying the filesystem, possibly mangling it
>> > further.
>> >
>> > The code as it stands at present is better - if bread() fails we'll get
>> > a nice solid oops and the current app will be terminated (at least).
>> > As we're in truncate it's quite possible that the entire fs will get
>> > wedged up due to now-permanently-held i_mutex, which is even better.
>> >
>> >
>> > As for the best fix, umm, hard.  We're pretty screwed if we cannot read
>> > that block at this code site.  Perhaps emit loud printks, forcibly turn
>> > the fs read-only then return -EIO/-ENOMEM/etc from the truncate.  Such
>> > a change would require runtime testing, with some form of developer fault
>> > injection.
>>
>> Fair enough if somebody is running this file system I would be
>> happy to have someone test my code in order to fix this.
>
> (top-posting repaired - please don't top-post!)
>
> It's going to be hard to find such a person.  As mkfs.affs doesn't
> appear to exist (?) your best bet would be to find someone who has an
> Amiga, get them to create a new fs for you (via loopback-on-file) then
> gzip the underlying file and send it to you.  You can then use that fs
> image file as many times as you want via loopback or straight onto a
> disk.  Make sure the image file is zeroed out first so it compresses
> well.
>
> Or something like that.


Re: [PATCH] Check for Null return of function of affs_bread in function affs_truncate

2014-06-20 Thread Andrew Morton
On Fri, 20 Jun 2014 22:55:07 -0400 Nick Krause  wrote:

> On Fri, Jun 20, 2014 at 10:38 PM, Andrew Morton
>  wrote:
> > On Fri, 20 Jun 2014 22:25:47 -0400 Nick Krause  wrote:
> >
> >> If you have any ideas about what is better
> >> please let me known.
> >
> > I think the proposed patch was not a good one - it will cause truncate
> > to silently return, probably leaving the fs in an inconsistent state.
> > Neither the user nor the running application know this happened so they
> > will just keep on modifying the filesystem, possibly mangling it
> > further.
> >
> > The code as it stands at present is better - if bread() fails we'll get
> > a nice solid oops and the current app will be terminated (at least).
> > As we're in truncate it's quite possible that the entire fs will get
> > wedged up due to now-permanently-held i_mutex, which is even better.
> >
> >
> > As for the best fix, umm, hard.  We're pretty screwed if we cannot read
> > that block at this code site.  Perhaps emit loud printks, forcibly turn
> > the fs read-only then return -EIO/-ENOMEM/etc from the truncate.  Such
> > a change would require runtime testing, with some form of developer fault
> > injection.
>
> Fair enough if somebody is running this file system I would be
> happy to have someone test my code in order to fix this.

(top-posting repaired - please don't top-post!)

It's going to be hard to find such a person.  As mkfs.affs doesn't
appear to exist (?) your best bet would be to find someone who has an
Amiga, get them to create a new fs for you (via loopback-on-file) then
gzip the underlying file and send it to you.  You can then use that fs
image file as many times as you want via loopback or straight onto a
disk.  Make sure the image file is zeroed out first so it compresses
well.

Or something like that.
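
The "loud message, force read-only, return an error" idea from the quoted discussion can be sketched in userspace C. Nothing here is affs or VFS code: struct model_fs and model_truncate() are hypothetical, the block pointer stands in for the result of the failed read, and refusing later calls with -EROFS is an added assumption about what "read-only" would mean for callers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical filesystem state, reduced to the one flag that matters here. */
struct model_fs {
	bool read_only;
};

/*
 * Model of a truncate that depends on reading one block. A NULL block
 * stands for a failed read: complain loudly, stop further modification
 * by flipping the fs to read-only, and report the failure to the caller.
 */
int model_truncate(struct model_fs *fs, const void *block)
{
	if (fs->read_only)
		return -EROFS;
	if (!block) {
		fprintf(stderr, "model_fs: block read failed, forcing read-only\n");
		fs->read_only = true;
		return -EIO;
	}
	return 0;
}
```

The point of the design is that the caller sees an error and later writers are refused, instead of the operation returning silently and leaving the application to keep modifying a possibly inconsistent filesystem.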


[PATCH] staging: vt6655: remove header declarations for static functions

2014-06-20 Thread James A Shackleford
The functions iwctl_giwscan() and iwctl_siwscan() are only referenced
within iwctl.c -- so, remove their function declarations from iwctl.h
and mark these functions as static.

Signed-off-by: James A Shackleford 
---
 drivers/staging/vt6655/iwctl.c |4 ++--
 drivers/staging/vt6655/iwctl.h |   10 --
 2 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/drivers/staging/vt6655/iwctl.c b/drivers/staging/vt6655/iwctl.c
index ba50d7f..747d723 100644
--- a/drivers/staging/vt6655/iwctl.c
+++ b/drivers/staging/vt6655/iwctl.c
@@ -129,7 +129,7 @@ int iwctl_giwname(struct net_device *dev,
  * Wireless Handler : set scan
  */
 
-int iwctl_siwscan(struct net_device *dev,
+static int iwctl_siwscan(struct net_device *dev,
  struct iw_request_info *info,
  struct iw_point *wrq,
  char *extra)
@@ -190,7 +190,7 @@ int iwctl_siwscan(struct net_device *dev,
  * Wireless Handler : get scan results
  */
 
-int iwctl_giwscan(struct net_device *dev,
+static int iwctl_giwscan(struct net_device *dev,
  struct iw_request_info *info,
  struct iw_point *wrq,
  char *extra)
diff --git a/drivers/staging/vt6655/iwctl.h b/drivers/staging/vt6655/iwctl.h
index 10564b4..de0a337 100644
--- a/drivers/staging/vt6655/iwctl.h
+++ b/drivers/staging/vt6655/iwctl.h
@@ -161,16 +161,6 @@ int iwctl_giwpower(struct net_device *dev,
   struct iw_param *wrq,
   char *extra);
 
-int iwctl_giwscan(struct net_device *dev,
- struct iw_request_info *info,
- struct iw_point *wrq,
- char *extra);
-
-int iwctl_siwscan(struct net_device *dev,
- struct iw_request_info *info,
- struct iw_point *wrq,
- char *extra);
-
 //2008-0409-07,  by Einsn Liu
 #ifdef WPA_SUPPLICANT_DRIVER_WEXT_SUPPORT
 int iwctl_siwauth(struct net_device *dev,
-- 
1.7.9.5



[PATCH tip/core/rcu] Reduce overhead of cond_resched() checks for RCU

2014-06-20 Thread Paul E. McKenney
Commit ac1bea85781e (Make cond_resched() report RCU quiescent states)
fixed a problem where a CPU looping in the kernel with but one runnable
task would give RCU CPU stall warnings, even if the in-kernel loop
contained cond_resched() calls.  Unfortunately, in so doing, it introduced
performance regressions in Anton Blanchard's will-it-scale "open1" test.
The problem appears to be not so much the increased cond_resched() path
length as an increase in the rate at which grace periods complete, which
increased per-update grace-period overhead.

This commit takes a different approach to fixing this bug, mainly by
moving the RCU-visible quiescent state from cond_resched() to
rcu_note_context_switch(), and by further reducing the check to a
simple non-zero test of a single per-CPU variable.  However, this
approach requires that the force-quiescent-state processing send
resched IPIs to the offending CPUs.  These will be sent only once
the grace period has reached an age specified by the boot/sysfs
parameter rcutree.jiffies_till_sched_qs, or once the grace period
reaches an age halfway to the point at which RCU CPU stall warnings
will be emitted, whichever comes first.

Reported-by: Dave Hansen 
Signed-off-by: Paul E. McKenney 
Cc: Josh Triplett 
Cc: Andi Kleen 
Cc: Christoph Lameter 
Cc: Mike Galbraith 
Cc: Eric Dumazet 
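
The "whichever comes first" rule in the description above can be written down as a small standalone sketch. This is not the code from the patch: both function names are hypothetical, and all values are assumed to be in jiffies.

```c
#include <stdbool.h>

/*
 * Age at which a grace period would start soliciting help: the configured
 * jiffies_till_sched_qs value or half of the stall-warning timeout,
 * whichever is reached first (that is, the smaller of the two).
 */
unsigned long resched_ipi_age(unsigned long jiffies_till_sched_qs,
			      unsigned long jiffies_till_stall_warning)
{
	unsigned long half = jiffies_till_stall_warning / 2;

	return jiffies_till_sched_qs < half ? jiffies_till_sched_qs : half;
}

/* True once the current grace period is old enough to warrant a resched IPI. */
bool should_send_resched_ipi(unsigned long gp_age,
			     unsigned long jiffies_till_sched_qs,
			     unsigned long jiffies_till_stall_warning)
{
	return gp_age >= resched_ipi_age(jiffies_till_sched_qs,
					 jiffies_till_stall_warning);
}
```

For example, with a stall-warning timeout of 2100 jiffies, a configured age of 5000 is capped at 1050, while a configured age of 100 is used as is.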

---

 b/Documentation/kernel-parameters.txt |6 +
 b/include/linux/rcupdate.h|   36 
 b/kernel/rcu/tree.c   |  140 +++---
 b/kernel/rcu/tree.h   |6 +
 b/kernel/rcu/tree_plugin.h|2 
 b/kernel/rcu/update.c |   18 
 b/kernel/sched/core.c |7 -
 7 files changed, 125 insertions(+), 90 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 6eaa9cdb7094..910c3829f81d 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2785,6 +2785,12 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
leaf rcu_node structure.  Useful for very large
systems.
 
+   rcutree.jiffies_till_sched_qs= [KNL]
+   Set required age in jiffies for a
+   given grace period before RCU starts
+   soliciting quiescent-state help from
+   rcu_note_context_switch().
+
rcutree.jiffies_till_first_fqs= [KNL]
Set delay from grace-period initialization to
first attempt to force quiescent states.
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 5a75d19aa661..243aa4656cb7 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -44,7 +44,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 extern int rcu_expedited; /* for sysctl */
@@ -300,41 +299,6 @@ bool __rcu_is_watching(void);
 #endif /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || 
defined(CONFIG_SMP) */
 
 /*
- * Hooks for cond_resched() and friends to avoid RCU CPU stall warnings.
- */
-
-#define RCU_COND_RESCHED_LIM 256   /* ms vs. 100s of ms. */
-DECLARE_PER_CPU(int, rcu_cond_resched_count);
-void rcu_resched(void);
-
-/*
- * Is it time to report RCU quiescent states?
- *
- * Note unsynchronized access to rcu_cond_resched_count.  Yes, we might
- * increment some random CPU's count, and possibly also load the result from
- * yet another CPU's count.  We might even clobber some other CPU's attempt
- * to zero its counter.  This is all OK because the goal is not precision,
- * but rather reasonable amortization of rcu_note_context_switch() overhead
- * and extremely high probability of avoiding RCU CPU stall warnings.
- * Note that this function has to be preempted in just the wrong place,
- * many thousands of times in a row, for anything bad to happen.
- */
-static inline bool rcu_should_resched(void)
-{
-   return raw_cpu_inc_return(rcu_cond_resched_count) >=
-  RCU_COND_RESCHED_LIM;
-}
-
-/*
- * Report quiscent states to RCU if it is time to do so.
- */
-static inline void rcu_cond_resched(void)
-{
-   if (unlikely(rcu_should_resched()))
-   rcu_resched();
-}
-
-/*
  * Infrastructure to implement the synchronize_() primitives in
  * TREE_RCU and rcu_barrier_() primitives in TINY_RCU.
  */
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f1ba77363fbb..7d711f9a2e86 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -206,6 +206,70 @@ void rcu_bh_qs(int cpu)
rdp->passed_quiesce = 1;
 }
 
+static DEFINE_PER_CPU(int, rcu_sched_qs_mask);
+
+static DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = {
+   .dynticks_nesting = DYNTICK_TASK_EXIT_IDLE,
+   .dynticks = ATOMIC_INIT(1),
+#ifdef CONFIG_NO_HZ_FULL_SYSIDLE
+   .dynticks_idle_nesting = DYNTICK_TASK_NEST_VALUE,
+   .dynticks_idle = ATOMIC_INIT(1),

Re: [PATCH V3 16/16] irqchip: crossbar: allow for quirky hardware with direct hardwiring of GIC

2014-06-20 Thread Jason Cooper
On Mon, Jun 16, 2014 at 04:53:16PM +0530, Sricharan R wrote:
> From: Nishanth Menon 
> 
> On certain platforms such as DRA7, SPIs 0, 1, 2, 3, 5, 6, 10, 131,
> 132, 133 are direct wired to hardware blocks bypassing crossbar.
> This quirky implementation is *NOT* supposed to be the expectation
> of crossbar hardware usage. However, these are already marked in our
> description of the hardware with SKIP and RESERVED where appropriate.
> 
> Unfortunately, we need to be able to refer to these hardwired IRQs.
> So, to request these, crossbar driver can use the existing information
> from its table that these SKIP/RESERVED maps are direct wired sources
> and generic allocation/programming of crossbar should be avoided.
> 
> Signed-off-by: Nishanth Menon 
> Signed-off-by: Sricharan R 
> ---
>  .../devicetree/bindings/arm/omap/crossbar.txt  |   12 ++--
>  drivers/irqchip/irq-crossbar.c |   20 
> ++--
>  2 files changed, 28 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/arm/omap/crossbar.txt 
> b/Documentation/devicetree/bindings/arm/omap/crossbar.txt
> index 8210ea4..438ccab 100644
> --- a/Documentation/devicetree/bindings/arm/omap/crossbar.txt
> +++ b/Documentation/devicetree/bindings/arm/omap/crossbar.txt
> @@ -42,8 +42,10 @@ Documentation/devicetree/bindings/arm/gic.txt for further 
> details.
>  
>  An interrupt consumer on an SoC using crossbar will use:
>   interrupts = 
> -request number shall be between 0 to that described by
> -"ti,max-crossbar-sources"
> +When the request number is between 0 to that described by
> +"ti,max-crossbar-sources", it is assumed to be a crossbar mapping. If the
> +request_number is greater than "ti,max-crossbar-sources", then it is mapped 
> as a
> +quirky hardware mapping direct to GIC.
>  
>  Example:
>   device_x@0x4a023000 {
> @@ -51,3 +53,9 @@ Example:
>   interrupts = ;
>   ...
>   };
> +
> + device_y@0x4a033000 {
> + /* Direct mapped GIC SPI 1 used */
> + interrupts = ;

Ideally, I'd like to see a macro here so that it's clear that we crossed
a magic threshold. eg:

#define MAX_SOURCES 400
#define DIRECT_IRQ(irq) (MAX_SOURCES + (irq))
...
interrupts = ;

and, then:

ti,max-crossbar-sources = ;
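
A minimal sketch of how such a threshold could be encoded and decoded,
assuming the illustrative value 400 from above. The helper names
is_direct_irq() and direct_irq_to_spi() are hypothetical and do not exist
in the driver; the >= comparison mirrors the check in
crossbar_domain_xlate() in the patch below.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; 400 stands in for ti,max-crossbar-sources. */
#define MAX_SOURCES     400
#define DIRECT_IRQ(irq) (MAX_SOURCES + (irq))

/* True when a request number lies past the crossbar range, i.e. it
 * names a hardwired GIC SPI rather than a crossbar source. */
static bool is_direct_irq(int req_num)
{
	return req_num >= MAX_SOURCES;
}

/* Recover the SPI number from a direct request number.
 * Only meaningful when is_direct_irq(req_num) holds. */
static int direct_irq_to_spi(int req_num)
{
	assert(is_direct_irq(req_num));
	return req_num - MAX_SOURCES;
}
```

With this encoding, a consumer asking for the hardwired SPI 1 would pass
DIRECT_IRQ(1), and the magic threshold is visible at the point of use.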

> + ...
> + };
> diff --git a/drivers/irqchip/irq-crossbar.c b/drivers/irqchip/irq-crossbar.c
> index ef613c4..fff6218 100644
> --- a/drivers/irqchip/irq-crossbar.c
> +++ b/drivers/irqchip/irq-crossbar.c
> @@ -86,8 +86,13 @@ static inline int allocate_free_irq(int cb_no)
>  
>  static inline bool needs_crossbar_write(irq_hw_number_t hw)
>  {
> - if (hw > GIC_IRQ_START)
> - return true;
> + int cb_no;
> +
> + if (hw > GIC_IRQ_START) {
> + cb_no = cb->irq_map[hw - GIC_IRQ_START];
> + if (cb_no != IRQ_RESERVED && cb_no != IRQ_SKIP)
> + return true;
> + }
>  
>   return false;
>  }
> @@ -130,8 +135,19 @@ static int crossbar_domain_xlate(struct irq_domain *d,
>  {
>   int ret;
>   int req_num = intspec[1];
> + int direct_map_num;
>  
>   if (req_num >= cb->max_crossbar_sources) {
> + direct_map_num = req_num - cb->max_crossbar_sources;
> + if (direct_map_num < cb->int_max) {
> + ret = cb->irq_map[direct_map_num];
> + if (ret == IRQ_RESERVED || ret == IRQ_SKIP) {
> + /* We use the interrupt num as h/w irq num */
> + ret = direct_map_num;
> + goto found;
> + }
> + }
> +
>   pr_err("%s: requested crossbar number %d > max %d\n",
>  __func__, req_num, cb->max_crossbar_sources);
>   return -EINVAL;

thx,

Jason.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Check for Null return of function of affs_bread in function affs_truncate

2014-06-20 Thread Nick Krause
Fair enough. If somebody is running this file system, I would be
happy to have them test my code in order to fix this.
Cheers, Nick

On Fri, Jun 20, 2014 at 10:38 PM, Andrew Morton
 wrote:
> On Fri, 20 Jun 2014 22:25:47 -0400 Nick Krause  wrote:
>
>> If you have any ideas about what is better
>> please let me know.
>
> I think the proposed patch was not a good one - it will cause truncate
> to silently return, probably leaving the fs in an inconsistent state.
> Neither the user nor the running application know this happened so they
> will just keep on modifying the filesystem, possibly mangling it
> further.
>
> The code as it stands at present is better - if bread() fails we'll get
> a nice solid oops and the current app will be terminated (at least).
> As we're in truncate it's quite possible that the entire fs will get
> wedged up due to now-permanently-held i_mutex, which is even better.
>
>
> As for the best fix, umm, hard.  We're pretty screwed if we cannot read
> that block at this code site.  Perhaps emit loud printks, forcibly turn
> the fs read-only then return -EIO/-ENOMEM/etc from the truncate.  Such
> a change would require runtime testing, with some form of developer fault
> injection.
>
>
>


Re: [PATCH] staging:rtl8821ae: rewrite legacy wifi check in halbcoutsrc

2014-06-20 Thread Nick Krause
Is this patch being merged, or is this not an issue? I am confused:
did I make a mistake in my patch, or is a different patch
being merged?
Thanks, Nick

On Fri, Jun 20, 2014 at 10:34 PM, Joe Perches  wrote:
> On Fri, 2014-06-20 at 22:26 -0400, Nick Krause wrote:
>> Thanks for the feedback. I will resend the fixed patch.
>
> Please do not.
>
>> Otherwise please use Larry's idea.
>
> It's not Larry's idea.  Larry is the primary
> contributor for Realtek drivers in staging.
>
>


Re: [PATCH] Check for Null return of function of affs_bread in function affs_truncate

2014-06-20 Thread Andrew Morton
On Fri, 20 Jun 2014 22:25:47 -0400 Nick Krause  wrote:

> If you have any ideas about what is better
> please let me know.

I think the proposed patch was not a good one - it will cause truncate
to silently return, probably leaving the fs in an inconsistent state. 
Neither the user nor the running application know this happened so they
will just keep on modifying the filesystem, possibly mangling it
further.

The code as it stands at present is better - if bread() fails we'll get
a nice solid oops and the current app will be terminated (at least). 
As we're in truncate it's quite possible that the entire fs will get
wedged up due to now-permanently-held i_mutex, which is even better.


As for the best fix, umm, hard.  We're pretty screwed if we cannot read
that block at this code site.  Perhaps emit loud printks, forcibly turn
the fs read-only then return -EIO/-ENOMEM/etc from the truncate.  Such
a change would require runtime testing, with some form of developer fault
injection.





Re: [PATCH] staging:rtl8821ae: rewrite legacy wifi check in halbcoutsrc

2014-06-20 Thread Joe Perches
On Fri, 2014-06-20 at 22:26 -0400, Nick Krause wrote:
> Thanks for the feedback. I will resend the fixed patch.

Please do not.

> Otherwise please use Larry's idea.

It's not Larry's idea.  Larry is the primary
contributor for Realtek drivers in staging.




Re: [PATCH V3 03/16] irqchip: crossbar: introduce ti,irqs-skip to skip

2014-06-20 Thread Jason Cooper
Sricharan,

Your subject line seems truncated:

  "irqchip: crossbar: introduce ti,irqs-skip to skip"

maybe "... Introduce DT property to skip hardwired irqs" ?

Also note that you need to correct the subject line for *every* patch in
the series wrt capitalization.

I don't mind correcting it when I apply it, provided that:

 - the patch is otherwise ready
 - I only have to do it once or twice for the series
 - I never had a chance to ask since you created a rockstar patch series
   the first time out of the gate (except for capitalization).

Once I've looked over the whole series, please resend with the subject
lines corrected.

On Mon, Jun 16, 2014 at 04:53:03PM +0530, Sricharan R wrote:
> From: Nishanth Menon 
> 
> When interrupts are unusable for various hardware reasons even though
> their register maps exist, those interrupts should be skipped while
> mapping irqs to crossbars.
> 
> Signed-off-by: Nishanth Menon 
> Signed-off-by: Sricharan R 
> ---
> [V3] introduced ti,irqs-skip dt property to list the
>  irqs to be skipped.
> 
>  .../devicetree/bindings/arm/omap/crossbar.txt  |4 
>  drivers/irqchip/irq-crossbar.c |   20 
> 
>  2 files changed, 24 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/arm/omap/crossbar.txt b/Documentation/devicetree/bindings/arm/omap/crossbar.txt
> index fb88585..cfcbd52 100644
> --- a/Documentation/devicetree/bindings/arm/omap/crossbar.txt
> +++ b/Documentation/devicetree/bindings/arm/omap/crossbar.txt
> @@ -17,6 +17,10 @@ Required properties:
>so crossbar bar driver should not consider them as free
>lines.
>  
> +Optional properties:
> +- ti,irqs-skip: This is similar to "ti,irqs-reserved", but are irq mappings
> +  which are not supposed to be used for errata or other reasons(virtualization).

I would specifically mention SoC-specific hard-wiring of irqs here.
Also the fact that the hardwiring unexpectedly bypasses the crossbar.

> +
>  Examples:
>   crossbar_mpu: @4a02 {
>   compatible = "ti,irq-crossbar";

Please include a ti,irqs-skip example here.

> diff --git a/drivers/irqchip/irq-crossbar.c b/drivers/irqchip/irq-crossbar.c
> index 51d4b87..27049de 100644
> --- a/drivers/irqchip/irq-crossbar.c
> +++ b/drivers/irqchip/irq-crossbar.c
> @@ -18,6 +18,7 @@
>  
>  #define IRQ_FREE -1
>  #define IRQ_RESERVED -2
> +#define IRQ_SKIP -3
>  #define GIC_IRQ_START32
>  
>  /*
> @@ -160,6 +161,25 @@ static int __init crossbar_of_init(struct device_node *node)
>   }
>   }
>  
> + /* Skip the ones marked as skip */

This comment is redundant, perhaps "Skip irqs hardwired to bypass the
crossbar."?

> + irqsr = of_get_property(node, "ti,irqs-skip", );
> + if (irqsr) {
> + size /= sizeof(__be32);
> +
> + for (i = 0; i < size; i++) {
> + of_property_read_u32_index(node,
> +"ti,irqs-skip",
> +i, );
> + if (entry > max) {
> + pr_err("Invalid skip entry\n");
> + ret = -EINVAL;
> + goto err3;
> + }
> + cb->irq_map[entry] = IRQ_SKIP;
> + }
> + }
> +
> +
>   cb->register_offsets = kzalloc(max * sizeof(int), GFP_KERNEL);
>   if (!cb->register_offsets)
>   goto err3;

thx,

Jason.


[PATCH] include/trace/syscall.h: Use HAVE_SYSCALL_TRACEPOINTS instead of TRACEPOINTS

2014-06-20 Thread Chen Gang
At present, most architectures support TRACEPOINTS, but only about 10 of
29 architectures support HAVE_SYSCALL_TRACEPOINTS.

TIF_SYSCALL_TRACEPOINT depends on HAVE_SYSCALL_TRACEPOINTS, so an
architecture that supports TRACEPOINTS does not necessarily define
TIF_SYSCALL_TRACEPOINT.

So use HAVE_SYSCALL_TRACEPOINTS instead of TRACEPOINTS; otherwise the
build fails. The related error (allmodconfig under score):

CC  init/main.o
  In file included from include/asm-generic/preempt.h:4:0,
   from arch/score/include/generated/asm/preempt.h:1,
   from include/linux/preempt.h:18,
   from include/linux/spinlock.h:50,
   from include/linux/seqlock.h:35,
   from include/linux/time.h:5,
   from include/linux/stat.h:18,
   from include/linux/module.h:10,
   from init/main.c:15:
  include/trace/syscall.h: In function 'syscall_tracepoint_update':
  include/trace/syscall.h:39:23: error: 'TIF_SYSCALL_TRACEPOINT' undeclared (first use in this function)
if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
 ^
  include/linux/thread_info.h:103:45: note: in definition of macro 'test_thread_flag'
test_ti_thread_flag(current_thread_info(), flag)
   ^
  include/trace/syscall.h:39:23: note: each undeclared identifier is reported only once for each function it appears in
if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
 ^
  include/linux/thread_info.h:103:45: note: in definition of macro 'test_thread_flag'
test_ti_thread_flag(current_thread_info(), flag)
   ^
  make[1]: *** [init/main.o] Error 1
  make: *** [init] Error 2


Signed-off-by: Chen Gang 
---
 include/trace/syscall.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/trace/syscall.h b/include/trace/syscall.h
index 291c282..a709cbd 100644
--- a/include/trace/syscall.h
+++ b/include/trace/syscall.h
@@ -33,7 +33,7 @@ struct syscall_metadata {
struct ftrace_event_call *exit_event;
 };
 
-#ifdef CONFIG_TRACEPOINTS
+#ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS
 static inline void syscall_tracepoint_update(struct task_struct *p)
 {
if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
-- 
1.9.2.459.g68773ac


[PATCH v2] initramfs: Support initrd that is bigger than 2GiB

2014-06-20 Thread Yinghai Lu
When an initrd (compressed or not) bigger than 2GiB is used, the kernel
reports data corruption with /dev/ram0.

The root cause:
During initramfs checking, if the image is an initrd, it is transferred to
/initrd.image with sys_write.
sys_write supports writing at most 2G-4K bytes per call, so if the initrd
is larger than that, /initrd.image will be incomplete.

Add a local xwrite() that calls sys_write in a loop to work around the
problem.

xwrite() is also needed in write_buffer() to handle the case where the
image is an uncompressed cpio with one big file (>2G) in it:
   unpack_to_rootfs ===> write_buffer ===> actions[]/do_copy

At the same time, we don't need to worry about sys_read/sys_write in
do_mounts_rd.c::crd_load, as the decompressor has fill/flush and a
local buffer that is smaller than 2G.

Tested with an uncompressed initrd, and compressed ones with gz, bz2, lzma,
xz, lzop.

-v2: according to HPA, change name to xwrite.

Signed-off-by: Yinghai Lu 
Acked-by: H. Peter Anvin 

---
 init/initramfs.c |   33 +
 1 file changed, 29 insertions(+), 4 deletions(-)

Index: linux-2.6/init/initramfs.c
===
--- linux-2.6.orig/init/initramfs.c
+++ linux-2.6/init/initramfs.c
@@ -19,6 +19,26 @@
 #include 
 #include 
 
+static long __init xwrite(unsigned int fd, char *p,
+  size_t count)
+{
+   ssize_t left = count;
+   long written;
+
+   /* sys_write only can write MAX_RW_COUNT aka 2G-4K bytes at most */
+   while (left > 0) {
+   written = sys_write(fd, p, left);
+
+   if (written <= 0)
+   break;
+
+   left -= written;
+   p += written;
+   }
+
+   return (written < 0) ? written : count;
+}
+
 static __initdata char *message;
 static void __init error(char *x)
 {
@@ -346,7 +366,7 @@ static int __init do_name(void)
 static int __init do_copy(void)
 {
if (count >= body_len) {
-   sys_write(wfd, victim, body_len);
+   xwrite(wfd, victim, body_len);
sys_close(wfd);
do_utime(vcollected, mtime);
kfree(vcollected);
@@ -354,7 +374,7 @@ static int __init do_copy(void)
state = SkipIt;
return 0;
} else {
-   sys_write(wfd, victim, count);
+   xwrite(wfd, victim, count);
body_len -= count;
eat(count);
return 1;
@@ -604,8 +624,13 @@ static int __init populate_rootfs(void)
fd = sys_open("/initrd.image",
  O_WRONLY|O_CREAT, 0700);
if (fd >= 0) {
-   sys_write(fd, (char *)initrd_start,
-   initrd_end - initrd_start);
+   long written = xwrite(fd, (char *)initrd_start,
+   initrd_end - initrd_start);
+
+   if (written != initrd_end - initrd_start)
+   pr_err("/initrd.image: incomplete write (%ld != %ld)\n",
+  written, initrd_end - initrd_start);
+
sys_close(fd);
free_initrd();
}
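
For readers who want to try the looping-write idea outside the kernel, a
minimal user-space sketch follows. It uses write(2) instead of sys_write,
and write_all() is a hypothetical name. Note that it deliberately differs
from the posted xwrite(): it returns the number of bytes actually written,
so a caller comparing the result against the requested count also notices
a write that stopped early with a zero return.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Keep calling write(2) until all of p[0..count) is written.
 * Returns the number of bytes written; -1 only if nothing could be
 * written because the very first write failed.
 */
static ssize_t write_all(int fd, const char *p, size_t count)
{
	size_t done = 0;

	assert(p != NULL || count == 0);

	while (done < count) {
		ssize_t n = write(fd, p + done, count - done);

		if (n < 0)
			return done ? (ssize_t)done : -1;
		if (n == 0)
			break;	/* no progress; report the short count */
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

A caller would then check write_all(fd, buf, len) != (ssize_t)len to detect
an incomplete image, much like the populate_rootfs() hunk above does.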


[PATCH] initrd: Fix lz4 decompress with initrd

2014-06-20 Thread Yinghai Lu
While testing initrd (>2G) support, I found that decompress/lz4 does
not work with initrd at all.

decompress_* should support:
1. inbuf[]/outbuf[] for kernel preboot.
2. inbuf[]/flush() for initramfs.
3. fill()/flush() for initrd.

unlz4 does not handle case 3: the input len is passed as 0, so it
fails on the first try.

Fix that by adding an extra if (fill) check, and exit the loop on
EOF from fill().
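
The fill() handling inside the chunk loop can be sketched in isolation as
below. This is only an illustration under assumptions: read_chunk_header(),
enum hdr_status and mem_fill() are hypothetical names, and the callback
type simply mirrors the int (*fill)(void *, unsigned int) shape used by the
decompressors. A return of 0 is treated as a clean end of input, anything
short of 4 bytes as corruption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shape as the decompressor fill() callback: returns bytes supplied. */
typedef int (*fill_fn)(void *buf, unsigned int len);

enum hdr_status { HDR_OK, HDR_EOF, HDR_CORRUPT };

/* Read one 4-byte chunk header through fill(). */
static enum hdr_status read_chunk_header(fill_fn fill, unsigned char hdr[4])
{
	int got;

	assert(fill != NULL);
	got = fill(hdr, 4);
	if (got == 0)
		return HDR_EOF;		/* clean end of input: leave the loop */
	if (got < 4)
		return HDR_CORRUPT;	/* truncated header or negative error */
	return HDR_OK;
}

/* Tiny in-memory source so the sketch can be exercised. */
static const unsigned char *src;
static size_t src_left;

static int mem_fill(void *buf, unsigned int len)
{
	size_t n = len < src_left ? len : src_left;

	memcpy(buf, src, n);
	src += n;
	src_left -= n;
	return (int)n;
}
```

The very first header (the archive magic) is stricter in the patch: there,
any read shorter than 4 bytes, including 0, is reported as corrupted data.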

Signed-off-by: Yinghai Lu 

---
 lib/decompress_unlz4.c |   65 -
 1 file changed, 43 insertions(+), 22 deletions(-)

Index: linux-2.6/lib/decompress_unlz4.c
===
--- linux-2.6.orig/lib/decompress_unlz4.c
+++ linux-2.6/lib/decompress_unlz4.c
@@ -83,13 +83,20 @@ STATIC inline int INIT unlz4(u8 *input,
if (posp)
*posp = 0;
 
-   if (fill)
-   fill(inp, 4);
+   if (fill) {
+   size = fill(inp, 4);
+   if (size < 4) {
+   error("data corrupted");
+   goto exit_2;
+   }
+   }
 
chunksize = get_unaligned_le32(inp);
if (chunksize == ARCHIVE_MAGICNUMBER) {
-   inp += 4;
-   size -= 4;
+   if (!fill) {
+   inp += 4;
+   size -= 4;
+   }
} else {
error("invalid header");
goto exit_2;
@@ -100,29 +107,44 @@ STATIC inline int INIT unlz4(u8 *input,
 
for (;;) {
 
-   if (fill)
-   fill(inp, 4);
+   if (fill) {
+   size = fill(inp, 4);
+   if (size == 0)
+   break;
+   if (size < 4) {
+   error("data corrupted");
+   goto exit_2;
+   }
+   }
 
chunksize = get_unaligned_le32(inp);
if (chunksize == ARCHIVE_MAGICNUMBER) {
-   inp += 4;
-   size -= 4;
+   if (!fill) {
+   inp += 4;
+   size -= 4;
+   }
if (posp)
*posp += 4;
continue;
}
-   inp += 4;
-   size -= 4;
+
 
if (posp)
*posp += 4;
 
-   if (fill) {
+   if (!fill) {
+   inp += 4;
+   size -= 4;
+   } else {
if (chunksize > lz4_compressbound(uncomp_chunksize)) {
error("chunk length is longer than allocated");
goto exit_2;
}
-   fill(inp, chunksize);
+   size = fill(inp, chunksize);
+   if (size < chunksize) {
+   error("data corrupted");
+   goto exit_2;
+   }
}
 #ifdef PREBOOT
if (out_len >= uncomp_chunksize) {
@@ -149,18 +171,17 @@ STATIC inline int INIT unlz4(u8 *input,
if (posp)
*posp += chunksize;
 
-   size -= chunksize;
+   if (!fill) {
+   size -= chunksize;
 
-   if (size == 0)
-   break;
-   else if (size < 0) {
-   error("data corrupted");
-   goto exit_2;
+   if (size == 0)
+   break;
+   else if (size < 0) {
+   error("data corrupted");
+   goto exit_2;
+   }
+   inp += chunksize;
}
-
-   inp += chunksize;
-   if (fill)
-   inp = inp_start;
}
 
ret = 0;


[PATCH v2] initramfs: Support initramfs that is bigger than 2GiB

2014-06-20 Thread Yinghai Lu
Now with 64-bit bzImage and kexec tools, we support a ramdisk that is
bigger than 2G, as we can put it above 4G.

I found that a compressed initramfs image could not be decompressed
properly. It turns out that the image length is an int during
decompression-method detection, and it becomes < 0 when the length is
more than 2G. Furthermore, during decompression, len as an int is used
for the inbuf count, which has the same problem.

Change len to long; that should be OK, as long is 32 bits on 32-bit
platforms.
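
To make the size limit concrete, here is a small sketch, assuming a 32-bit
int (INT_MAX == 2147483647). fits_in_int() and checked_len() are
hypothetical helpers, not kernel functions. Every image size in the test
list of this posting is larger than INT_MAX, so none of them can be
carried in an int length.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* True when a byte count can be stored in an int without overflow. */
static bool fits_in_int(unsigned long long len)
{
	return len <= (unsigned long long)INT_MAX;
}

/* Narrow a length to int, refusing values that would overflow. */
static int checked_len(unsigned long long len)
{
	assert(fits_in_int(len));
	return (int)len;
}
```

For example, fits_in_int(2226366598ULL), the smallest image listed, is
false on such a platform, while a 64-bit long holds it without trouble.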

Tested with the following compressed initramfs images as root with kexec:
gzip, bzip2, xz, lzma, lzop, lz4.
Run time for populate_rootfs():
       size name            Nehalem-EX  Westmere-EX  Ivybridge-EX
 9034400256 root_img      :        26s          24s           30s
 3561095057 root_img.lz4  :        28s          27s           27s
 3459554629 root_img.lzo  :        29s          29s           28s
 3219399480 root_img.gz   :        64s          62s           49s
 2251594592 root_img.xz   :       262s         260s          183s
 2226366598 root_img.lzma :       386s         376s          277s
 2901482513 root_img.bz2  :       635s         599s

-v2: fix pr_debug format error.

Signed-off-by: Yinghai Lu 

---
 crypto/zlib.c  |8 
 fs/isofs/compress.c|6 +-
 fs/jffs2/compr_zlib.c  |7 ---
 include/linux/decompress/bunzip2.h |8 
 include/linux/decompress/generic.h |   10 +-
 include/linux/decompress/inflate.h |8 
 include/linux/decompress/unlz4.h   |8 
 include/linux/decompress/unlzma.h  |8 
 include/linux/decompress/unlzo.h   |8 
 include/linux/decompress/unxz.h|8 
 include/linux/zlib.h   |4 ++--
 init/do_mounts_rd.c|   10 +-
 init/initramfs.c   |   22 +++---
 lib/decompress.c   |2 +-
 lib/decompress_bunzip2.c   |   26 +-
 lib/decompress_inflate.c   |   12 ++--
 lib/decompress_unlz4.c |   18 +-
 lib/decompress_unlzma.c|   28 ++--
 lib/decompress_unlzo.c |   12 ++--
 lib/decompress_unxz.c  |   10 +-
 20 files changed, 110 insertions(+), 113 deletions(-)

Index: linux-2.6/include/linux/decompress/generic.h
===
--- linux-2.6.orig/include/linux/decompress/generic.h
+++ linux-2.6/include/linux/decompress/generic.h
@@ -1,11 +1,11 @@
 #ifndef DECOMPRESS_GENERIC_H
 #define DECOMPRESS_GENERIC_H
 
-typedef int (*decompress_fn) (unsigned char *inbuf, int len,
- int(*fill)(void*, unsigned int),
- int(*flush)(void*, unsigned int),
+typedef int (*decompress_fn) (unsigned char *inbuf, long len,
+ long (*fill)(void*, unsigned long),
+ long (*flush)(void*, unsigned long),
  unsigned char *outbuf,
- int *posp,
+ long *posp,
  void(*error)(char *x));
 
 /* inbuf   - input buffer
@@ -33,7 +33,7 @@ typedef int (*decompress_fn) (unsigned c
 
 
 /* Utility routine to detect the decompression method */
-decompress_fn decompress_method(const unsigned char *inbuf, int len,
+decompress_fn decompress_method(const unsigned char *inbuf, long len,
const char **name);
 
 #endif
Index: linux-2.6/init/initramfs.c
===
--- linux-2.6.orig/init/initramfs.c
+++ linux-2.6/init/initramfs.c
@@ -174,7 +174,7 @@ static __initdata enum state {
 } state, next_state;
 
 static __initdata char *victim;
-static __initdata unsigned count;
+static unsigned long count __initdata;
 static __initdata loff_t this_header, next_header;
 
 static inline void __init eat(unsigned n)
@@ -186,7 +186,7 @@ static inline void __init eat(unsigned n
 
 static __initdata char *vcollected;
 static __initdata char *collected;
-static __initdata int remains;
+static long remains __initdata;
 static __initdata char *collect;
 
 static void __init read_into(char *buf, unsigned size, enum state next)
@@ -213,7 +213,7 @@ static int __init do_start(void)
 
 static int __init do_collect(void)
 {
-   unsigned n = remains;
+   unsigned long n = remains;
if (count < n)
n = count;
memcpy(collect, victim, n);
@@ -384,7 +384,7 @@ static __initdata int (*actions[])(void)
[Reset] = do_reset,
 };
 
-static int __init write_buffer(char *buf, unsigned len)
+static long __init write_buffer(char *buf, unsigned long len)
 {
count = len;
victim = buf;
@@ -394,11 +394,11 @@ static int __init write_buffer(char *buf
return len - count;
 }
 
-static int __init flush_buffer(void *bufv, unsigned len)
+static long __init 

Re: [PATCH] staging:rtl8821ae: rewrite legacy wifi check in halbcoutsrc

2014-06-20 Thread Nick Krause
Thanks for the feedback. I will resend the fixed patch.
Otherwise, please use Larry's idea.
Cheers, Nick

On Fri, Jun 20, 2014 at 4:08 PM, Joe Perches  wrote:
> On Fri, 2014-06-20 at 22:59 +0300, Dan Carpenter wrote:
>> On Fri, Jun 20, 2014 at 12:56:50PM -0400, Nicholas Krause wrote:
>> > Rewrites the wireless check for legacy checking in function
>> > halbtc_legacy to check for both Mode A and B.
>>
>> You're just guessing that A and B were intended but it could have been
>> something B and G...
>>
>> Don't do this.  Just leave the static checker warning there so someone
>> can fix it properly instead of introducing a second new bug and hiding
>> the warning so it's impossible to find.
>>
>
> It's most likely G anyway:
>
> drivers/staging/rtl8192ee/btcoexist/halbtcoutsrc.c: if ((mac->mode == WIRELESS_MODE_B) || (mac->mode == WIRELESS_MODE_G))
> drivers/staging/rtl8821ae/btcoexist/halbtcoutsrc.c: if ((mac->mode == WIRELESS_MODE_B) || (mac->mode == WIRELESS_MODE_B))
>
> Larry probably has a better idea.
>


Re: [PATCH] Check for Null return of function of affs_bread in function affs_truncate

2014-06-20 Thread Nick Krause
Thanks for standing up for me, Thomas.
If you have any ideas about what is better,
please let me know.
Cheers, Nick

On Fri, Jun 20, 2014 at 7:59 PM, Thomas Gleixner  wrote:
> On Fri, 20 Jun 2014, Nick Krause wrote:
>
>> Ok that's fine I would return as if it's a NULL the other parts of the
>> function can't continue.
>> Nick
>>
>> On Thu, Jun 19, 2014 at 1:21 AM, Dan Carpenter  wrote:
>> > On Wed, Jun 18, 2014 at 06:08:05PM -0400, Nicholas Krause wrote:
>> >> Signed-off-by: Nicholas Krause 
>> >> ---
>> >>  fs/affs/file.c | 2 ++
>> >>  1 file changed, 2 insertions(+)
>> >>
>> >> diff --git a/fs/affs/file.c b/fs/affs/file.c
>> >> index a7fe57d..f26482d 100644
>> >> --- a/fs/affs/file.c
>> >> +++ b/fs/affs/file.c
>> >> @@ -923,6 +923,8 @@ affs_truncate(struct inode *inode)
>> >>
>> >>   while (ext_key) {
>> >>   ext_bh = affs_bread(sb, ext_key);
>> >> + if (!ext_bh)
>> >> + return;
>> >
>> > The problem is that we don't know if we should return here or break
>> > here.  If you don't understand the code, then it's best to just leave it
>> > alone.
>
> Dan, what kind of attitude is that?
>
> Nick certainly found an issue where a possible NULL return from
> affs_bread() can cause havoc.
>
> Do YOU understand that code?
>
> If yes, you better explain WHY Nick's finding is a false positive
> instead of just telling him off in a very impolite way.
>
> If not, you better refrain from telling a reporter that he does not
> understand the code and should stay away.
>
> You clearly stated that you do not understand it either:
>
>> > The problem is that we don't know if we should return here or break
>> > here.
>
> The problem here is that proceeding with a known NULL pointer is wrong
> to begin with. It does not matter at all whether break or return is
> the proper thing to do. What matters is that proceeding with a NULL
> pointer is wrong to begin with, no matter what.
>
> So either explain why this is a non issue and the NULL pointer return
> cannot happen or shut up and try to find a proper solution for that
> "return" vs. "break" issue.
>
> Thanks,
>
> tglx


Re: [PATCH v2] irqchip: nvic: Use the generic noop function

2014-06-20 Thread Jason Cooper
On Wed, Jun 04, 2014 at 04:01:52PM +0100, Daniel Thompson wrote:
> Using the generic function saves looking up this custom one in a source
> navigator.
> 
> Signed-off-by: Daniel Thompson 
> Cc: Thomas Gleixner 
> Cc: Jason Cooper 
> ---
>  drivers/irqchip/irq-nvic.c | 13 -
>  1 file changed, 4 insertions(+), 9 deletions(-)

Applied to irqchip/core with Uwe's Ack.

thx,

Jason.


Re: [PATCH] irqchip: brcmstb-l2: Level-2 interrupts are edge sensitive

2014-06-20 Thread Jason Cooper
On Mon, Jun 09, 2014 at 11:05:02AM -0700, Florian Fainelli wrote:
> The driver was configuring the interrupt handler for the Level-2
> interrupts to be "level" triggered while they are in fact "edge"
> triggered. Fix this by using the correct handler.
> 
> Reported-by: Brian Norris 
> Signed-off-by: Florian Fainelli 
> ---
>  drivers/irqchip/irq-brcmstb-l2.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied to irqchip/urgent

thx,

Jason.


[tip:x86/urgent] x86/vdso: Improve the fake section headers

2014-06-20 Thread tip-bot for Andy Lutomirski
Commit-ID:  bfad381c0d1e19cae8461e105d8d4387dd2a14fe
Gitweb: http://git.kernel.org/tip/bfad381c0d1e19cae8461e105d8d4387dd2a14fe
Author: Andy Lutomirski 
AuthorDate: Wed, 18 Jun 2014 15:59:48 -0700
Committer:  H. Peter Anvin 
CommitDate: Thu, 19 Jun 2014 15:45:12 -0700

x86/vdso: Improve the fake section headers

Fully stripping the vDSO has other unfortunate side effects:

 - binutils is unable to find ELF notes without a SHT_NOTE section.

 - Even elfutils has trouble: it can find ELF notes without a section
   table at all, but if a section table is present, it won't look for
   PT_NOTE.

 - gdb wants section names to match between stripped DSOs and their
   symbols; otherwise it will corrupt symbol addresses.

We're also breaking the rules: section 0 is supposed to be SHT_NULL.

Fix these problems by building a better fake section table.  While
we're at it, we might as well let buggy Go versions keep working well
by giving the SHT_DYNSYM entry the correct size.

This is a bit unfortunate: it adds quite a bit of size to the vdso
image.

If/when binutils improves and the improved versions become widespread,
it would be worth considering dropping most of this.

Signed-off-by: Andy Lutomirski 
Link: http://lkml.kernel.org/r/0e546a5eeaafdf1840e6ee654a55c1e727c26663.1403129369.git.l...@amacapital.net
Signed-off-by: H. Peter Anvin 
---
 arch/x86/vdso/Makefile   |   4 +-
 arch/x86/vdso/vdso-fakesections.c|  44 
 arch/x86/vdso/vdso-layout.lds.S  |  40 +--
 arch/x86/vdso/vdso.lds.S |   2 +
 arch/x86/vdso/vdso2c.c   |  31 --
 arch/x86/vdso/vdso2c.h   | 180 +++
 arch/x86/vdso/vdso32/vdso-fakesections.c |   1 +
 arch/x86/vdso/vdsox32.lds.S  |   2 +
 8 files changed, 237 insertions(+), 67 deletions(-)

diff --git a/arch/x86/vdso/Makefile b/arch/x86/vdso/Makefile
index 3c0809a..2c1ca98 100644
--- a/arch/x86/vdso/Makefile
+++ b/arch/x86/vdso/Makefile
@@ -11,7 +11,6 @@ VDSO32-$(CONFIG_COMPAT)   := y
 
 # files to link into the vdso
 vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o vdso-fakesections.o
-vobjs-nox32 := vdso-fakesections.o
 
 # files to link into kernel
 obj-y  += vma.o
@@ -134,7 +133,7 @@ override obj-dirs = $(dir $(obj)) $(obj)/vdso32/
 
 targets += vdso32/vdso32.lds
 targets += vdso32/note.o vdso32/vclock_gettime.o $(vdso32.so-y:%=vdso32/%.o)
-targets += vdso32/vclock_gettime.o
+targets += vdso32/vclock_gettime.o vdso32/vdso-fakesections.o
 
 $(obj)/vdso32.o: $(vdso32-images:%=$(obj)/%)
 
@@ -155,6 +154,7 @@ $(vdso32-images:%=$(obj)/%.dbg): KBUILD_CFLAGS = $(KBUILD_CFLAGS_32)
 $(vdso32-images:%=$(obj)/%.dbg): $(obj)/vdso32-%.so.dbg: FORCE \
 $(obj)/vdso32/vdso32.lds \
 $(obj)/vdso32/vclock_gettime.o \
+$(obj)/vdso32/vdso-fakesections.o \
 $(obj)/vdso32/note.o \
 $(obj)/vdso32/%.o
$(call if_changed,vdso)
diff --git a/arch/x86/vdso/vdso-fakesections.c b/arch/x86/vdso/vdso-fakesections.c
index cb8a8d7..56927a7 100644
--- a/arch/x86/vdso/vdso-fakesections.c
+++ b/arch/x86/vdso/vdso-fakesections.c
@@ -2,31 +2,23 @@
  * Copyright 2014 Andy Lutomirski
  * Subject to the GNU Public License, v.2
  *
- * Hack to keep broken Go programs working.
- *
- * The Go runtime had a couple of bugs: it would read the section table to try
- * to figure out how many dynamic symbols there were (it shouldn't have looked
- * at the section table at all) and, if there were no SHT_SYNDYM section table
- * entry, it would use an uninitialized value for the number of symbols.  As a
- * workaround, we supply a minimal section table.  vdso2c will adjust the
- * in-memory image so that "vdso_fake_sections" becomes the section table.
- *
- * The bug was introduced by:
- * https://code.google.com/p/go/source/detail?r=56ea40aac72b (2012-08-31)
- * and is being addressed in the Go runtime in this issue:
- * https://code.google.com/p/go/issues/detail?id=8197
+ * String table for loadable section headers.  See vdso2c.h for why
+ * this exists.
  */
 
-#ifndef __x86_64__
-#error This hack is specific to the 64-bit vDSO
-#endif
-
-#include 
-
-extern const __visible struct elf64_shdr vdso_fake_sections[];
-const __visible struct elf64_shdr vdso_fake_sections[] = {
-   {
-   .sh_type = SHT_DYNSYM,
-   .sh_entsize = sizeof(Elf64_Sym),
-   }
-};
+const char fake_shstrtab[] __attribute__((section(".fake_shstrtab"))) =
+   ".hash\0"
+   ".dynsym\0"
+   ".dynstr\0"
+   ".gnu.version\0"
+   ".gnu.version_d\0"
+   ".dynamic\0"
+   ".rodata\0"
+   ".fake_shstrtab\0"  /* Yay, self-referential code. */
+   ".note\0"
+   ".data\0"
+   ".altinstructions\0"
+   ".altinstr_replacement\0"
+   ".eh_frame_hdr\0"
+   ".eh_frame\0"

[tip:x86/urgent] x86/vdso: Remove some redundant in-memory section headers

2014-06-20 Thread tip-bot for Andy Lutomirski
Commit-ID:  0e3727a8839c988a3c56170bc8da76d55a16acad
Gitweb: http://git.kernel.org/tip/0e3727a8839c988a3c56170bc8da76d55a16acad
Author: Andy Lutomirski 
AuthorDate: Wed, 18 Jun 2014 15:59:49 -0700
Committer:  H. Peter Anvin 
CommitDate: Thu, 19 Jun 2014 15:45:26 -0700

x86/vdso: Remove some redundant in-memory section headers

.data doesn't need to be separate from .rodata: they're both readonly.

.altinstructions and .altinstr_replacement aren't needed by anything
except vdso2c; strip them from the final image.

While we're at it, rather than aligning the actual executable text,
just shove some unused-at-runtime data in between real data and
text.

My vdso image is still above 4k, but I'm disinclined to try to
trim it harder for 3.16.  For future trimming, I suspect that these
sections could be moved to later in the file and dropped from
the in-memory image:

.gnu.version and .gnu.version_d   (this may lose versions in gdb)
.eh_frame (should be harmless)
.eh_frame_hdr (I'm not really sure)
.hash (AFAIK nothing needs this section header)

Signed-off-by: Andy Lutomirski 
Link: http://lkml.kernel.org/r/2e96d0c49016ea6d026a614ae645e93edd325961.1403129369.git.l...@amacapital.net
Signed-off-by: H. Peter Anvin 
---
 arch/x86/vdso/vdso-fakesections.c |  3 ---
 arch/x86/vdso/vdso-layout.lds.S   | 43 +--
 arch/x86/vdso/vdso2c.h|  4 +++-
 3 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/arch/x86/vdso/vdso-fakesections.c b/arch/x86/vdso/vdso-fakesections.c
index 56927a7..aa5fbfa 100644
--- a/arch/x86/vdso/vdso-fakesections.c
+++ b/arch/x86/vdso/vdso-fakesections.c
@@ -16,9 +16,6 @@ const char fake_shstrtab[] __attribute__((section(".fake_shstrtab"))) =
".rodata\0"
".fake_shstrtab\0"  /* Yay, self-referential code. */
".note\0"
-   ".data\0"
-   ".altinstructions\0"
-   ".altinstr_replacement\0"
".eh_frame_hdr\0"
".eh_frame\0"
".text";
diff --git a/arch/x86/vdso/vdso-layout.lds.S b/arch/x86/vdso/vdso-layout.lds.S
index e4cbc21..9197544 100644
--- a/arch/x86/vdso/vdso-layout.lds.S
+++ b/arch/x86/vdso/vdso-layout.lds.S
@@ -14,7 +14,7 @@
 # error unknown VDSO target
 #endif
 
-#define NUM_FAKE_SHDRS 16
+#define NUM_FAKE_SHDRS 13
 
 SECTIONS
 {
@@ -28,15 +28,17 @@ SECTIONS
.gnu.version_d  : { *(.gnu.version_d) }
.gnu.version_r  : { *(.gnu.version_r) }
 
-   .note   : { *(.note.*) }:text   :note
-
-   .eh_frame_hdr   : { *(.eh_frame_hdr) }  :text   :eh_frame_hdr
-   .eh_frame   : { KEEP (*(.eh_frame)) }   :text
-
.dynamic: { *(.dynamic) }   :text   :dynamic
 
.rodata : {
*(.rodata*)
+   *(.data*)
+   *(.sdata*)
+   *(.got.plt) *(.got)
+   *(.gnu.linkonce.d.*)
+   *(.bss*)
+   *(.dynbss*)
+   *(.gnu.linkonce.b.*)
 
/*
 * Ideally this would live in a C file, but that won't
@@ -50,28 +52,29 @@ SECTIONS
 
.fake_shstrtab  : { *(.fake_shstrtab) } :text
 
-   .data   : {
-   *(.data*)
-   *(.sdata*)
-   *(.got.plt) *(.got)
-   *(.gnu.linkonce.d.*)
-   *(.bss*)
-   *(.dynbss*)
-   *(.gnu.linkonce.b.*)
-   }
 
-   .altinstructions: { *(.altinstructions) }
-   .altinstr_replacement   : { *(.altinstr_replacement) }
+   .note   : { *(.note.*) }:text   :note
+
+   .eh_frame_hdr   : { *(.eh_frame_hdr) }  :text   :eh_frame_hdr
+   .eh_frame   : { KEEP (*(.eh_frame)) }   :text
+
 
/*
-* Align the actual code well away from the non-instruction data.
-* This is the best thing for the I-cache.
+* Text is well-separated from actual data: there's plenty of
+* stuff that isn't used at runtime in between.
 */
-   . = ALIGN(0x100);
 
.text   : { *(.text*) } :text   =0x90909090,
 
/*
+* At the end so that eu-elflint stays happy when vdso2c strips
+* these.  A better implementation would avoid allocating space
+* for these.
+*/
+   .altinstructions: { *(.altinstructions) }   :text
+   .altinstr_replacement   : { *(.altinstr_replacement) }  :text
+
+   /*
 * The remainder of the vDSO consists of special pages that are
 * shared between the kernel and userspace.  It needs to be at the
 * end so that it doesn't overlap the mapping of the actual
diff --git a/arch/x86/vdso/vdso2c.h b/arch/x86/vdso/vdso2c.h
index f01ed4b..f42e2dd 100644
--- a/arch/x86/vdso/vdso2c.h
+++ b/arch/x86/vdso/vdso2c.h
@@ -92,7 +92,9 @@ static void BITSFUNC(copy_section)(struct 

[tip:x86/urgent] x86/vdso: Create .build-id links for unstripped vdso files

2014-06-20 Thread tip-bot for Andy Lutomirski
Commit-ID:  dda1e95cee38b416b23f751cac65421d781e3c10
Gitweb: http://git.kernel.org/tip/dda1e95cee38b416b23f751cac65421d781e3c10
Author: Andy Lutomirski 
AuthorDate: Fri, 20 Jun 2014 12:20:44 -0700
Committer:  H. Peter Anvin 
CommitDate: Fri, 20 Jun 2014 13:18:49 -0700

x86/vdso: Create .build-id links for unstripped vdso files

With this change, doing 'make vdso_install' and telling gdb:

set debug-file-directory /lib/modules/KVER/vdso

will enable vdso debugging with symbols.  This is useful for
testing, but kernel RPM builds will probably want to manually delete
these symlinks or otherwise do something sensible when they strip
the vdso/*.so files.

If ld does not support --build-id, then the symlinks will not be
created.

Note that kernel packagers that use vdso_install may need to adjust
their packaging scripts to accommodate this change.  For example,
Fedora's scripts create build-id symlinks themselves in a different
location, so the spec should probably be updated to remove the
symlinks created by make vdso_install.

Signed-off-by: Andy Lutomirski 
Link: http://lkml.kernel.org/r/a424b189ce3ced85fe1e82d032a20e765e0fe0d3.1403291930.git.l...@amacapital.net
Signed-off-by: H. Peter Anvin 
---
 arch/x86/vdso/Makefile | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/vdso/Makefile b/arch/x86/vdso/Makefile
index 2c1ca98..68a15c4 100644
--- a/arch/x86/vdso/Makefile
+++ b/arch/x86/vdso/Makefile
@@ -169,14 +169,24 @@ quiet_cmd_vdso = VDSO$@
 sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@'
 
 VDSO_LDFLAGS = -fPIC -shared $(call cc-ldoption, -Wl$(comma)--hash-style=sysv) \
-   -Wl,-Bsymbolic $(LTO_CFLAGS)
+   $(call cc-ldoption, -Wl$(comma)--build-id) -Wl,-Bsymbolic $(LTO_CFLAGS)
 GCOV_PROFILE := n
 
 #
-# Install the unstripped copies of vdso*.so.
+# Install the unstripped copies of vdso*.so.  If our toolchain supports
+# build-id, install .build-id links as well.
 #
 quiet_cmd_vdso_install = INSTALL $(@:install_%=%)
-  cmd_vdso_install = cp $< $(MODLIB)/vdso/$(@:install_%=%)
+define cmd_vdso_install
+   cp $< "$(MODLIB)/vdso/$(@:install_%=%)"; \
+   if readelf -n $< |grep -q 'Build ID'; then \
+ buildid=`readelf -n $< |grep 'Build ID' |sed -e 's/^.*Build ID: \(.*\)$$/\1/'`; \
+ first=`echo $$buildid | cut -b-2`; \
+ last=`echo $$buildid | cut -b3-`; \
+ mkdir -p "$(MODLIB)/vdso/.build-id/$$first"; \
+ ln -sf "../../$(@:install_%=%)" "$(MODLIB)/vdso/.build-id/$$first/$$last.debug"; \
+   fi
+endef
 
 vdso_img_insttargets := $(vdso_img_sodbg:%.dbg=install_%)
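
The shell fragment in cmd_vdso_install splits the build ID into its first
two hex characters (the directory name) and the remainder (the link name,
with a ".debug" suffix).  The same path construction can be sketched in C as
below.  The function name build_id_link_path() and the sample directory are
assumptions made for illustration; nothing like this exists in the kernel
tree.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "<dir>/.build-id/xx/yyyy.debug" from a hex
 * build ID string, mirroring the cut -b-2 / cut -b3- split in the Makefile.
 * Returns 0 on success, -1 if the ID is too short or the buffer too small.
 */
static int build_id_link_path(const char *dir, const char *buildid,
			      char *out, size_t outlen)
{
	int n;

	if (strlen(buildid) < 3)
		return -1;

	n = snprintf(out, outlen, "%s/.build-id/%.2s/%s.debug",
		     dir, buildid, buildid + 2);
	if (n < 0 || (size_t)n >= outlen)
		return -1;

	return 0;
}
```

For a build ID of "abcdef0123" and a directory of /lib/modules/KVER/vdso,
this yields /lib/modules/KVER/vdso/.build-id/ab/cdef0123.debug.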
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:x86/urgent] x86/vdso2c: Use better macros for ELF bitness

2014-06-20 Thread tip-bot for Andy Lutomirski
Commit-ID:  c1979c370273fd9f7326ffa27a63b9ddb0f495f4
Gitweb: http://git.kernel.org/tip/c1979c370273fd9f7326ffa27a63b9ddb0f495f4
Author: Andy Lutomirski 
AuthorDate: Wed, 18 Jun 2014 15:59:47 -0700
Committer:  H. Peter Anvin 
CommitDate: Thu, 19 Jun 2014 15:44:59 -0700

x86/vdso2c: Use better macros for ELF bitness

Rather than using a separate macro for each replacement, use generic
macros.
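
As a standalone illustration of the two-level token pasting used by the new
macros, the sketch below defines one helper per bitness from identical
source text.  The macro definitions are copied from the patch; the stand-in
struct types and the ehdr_size helper are hypothetical and exist only so the
example compiles outside the kernel tree.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the real ELF header types. */
typedef struct { unsigned long long e_entry; } Elf64_Ehdr;
typedef struct { unsigned int e_entry; } Elf32_Ehdr;

/* The extra macro level forces ELF_BITS to expand before ## pastes it. */
#define BITSFUNC3(name, bits) name##bits
#define BITSFUNC2(name, bits) BITSFUNC3(name, bits)
#define BITSFUNC(name) BITSFUNC2(name, ELF_BITS)

#define ELF_BITS_XFORM2(bits, x) Elf##bits##_##x
#define ELF_BITS_XFORM(bits, x) ELF_BITS_XFORM2(bits, x)
#define ELF(x) ELF_BITS_XFORM(ELF_BITS, x)

/* Expands to: static size_t ehdr_size64(void), using Elf64_Ehdr. */
#define ELF_BITS 64
static size_t BITSFUNC(ehdr_size)(void) { return sizeof(ELF(Ehdr)); }
#undef ELF_BITS

/* Same source text, now expanding to ehdr_size32() and Elf32_Ehdr. */
#define ELF_BITS 32
static size_t BITSFUNC(ehdr_size)(void) { return sizeof(ELF(Ehdr)); }
#undef ELF_BITS
```

Without the intermediate BITSFUNC2 and ELF_BITS_XFORM levels, the ## operator
would paste the literal token ELF_BITS instead of its value.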

Signed-off-by: Andy Lutomirski 
Link: http://lkml.kernel.org/r/d953cd2e70ceee1400985d091188cdd65fba2f05.1403129369.git.l...@amacapital.net
Signed-off-by: H. Peter Anvin 
---
 arch/x86/vdso/vdso2c.c | 42 +-
 arch/x86/vdso/vdso2c.h | 23 ---
 2 files changed, 25 insertions(+), 40 deletions(-)

diff --git a/arch/x86/vdso/vdso2c.c b/arch/x86/vdso/vdso2c.c
index 7a6bf50..7343899 100644
--- a/arch/x86/vdso/vdso2c.c
+++ b/arch/x86/vdso/vdso2c.c
@@ -83,37 +83,21 @@ extern void bad_put_le(void);
 
 #define NSYMS (sizeof(required_syms) / sizeof(required_syms[0]))
 
-#define BITS 64
-#define GOFUNC go64
-#define Elf_Ehdr Elf64_Ehdr
-#define Elf_Shdr Elf64_Shdr
-#define Elf_Phdr Elf64_Phdr
-#define Elf_Sym Elf64_Sym
-#define Elf_Dyn Elf64_Dyn
+#define BITSFUNC3(name, bits) name##bits
+#define BITSFUNC2(name, bits) BITSFUNC3(name, bits)
+#define BITSFUNC(name) BITSFUNC2(name, ELF_BITS)
+
+#define ELF_BITS_XFORM2(bits, x) Elf##bits##_##x
+#define ELF_BITS_XFORM(bits, x) ELF_BITS_XFORM2(bits, x)
+#define ELF(x) ELF_BITS_XFORM(ELF_BITS, x)
+
+#define ELF_BITS 64
 #include "vdso2c.h"
-#undef BITS
-#undef GOFUNC
-#undef Elf_Ehdr
-#undef Elf_Shdr
-#undef Elf_Phdr
-#undef Elf_Sym
-#undef Elf_Dyn
-
-#define BITS 32
-#define GOFUNC go32
-#define Elf_Ehdr Elf32_Ehdr
-#define Elf_Shdr Elf32_Shdr
-#define Elf_Phdr Elf32_Phdr
-#define Elf_Sym Elf32_Sym
-#define Elf_Dyn Elf32_Dyn
+#undef ELF_BITS
+
+#define ELF_BITS 32
 #include "vdso2c.h"
-#undef BITS
-#undef GOFUNC
-#undef Elf_Ehdr
-#undef Elf_Shdr
-#undef Elf_Phdr
-#undef Elf_Sym
-#undef Elf_Dyn
+#undef ELF_BITS
 
 static void go(void *addr, size_t len, FILE *outfile, const char *name)
 {
diff --git a/arch/x86/vdso/vdso2c.h b/arch/x86/vdso/vdso2c.h
index c6eefaf..8e185ce 100644
--- a/arch/x86/vdso/vdso2c.h
+++ b/arch/x86/vdso/vdso2c.h
@@ -4,23 +4,24 @@
  * are built for 32-bit userspace.
  */
 
-static void GOFUNC(void *addr, size_t len, FILE *outfile, const char *name)
+static void BITSFUNC(go)(void *addr, size_t len,
+FILE *outfile, const char *name)
 {
int found_load = 0;
unsigned long load_size = -1;  /* Work around bogus warning */
unsigned long data_size;
-   Elf_Ehdr *hdr = (Elf_Ehdr *)addr;
+   ELF(Ehdr) *hdr = (ELF(Ehdr) *)addr;
int i;
unsigned long j;
-   Elf_Shdr *symtab_hdr = NULL, *strtab_hdr, *secstrings_hdr,
+   ELF(Shdr) *symtab_hdr = NULL, *strtab_hdr, *secstrings_hdr,
*alt_sec = NULL;
-   Elf_Dyn *dyn = 0, *dyn_end = 0;
+   ELF(Dyn) *dyn = 0, *dyn_end = 0;
const char *secstrings;
uint64_t syms[NSYMS] = {};
 
uint64_t fake_sections_value = 0, fake_sections_size = 0;
 
-   Elf_Phdr *pt = (Elf_Phdr *)(addr + GET_LE(>e_phoff));
+   ELF(Phdr) *pt = (ELF(Phdr) *)(addr + GET_LE(>e_phoff));
 
/* Walk the segment table. */
for (i = 0; i < GET_LE(>e_phnum); i++) {
@@ -61,7 +62,7 @@ static void GOFUNC(void *addr, size_t len, FILE *outfile, const char *name)
GET_LE(>e_shentsize)*GET_LE(>e_shstrndx);
secstrings = addr + GET_LE(_hdr->sh_offset);
for (i = 0; i < GET_LE(>e_shnum); i++) {
-   Elf_Shdr *sh = addr + GET_LE(>e_shoff) +
+   ELF(Shdr) *sh = addr + GET_LE(>e_shoff) +
GET_LE(>e_shentsize) * i;
if (GET_LE(>sh_type) == SHT_SYMTAB)
symtab_hdr = sh;
@@ -82,7 +83,7 @@ static void GOFUNC(void *addr, size_t len, FILE *outfile, const char *name)
 i < GET_LE(_hdr->sh_size) / GET_LE(_hdr->sh_entsize);
 i++) {
int k;
-   Elf_Sym *sym = addr + GET_LE(_hdr->sh_offset) +
+   ELF(Sym) *sym = addr + GET_LE(_hdr->sh_offset) +
GET_LE(_hdr->sh_entsize) * i;
const char *name = addr + GET_LE(_hdr->sh_offset) +
GET_LE(>st_name);
@@ -123,12 +124,12 @@ static void GOFUNC(void *addr, size_t len, FILE *outfile, const char *name)
fail("end_mapping must be a multiple of 4096\n");
 
/* Remove sections or use fakes */
-   if (fake_sections_size % sizeof(Elf_Shdr))
+   if (fake_sections_size % sizeof(ELF(Shdr)))
fail("vdso_fake_sections size is not a multiple of %ld\n",
-(long)sizeof(Elf_Shdr));
+(long)sizeof(ELF(Shdr)));
PUT_LE(>e_shoff, fake_sections_value);
-   PUT_LE(>e_shentsize, fake_sections_value ? sizeof(Elf_Shdr) : 0);
-   

[tip:x86/urgent] x86/vdso: Discard the __bug_table section

2014-06-20 Thread tip-bot for Andy Lutomirski
Commit-ID:  5f56e7167e6d438324fcba87018255d81e201383
Gitweb: http://git.kernel.org/tip/5f56e7167e6d438324fcba87018255d81e201383
Author: Andy Lutomirski 
AuthorDate: Wed, 18 Jun 2014 15:59:46 -0700
Committer:  H. Peter Anvin 
CommitDate: Thu, 19 Jun 2014 15:44:51 -0700

x86/vdso: Discard the __bug_table section

It serves no purpose in user code.

Signed-off-by: Andy Lutomirski 
Link: http://lkml.kernel.org/r/2a5bebff42defd8a5e81d96f7dc00f21143c80e8.1403129369.git.l...@amacapital.net
Signed-off-by: H. Peter Anvin 
---
 arch/x86/vdso/vdso-layout.lds.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/vdso/vdso-layout.lds.S b/arch/x86/vdso/vdso-layout.lds.S
index 2ec72f6..c84166c 100644
--- a/arch/x86/vdso/vdso-layout.lds.S
+++ b/arch/x86/vdso/vdso-layout.lds.S
@@ -75,6 +75,7 @@ SECTIONS
/DISCARD/ : {
*(.discard)
*(.discard.*)
+   *(__bug_table)
}
 }
 


Re: [PATCH] x86/mce: Don't unregister CPU hotplug notifier in error path

2014-06-20 Thread Boris Ostrovsky

On 06/20/2014 05:11 PM, Borislav Petkov wrote:

On Fri, Jun 20, 2014 at 04:43:37PM -0400, Boris Ostrovsky wrote:

We are getting CPU_ONLINE notifier for ASPs during boot:

Bah, that's craptastic. Hmm, ok, let's try this instead:


I'll try it later but this doesn't look sufficient to me: we might not 
reach this point if subsys_system_register() or zalloc_cpumask_var() 
fail.  We could register the notifier as the first thing in this routine 
(probably after mce_available() succeeds).



-boris



--
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index bb92f38153b2..9a79c8dbd8e8 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -2451,6 +2451,12 @@ static __init int mcheck_init_device(void)
for_each_online_cpu(i) {
err = mce_device_create(i);
if (err) {
+   /*
+* Register notifier anyway (and do not unreg it) so
+* that we don't leave undeleted timers, see notifier
+* callback above.
+*/
+   __register_hotcpu_notifier(_cpu_notifier);
cpu_notifier_register_done();
goto err_device_create;
}
@@ -2471,10 +2477,6 @@ static __init int mcheck_init_device(void)
  err_register:
unregister_syscore_ops(_syscore_ops);
  
-	cpu_notifier_register_begin();

-   __unregister_hotcpu_notifier(_cpu_notifier);
-   cpu_notifier_register_done();
-
  err_device_create:
/*
 * We didn't keep track of which devices were created above, but





Re: [PATCH v4] openrisc: irq: use irqchip framework

2014-06-20 Thread Jason Cooper
On Thu, May 29, 2014 at 11:28:08PM +0300, Stefan Kristiansson wrote:
> On Tue, May 27, 2014 at 08:47:36AM +0200, Jonas Bonn wrote:
> > On 05/26/2014 10:52 PM, Geert Uytterhoeven wrote:
> > > CC devicetree for the bindings
> > > 
> > > On Mon, May 26, 2014 at 10:31 PM, Stefan Kristiansson
> > >  wrote:
> > >> +++ 
> > >> b/Documentation/devicetree/bindings/interrupt-controller/opencores,or1k-pic.txt
> > >> @@ -0,0 +1,23 @@
> > >> +OpenRISC 1000 Programmable Interrupt Controller
> > >> +
> > >> +Required properties:
> > >> +
> > >> +- compatible : should be "opencores,or1k-pic-level" for variants with
> > >> +  level triggered interrupt lines, "opencores,or1k-pic-edge" for 
> > >> variants with
> > >> +  edge triggered interrupt lines or "opencores,or1200-pic" for machines
> > >> +  with the non-spec compliant or1200 type implementation.
> > >> +
> > >> +  "opencores,or1k-pic" is also provided as an alias to 
> > >> "opencores,or1200-pic",
> > >> +  but this is only for backwards compatibility.
> > 
> > I still think this identifier needs to be versioned.  Use the same
> > version number as we have on the cpu identifier since the OR1200 PIC
> > hasn't changed since then; i.e. opencores,or1200-pic-rtlsvnXYZ.
> > 
> 
> I can change that if you *really* insist on it...
> But I don't understand the purpose of the versioning here,
> there will never be any other or1200-pic version than the one that currently
> exists, so IMO "or1200" should be enough versioning information.

I'm horribly unfamiliar with openrisc, but compatible strings are
compatible strings. ;-)

Is the *actual* IP block called or1200-pic?  Or is it, eg or1235-pic, and
you're using or1200-pic as a generic catch-all?

Please use the specific IP name without wildcards.  That compatible
string will then be used on that IP and future IP that is compatible
with the original IP.  Once an incompatible change is introduced, then
we'll create a new compatible string, say or1300-pic, or or1237-pic.

When in doubt, be specific.  I don't think the '-rtlsvnXYZ' should be
necessary, though.

thx,

Jason.


SmPL for automatic request_firmware_nowait() conversion

2014-06-20 Thread Luis R. Rodriguez
I was just porting an ethernet driver [0] over to request_firmware_nowait(),
since firmware loading can take over a minute on one device. While at it
I noticed that no other ethernet driver uses this API yet, so I figure
this may be a coming trend if devices are getting as complex as cxgb4.
The cxgb4 driver even happens to use the firmware API 3 times!

Obviously I considered writing SmPL for this, but one thing that seemed
hard is that after the request_firmware_nowait() call we tend to tuck
the rest of the code, which followed the old request_firmware() call in
the original function, away into a new routine. Is there a way
to dump all that code into the new routine? I think the hardest part
would be moving the right set of variables over as well. In the third
patch in this series, for example [1], there was a state variable that
I moved from being static over to the ethernet private data structure.
It's hard for me to think of how to give Coccinelle enough information
about what it needs to move around. I think one hint would be:

  "Hey, all that code that is static and is used *before* and *after*
   request_firmware(): stuff it into the private data structure"

We'd have to infer the private data structure, but that's easy and I already know
that's possible. Is this possible? The only other challenge I thought
might be tough would be to come up with a reasonable name for the
completion call, but I guess we can use the original routine name
where request_firmware() was being used and postfix _completion or something.
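
To make the shape of the transformation concrete, here is a small userspace
sketch of the "after" state.  Everything in it is a hypothetical stand-in:
struct fw_blob, struct my_adapter, and mock_request_fw_nowait() are NOT the
kernel firmware API, and the mock calls the callback immediately rather than
asynchronously.  It only illustrates the two moves described above: state
goes into the private data structure, and the code that followed the
blocking call goes into a routine named after the original with a
_completion postfix.

```c
#include <stddef.h>

/* Hypothetical stand-in for a loaded firmware image. */
struct fw_blob {
	const unsigned char *data;
	size_t size;
};

/* Private data; flash_status plays the role of the formerly static state. */
struct my_adapter {
	int flash_status;	/* 0 on success, negative on failure */
	size_t flashed_bytes;	/* bytes handed to the (imaginary) flasher */
};

typedef void (*fw_done_fn)(const struct fw_blob *fw, void *context);

/*
 * Mock loader: a real asynchronous loader would return right away and run
 * the callback later from another context.  This one calls back at once.
 */
static int mock_request_fw_nowait(const struct fw_blob *available,
				  void *context, fw_done_fn done)
{
	done(available, context);
	return 0;
}

/* Everything that used to follow the blocking request now lives here. */
static void my_set_flash_completion(const struct fw_blob *fw, void *context)
{
	struct my_adapter *adap = context;

	if (!fw) {
		adap->flash_status = -1;
		return;
	}
	adap->flashed_bytes = fw->size;
	adap->flash_status = 0;
}

/* The original entry point shrinks to kicking off the request. */
static int my_set_flash(struct my_adapter *adap, const struct fw_blob *fw)
{
	return mock_request_fw_nowait(fw, adap, my_set_flash_completion);
}
```

The caller can no longer read a result from the return value of
my_set_flash(); it has to look at flash_status in the private structure,
which is why the state variable has to move there.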

netdev: how worthy is this effort?

[0] https://lkml.org/lkml/2014/6/20/688
[1] https://lkml.org/lkml/2014/6/20/691
 
  Luis


[PATCH] staging: ft1000_dnld.c:code indent should use tabs where possible

2014-06-20 Thread Cheng-Wei Lee
This patch fixes the following checkpatch.pl issue in
ft1000/ft1000-pcmcia/ft1000_dnld.c
ERROR: code indent should use tabs where possible

Signed-off-by: Quentin Lee 
---
 drivers/staging/ft1000/ft1000-pcmcia/ft1000_dnld.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/ft1000/ft1000-pcmcia/ft1000_dnld.c
b/drivers/staging/ft1000/ft1000-pcmcia/ft1000_dnld.c
index d44e858..afaab07 100644
--- a/drivers/staging/ft1000/ft1000-pcmcia/ft1000_dnld.c
+++ b/drivers/staging/ft1000/ft1000-pcmcia/ft1000_dnld.c
@@ -15,8 +15,8 @@
Suite 330, Boston, MA 02111-1307, USA.
   --

-   Description:  This module will handshake with the DSP bootloader to
- download the DSP runtime image.
+   Description: This module will handshake with the DSP bootloader to
+   download the DSP runtime image.

 ---*/

-- 
1.7.9.5


Re: [for-next][PATCH v2 1/3] tracing: Fix syscall_*regfunc() vs copy_process() race

2014-06-20 Thread Steven Rostedt
On Fri, 20 Jun 2014 18:11:25 -0700
"Paul E. McKenney"  wrote:

> On Fri, Jun 20, 2014 at 06:45:19AM -0400, Steven Rostedt wrote:
> > From: Oleg Nesterov 
> > 
> > syscall_regfunc() and syscall_unregfunc() should set/clear
> > TIF_SYSCALL_TRACEPOINT system-wide, but do_each_thread() can race
> > with copy_process() and miss the new child which was not added to
> > the process/thread lists yet.
> > 
> > Change copy_process() to update the child's TIF_SYSCALL_TRACEPOINT
> > under tasklist.
> > 
> > Link: http://lkml.kernel.org/p/20140413185854.gb20...@redhat.com
> > 
> > Cc: sta...@vger.kernel.org # 2.6.33
> > Fixes: a871bd33a6c0 "tracing: Add syscall tracepoints"
> > Acked-by: Frederic Weisbecker 
> > Signed-off-by: Oleg Nesterov 
> > Signed-off-by: Steven Rostedt 
> 
> Acked-by: Paul E. McKenney 
> 

I don't usually rebase my for-next branch for acks, but I already
rebased once for fixing an issue, and it's early in the rc cycle, and
this is the first patch on the branch, so I think I will do it.

Thanks!

-- Steve


Re: [for-next][PATCH v2 1/3] tracing: Fix syscall_*regfunc() vs copy_process() race

2014-06-20 Thread Paul E. McKenney
On Fri, Jun 20, 2014 at 06:45:19AM -0400, Steven Rostedt wrote:
> From: Oleg Nesterov 
> 
> syscall_regfunc() and syscall_unregfunc() should set/clear
> TIF_SYSCALL_TRACEPOINT system-wide, but do_each_thread() can race
> with copy_process() and miss the new child which was not added to
> the process/thread lists yet.
> 
> Change copy_process() to update the child's TIF_SYSCALL_TRACEPOINT
> under tasklist.
> 
> Link: http://lkml.kernel.org/p/20140413185854.gb20...@redhat.com
> 
> Cc: sta...@vger.kernel.org # 2.6.33
> Fixes: a871bd33a6c0 "tracing: Add syscall tracepoints"
> Acked-by: Frederic Weisbecker 
> Signed-off-by: Oleg Nesterov 
> Signed-off-by: Steven Rostedt 

Acked-by: Paul E. McKenney 

> ---
>  include/trace/syscall.h | 15 +++
>  kernel/fork.c   |  2 ++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/include/trace/syscall.h b/include/trace/syscall.h
> index fed853f3d7aa..9674145e2f6a 100644
> --- a/include/trace/syscall.h
> +++ b/include/trace/syscall.h
> @@ -4,6 +4,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> @@ -32,4 +33,18 @@ struct syscall_metadata {
>   struct ftrace_event_call *exit_event;
>  };
>  
> +#if defined(CONFIG_TRACEPOINTS) && defined(CONFIG_HAVE_SYSCALL_TRACEPOINTS)
> +static inline void syscall_tracepoint_update(struct task_struct *p)
> +{
> + if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
> + set_tsk_thread_flag(p, TIF_SYSCALL_TRACEPOINT);
> + else
> + clear_tsk_thread_flag(p, TIF_SYSCALL_TRACEPOINT);
> +}
> +#else
> +static inline void syscall_tracepoint_update(struct task_struct *p)
> +{
> +}
> +#endif
> +
>  #endif /* _TRACE_SYSCALL_H */
> diff --git a/kernel/fork.c b/kernel/fork.c
> index d2799d1fc952..6a13c46cd87d 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1487,7 +1487,9 @@ static struct task_struct *copy_process(unsigned long clone_flags,
>  
>   total_forks++;
>   spin_unlock(>sighand->siglock);
> + syscall_tracepoint_update(p);
>   write_unlock_irq(_lock);
> +
>   proc_fork_connector(p);
>   cgroup_post_fork(p);
>   if (clone_flags & CLONE_THREAD)
> -- 
> 2.0.0
> 
> 


Re: [patch 13/13] mm: memcontrol: rewrite uncharge API

2014-06-20 Thread Sasha Levin
On 06/20/2014 08:56 PM, Andrew Morton wrote:
> On Fri, 20 Jun 2014 20:34:43 -0400 Sasha Levin  wrote:
> 
>> I'm seeing the following when booting a VM, bisection pointed me to this
>> patch.
>>
>> [   32.830823] BUG: using __this_cpu_add() in preemptible [] code: 
>> mkdir/8677
> 
> Thanks.  This one was fixed earlier today.

Thanks, Andrew. My first bisection attempt went sideways and ended up
pointing at "fs/mpage.c: forgotten WRITE_SYNC in case of data integrity write"
for some reason.

My attempt to understand what data integrity has to do with cgroups was unfruitful :(


Thanks,
Sasha



Re: [patch 13/13] mm: memcontrol: rewrite uncharge API

2014-06-20 Thread Andrew Morton
On Fri, 20 Jun 2014 20:34:43 -0400 Sasha Levin  wrote:

> I'm seeing the following when booting a VM, bisection pointed me to this
> patch.
> 
> [   32.830823] BUG: using __this_cpu_add() in preemptible [] code: 
> mkdir/8677

Thanks.  This one was fixed earlier today.

From: Michal Hocko 
Subject: memcg: mem_cgroup_charge_statistics needs preempt_disable

Preemption was previously disabled by lock_page_cgroup, which has been
removed by "mm: memcontrol: rewrite uncharge API".

This fixes a flood of splats like this:
[3.149371] BUG: using __this_cpu_add() in preemptible [] code: udevd/1271
[3.151458] caller is __this_cpu_preempt_check+0x13/0x15
[3.152927] CPU: 0 PID: 1271 Comm: udevd Not tainted 3.15.0-test1 #366
[3.154637] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[3.156788]   8805fba8 814efe3f 

[3.158810]  8805fbd8 8125b969 880007413448 
0001
[3.160836]  ea1e8c00 0001 8805fbe8 
8125b9a8
[3.162950] Call Trace:
[3.163598]  [] dump_stack+0x4e/0x7a
[3.164942]  [] check_preemption_disabled+0xd2/0xe5
[3.166618]  [] __this_cpu_preempt_check+0x13/0x15
[3.168267]  [] mem_cgroup_charge_statistics.isra.36+0xb5/0xc6
[3.170169]  [] commit_charge+0x23c/0x256
[3.171823]  [] mem_cgroup_commit_charge+0xb8/0xd7
[3.173838]  [] shmem_getpage_gfp+0x399/0x605
[3.175363]  [] shmem_write_begin+0x3d/0x58
[3.176854]  [] generic_perform_write+0xbc/0x192
[3.178445]  [] ? file_update_time+0x34/0xac
[3.179952]  [] __generic_file_aio_write+0x2c0/0x300
[3.181655]  [] generic_file_aio_write+0x52/0xbd
[3.183234]  [] do_sync_write+0x59/0x78
[3.184630]  [] vfs_write+0xc4/0x181
[3.185957]  [] SyS_write+0x4a/0x91
[3.187258]  [] tracesys+0xd0/0xd5

Signed-off-by: Michal Hocko 
Cc: Johannes Weiner 
Signed-off-by: Andrew Morton 
---

 mm/memcontrol.c |3 +++
 1 file changed, 3 insertions(+)

diff -puN mm/memcontrol.c~mm-memcontrol-rewrite-uncharge-api-fix-4 mm/memcontrol.c
--- a/mm/memcontrol.c~mm-memcontrol-rewrite-uncharge-api-fix-4
+++ a/mm/memcontrol.c
@@ -904,6 +904,8 @@ static void mem_cgroup_charge_statistics
 struct page *page,
 int nr_pages)
 {
+   preempt_disable();
+
/*
 * Here, RSS means 'mapped anon' and anon's SwapCache. Shmem/tmpfs is
 * counted as CACHE even if it's on ANON LRU.
@@ -928,6 +930,7 @@ static void mem_cgroup_charge_statistics
}
 
__this_cpu_add(memcg->stat->nr_page_events, nr_pages);
+   preempt_enable();
 }
 
 unsigned long mem_cgroup_get_lru_size(struct lruvec *lruvec, enum lru_list lru)
_



RE: scsi-mq

2014-06-20 Thread Elliott, Robert (Server Storage)


> -----Original Message-----
> From: Bart Van Assche [mailto:bvanass...@acm.org]
> Sent: Wednesday, 18 June, 2014 2:09 AM
> To: Jens Axboe; Christoph Hellwig; James Bottomley
> Cc: Elliott, Robert (Server Storage); linux-s...@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Subject: Re: scsi-mq
> 
...
> Hello Jens,
> 
> Fio reports the same queue depth for use_blk_mq=Y (mq below) and
> use_blk_mq=N (sq below), namely ">=64". However, the number of context
> switches differs significantly for the random read-write tests.
> 
...
> It seems like with the traditional SCSI mid-layer and block core (sq)
> that the number of context switches does not depend too much on the
> number of I/O operations but that for the multi-queue SCSI core there
> are a little bit more than two context switches per I/O in the
> particular test I ran. The "randrw" script I used for this test takes
> SCSI LUNs as arguments (/dev/sdX) and starts the fio tool as follows:

Some of those context switches might be from scsi_end_request(), 
which always schedules the scsi_requeue_run_queue() function via the
requeue_work workqueue for scsi-mq.  That causes lots of context 
switches from a busy application thread (e.g., fio) to a 
kworker thread.

As shown by ftrace:

 fio-19340 [005] dNh. 12067.908444: scsi_io_completion <-scsi_finish_command
 fio-19340 [005] dNh. 12067.908444: scsi_end_request <-scsi_io_completion
 fio-19340 [005] dNh. 12067.908444: blk_update_request <-scsi_end_request
 fio-19340 [005] dNh. 12067.908445: blk_account_io_completion <-blk_update_request
 fio-19340 [005] dNh. 12067.908445: scsi_mq_free_sgtables <-scsi_end_request
 fio-19340 [005] dNh. 12067.908445: scsi_free_sgtable <-scsi_mq_free_sgtables
 fio-19340 [005] dNh. 12067.908445: blk_account_io_done <-__blk_mq_end_io
 fio-19340 [005] dNh. 12067.908445: blk_mq_free_request <-__blk_mq_end_io
 fio-19340 [005] dNh. 12067.908446: blk_mq_map_queue <-blk_mq_free_request
 fio-19340 [005] dNh. 12067.908446: blk_mq_put_tag <-__blk_mq_free_request
 fio-19340 [005] .N.. 12067.908446: blkdev_direct_IO <-generic_file_direct_write
kworker/5:1H-3207  [005]  12067.908448: scsi_requeue_run_queue <-process_one_work
kworker/5:1H-3207  [005]  12067.908448: scsi_run_queue <-scsi_requeue_run_queue
kworker/5:1H-3207  [005]  12067.908448: blk_mq_start_stopped_hw_queues <-scsi_run_queue
 fio-19340 [005]  12067.908449: blk_start_plug <-do_blockdev_direct_IO
 fio-19340 [005]  12067.908449: blkdev_get_block <-do_direct_IO
 fio-19340 [005]  12067.908450: blk_throtl_bio <-generic_make_request_checks
 fio-19340 [005]  12067.908450: blk_sq_make_request <-generic_make_request
 fio-19340 [005]  12067.908450: blk_queue_bounce <-blk_sq_make_request
 fio-19340 [005]  12067.908450: blk_mq_map_request <-blk_sq_make_request
 fio-19340 [005]  12067.908451: blk_mq_queue_enter <-blk_mq_map_request
 fio-19340 [005]  12067.908451: blk_mq_map_queue <-blk_mq_map_request
 fio-19340 [005]  12067.908451: blk_mq_get_tag <-__blk_mq_alloc_request
 fio-19340 [005]  12067.908451: blk_mq_bio_to_request <-blk_sq_make_request
 fio-19340 [005]  12067.908451: blk_rq_bio_prep <-init_request_from_bio
 fio-19340 [005]  12067.908451: blk_recount_segments <-bio_phys_segments
 fio-19340 [005]  12067.908452: blk_account_io_start <-blk_mq_bio_to_request
 fio-19340 [005]  12067.908452: blk_mq_hctx_mark_pending <-__blk_mq_insert_request
 fio-19340 [005]  12067.908452: blk_mq_run_hw_queue <-blk_sq_make_request
 fio-19340 [005]  12067.908452: blk_mq_start_request <-__blk_mq_run_hw_queue

In one snapshot just tracing scsi_end_request() and
scsi_requeue_run_queue(), 30K scsi_end_request() calls yielded
20K scsi_requeue_run_queue() calls.

In this case, blk_mq_start_stopped_hw_queues() doesn't end up
doing anything since there aren't any stopped queues to restart 
(blk_mq_run_hw_queue() gets called a bit later during routine 
fio work); the context switch turned out to be a waste of time.  
If it did find a stopped queue, then it would call 
blk_mq_run_hw_queue() itself.
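
The behaviour described above can be modelled in a few lines of
userspace C.  This is only a sketch under stated assumptions:
struct hw_queue_model, run_hw_queue_model() and
start_stopped_hw_queues_model() are hypothetical stand-ins, not the
kernel's blk-mq structures or functions.  It shows why the restart
pass does no useful work when no queue is stopped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a blk-mq hardware queue context. */
struct hw_queue_model {
	bool stopped;		/* models the "stopped" state bit */
	unsigned int runs;	/* how often this queue was run */
};

/* Hypothetical stand-in for blk_mq_run_hw_queue(). */
static void run_hw_queue_model(struct hw_queue_model *hq)
{
	hq->runs++;
}

/*
 * Models the restart pass: only queues that are currently stopped are
 * restarted and run.  Returns the number of queues restarted, so a
 * return value of 0 means the pass (and the context switch that led
 * to it) accomplished nothing.
 */
static size_t start_stopped_hw_queues_model(struct hw_queue_model *hqs,
					    size_t n)
{
	size_t restarted = 0;

	assert(hqs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!hqs[i].stopped)
			continue;	/* nothing to restart here */
		hqs[i].stopped = false;
		run_hw_queue_model(&hqs[i]);
		restarted++;
	}
	return restarted;
}
```

In this model, a caller could skip scheduling the work item whenever
the pass would return 0; whether that is practical in the real code
is a separate question.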

---
Rob Elliott    HP Server Storage

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFT 1/3] cxgb4: make ethtool set_flash use request_firmware_nowait()

2014-06-20 Thread Luis R. Rodriguez
From: "Luis R. Rodriguez" 

cxgb4 loading can take a while; this is part of the crusade to
change it to be asynchronous.

Cc: Casey Leedom 
Cc: Hariprasad Shenai 
Cc: Philip Oswald 
Cc: Santosh Rastapur 
Cc: Jeffrey Cheung 
Cc: David Chang 
Signed-off-by: Luis R. Rodriguez 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h  |  3 ++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 40 -
 2 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index f503dce..bcf9acf 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -647,6 +647,9 @@ struct adapter {
struct dentry *debugfs_root;
 
spinlock_t stats_lock;
+
+   struct completion flash_comp;
+   int flash_comp_status;
 };
 
 /* Defined bit width of user definable filter tuples
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 2f8d6b9..9cf6f3e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -2713,22 +2713,48 @@ out:
return err;
 }
 
+static void cxgb4_flash_complete(const struct firmware *fw, void *context)
+{
+   struct adapter *adap = context;
+   int ret;
+
+   if (!fw) {
+   adap->flash_comp_status = -EINVAL;
+   goto out;
+   }
+
+   ret = t4_load_fw(adap, fw->data, fw->size);
+   if (!ret)
+   adap->flash_comp_status = ret;
+
+out:
+   release_firmware(fw);
+   complete(&adap->flash_comp);
+}
+
 static int set_flash(struct net_device *netdev, struct ethtool_flash *ef)
 {
int ret;
-   const struct firmware *fw;
struct adapter *adap = netdev2adap(netdev);
 
+   init_completion(&adap->flash_comp);
+   adap->flash_comp_status = 0;
+
ef->data[sizeof(ef->data) - 1] = '\0';
-   ret = request_firmware(&fw, ef->data, adap->pdev_dev);
+   ret = request_firmware_nowait(THIS_MODULE, 1, ef->data,
+ adap->pdev_dev, GFP_KERNEL,
+ adap, cxgb4_flash_complete);
if (ret < 0)
return ret;
 
-   ret = t4_load_fw(adap, fw->data, fw->size);
-   release_firmware(fw);
-   if (!ret)
-   dev_info(adap->pdev_dev, "loaded firmware %s\n", ef->data);
-   return ret;
+   wait_for_completion(&adap->flash_comp);
+
+   if (adap->flash_comp_status != 0)
+   return adap->flash_comp_status;
+
+   dev_info(adap->pdev_dev, "loaded firmware %s\n", ef->data);
+
+   return 0;
 }
 
 #define WOL_SUPPORTED (WAKE_BCAST | WAKE_MAGIC)
-- 
2.0.0



[RFT 0/3] cxgb4: use request_firmware_nowait()

2014-06-20 Thread Luis R. Rodriguez
From: "Luis R. Rodriguez" 

It's reported that loading the cxgb4 driver can take over 1 minute;
use the more sane request_firmware_nowait() API call just
in case this amount of time is causing issues. The driver
uses the firmware API 3 times: once for the firmware, once
for configuration and once for flash. This series provides
the port for all three cases.

I don't have the hardware so please test. I did verify we
can use this during pci probe and also during the ethtool
flash callback.

Luis R. Rodriguez (3):
  cxgb4: make ethtool set_flash use request_firmware_nowait()
  cxgb4: make configuration load use request_firmware_nowait()
  cxgb4: make device firmware load use request_firmware_nowait()

 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h  |  13 ++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 258 +++-
 2 files changed, 176 insertions(+), 95 deletions(-)

-- 
2.0.0
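
The control flow this series relies on (start an asynchronous
request, let a callback record the outcome, then wait on a
completion) can be sketched in userspace C.  This is a
single-threaded model under stated assumptions, not the kernel
firmware API: every name ending in _model is hypothetical, the
"worker" is simulated by run_pending_model(), and the bool flag only
stands in for struct completion.  Note that the callback records the
status on every path before signalling completion.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct firmware. */
struct fw_model {
	const unsigned char *data;
	size_t size;
};

typedef void (*fw_cb_t)(const struct fw_model *fw, void *context);

/* Hypothetical per-device state: completion flag plus status slot. */
struct adapter_model {
	bool done;	/* models struct completion */
	int status;	/* models the *_comp_status fields */
};

/* One pending request; a real loader would hand this to a worker. */
struct pending_req {
	const struct fw_model *fw;	/* NULL models "image not found" */
	void *context;
	fw_cb_t cb;
	bool armed;
};

static struct pending_req pending;

/* Models an asynchronous request: record the callback and return. */
static int request_fw_nowait_model(const struct fw_model *fw,
				   void *context, fw_cb_t cb)
{
	if (!cb || pending.armed)
		return -EINVAL;
	pending = (struct pending_req){ fw, context, cb, true };
	return 0;
}

/* Models the loader's worker delivering the result later. */
static void run_pending_model(void)
{
	if (pending.armed) {
		pending.armed = false;
		pending.cb(pending.fw, pending.context);
	}
}

/* Hypothetical device load step: rejects an empty image. */
static int load_fw_model(const struct fw_model *fw)
{
	return fw->size ? 0 : -EINVAL;
}

/* Callback: always record the outcome, then signal completion. */
static void flash_complete_model(const struct fw_model *fw, void *context)
{
	struct adapter_model *adap = context;

	assert(adap != NULL);
	adap->status = fw ? load_fw_model(fw) : -EINVAL;
	adap->done = true;
}

/* Caller: start the request, then wait for the callback to finish. */
static int set_flash_model(struct adapter_model *adap,
			   const struct fw_model *fw)
{
	int ret;

	adap->done = false;
	adap->status = 0;

	ret = request_fw_nowait_model(fw, adap, flash_complete_model);
	if (ret < 0)
		return ret;

	while (!adap->done)	/* models wait_for_completion() */
		run_pending_model();

	return adap->status;
}
```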



[RFT 3/3] cxgb4: make device firmware load use request_firmware_nowait()

2014-06-20 Thread Luis R. Rodriguez
From: "Luis R. Rodriguez" 

cxgb4 loading can take a while; this ends the crusade to
change it to be asynchronous.

Cc: Casey Leedom 
Cc: Hariprasad Shenai 
Cc: Philip Oswald 
Cc: Santosh Rastapur 
Cc: Jeffrey Cheung 
Cc: David Chang 
Signed-off-by: Luis R. Rodriguez 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h  |   6 ++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 105 ++--
 2 files changed, 67 insertions(+), 44 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 1507dc2..89296f1 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -654,6 +654,12 @@ struct adapter {
char fw_config_file[32];
struct completion config_comp;
int config_comp_status;
+
+   struct fw_info *fw_info;
+   struct completion fw_comp;
+   int fw_comp_status;
+   enum dev_state state;
+   int reset;
 };
 
 /* Defined bit width of user definable filter tuples
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 65e4124..105b83a 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -5341,6 +5341,39 @@ static struct fw_info *find_fw_info(int chip)
return NULL;
 }
 
+static void cxgb4_fw_complete(const struct firmware *fw, void *context)
+{
+   struct adapter *adap = context;
+   struct fw_hdr *card_fw;
+   const u8 *fw_data = NULL;
+   unsigned int fw_size = 0;
+
+   /* allocate memory to read the header of the firmware on the
+* card
+*/
+   card_fw = t4_alloc_mem(sizeof(*card_fw));
+
+   if (!fw) {
+   dev_err(adap->pdev_dev,
+   "unable to load firmware image %s\n",
+   adap->fw_info->fw_mod_name);
+   } else {
+   fw_data = fw->data;
+   fw_size = fw->size;
+   }
+
+   /* upgrade FW logic */
+   adap->fw_comp_status = t4_prep_fw(adap, adap->fw_info, fw_data,
+ fw_size, card_fw, adap->state,
+ &adap->reset);
+
+   /* Cleaning up */
+   if (fw != NULL)
+   release_firmware(fw);
+   t4_free_mem(card_fw);
+   complete(&adap->fw_comp);
+}
+
 /*
  * Phase 0 of initialization: contact FW, obtain config, perform basic init.
  */
@@ -5348,10 +5381,10 @@ static int adap_init0(struct adapter *adap)
 {
int ret;
u32 v, port_vec;
-   enum dev_state state;
u32 params[7], val[7];
struct fw_caps_config_cmd caps_cmd;
-   int reset = 1;
+
+   adap->reset = 1;
 
/*
 * Contact FW, advertising Master capability (and potentially forcing
@@ -5360,7 +5393,7 @@ static int adap_init0(struct adapter *adap)
 */
ret = t4_fw_hello(adap, adap->mbox, adap->fn,
  force_init ? MASTER_MUST : MASTER_MAY,
- &state);
+ &adap->state);
if (ret < 0) {
dev_err(adap->pdev_dev, "could not connect to FW, error %d\n",
ret);
@@ -5368,8 +5401,8 @@ static int adap_init0(struct adapter *adap)
}
if (ret == adap->mbox)
adap->flags |= MASTER_PF;
-   if (force_init && state == DEV_STATE_INIT)
-   state = DEV_STATE_UNINIT;
+   if (force_init && adap->state == DEV_STATE_INIT)
+   adap->state = DEV_STATE_UNINIT;
 
/*
 * If we're the Master PF Driver and the device is uninitialized,
@@ -5380,51 +5413,34 @@ static int adap_init0(struct adapter *adap)
 */
t4_get_fw_version(adap, &adap->params.fw_vers);
t4_get_tp_version(adap, &adap->params.tp_vers);
-   if ((adap->flags & MASTER_PF) && state != DEV_STATE_INIT) {
-   struct fw_info *fw_info;
-   struct fw_hdr *card_fw;
-   const struct firmware *fw;
-   const u8 *fw_data = NULL;
-   unsigned int fw_size = 0;
+   if ((adap->flags & MASTER_PF) && adap->state != DEV_STATE_INIT) {
+   init_completion(&adap->fw_comp);
+   adap->fw_comp_status = 0;
 
/* This is the firmware whose headers the driver was compiled
 * against
 */
-   fw_info = find_fw_info(CHELSIO_CHIP_VERSION(adap->params.chip));
-   if (fw_info == NULL) {
+   adap->fw_info =
+   find_fw_info(CHELSIO_CHIP_VERSION(adap->params.chip));
+   if (adap->fw_info == NULL) {
dev_err(adap->pdev_dev,
"unable to get firmware info for chip %d.\n",
CHELSIO_CHIP_VERSION(adap->params.chip));
return -EINVAL;
}
 
-   /* allocate memory to 

[RFT 2/3] cxgb4: make configuration load use request_firmware_nowait()

2014-06-20 Thread Luis R. Rodriguez
From: "Luis R. Rodriguez" 

cxgb4 loading can take a while; this is part of the crusade to
change it to be asynchronous. One more to go.

Cc: Philip Oswald 
Cc: Santosh Rastapur 
Cc: Jeffrey Cheung 
Cc: David Chang 
Cc: Casey Leedom 
Cc: Hariprasad Shenai 
Signed-off-by: Luis R. Rodriguez 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h  |   4 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 113 +++-
 2 files changed, 73 insertions(+), 44 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index bcf9acf..1507dc2 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -650,6 +650,10 @@ struct adapter {
 
struct completion flash_comp;
int flash_comp_status;
+
+   char fw_config_file[32];
+   struct completion config_comp;
+   int config_comp_status;
 };
 
 /* Defined bit width of user definable filter tuples
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 9cf6f3e..65e4124 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -4827,51 +4827,18 @@ static int adap_init0_tweaks(struct adapter *adapter)
return 0;
 }
 
-/*
- * Attempt to initialize the adapter via a Firmware Configuration File.
- */
-static int adap_init0_config(struct adapter *adapter, int reset)
+static void cxgb4_config_complete(const struct firmware *cf, void *context)
 {
-   struct fw_caps_config_cmd caps_cmd;
-   const struct firmware *cf;
+   struct adapter *adapter = context;
unsigned long mtype = 0, maddr = 0;
u32 finiver, finicsum, cfcsum;
-   int ret;
-   int config_issued = 0;
-   char *fw_config_file, fw_config_file_path[256];
char *config_name = NULL;
+   struct fw_caps_config_cmd caps_cmd;
+   int config_issued = 0;
+   int ret = 0;
+   char fw_config_file_path[256];
 
-   /*
-* Reset device if necessary.
-*/
-   if (reset) {
-   ret = t4_fw_reset(adapter, adapter->mbox,
- PIORSTMODE | PIORST);
-   if (ret < 0)
-   goto bye;
-   }
-
-   /*
-* If we have a T4 configuration file under /lib/firmware/cxgb4/,
-* then use that.  Otherwise, use the configuration file stored
-* in the adapter flash ...
-*/
-   switch (CHELSIO_CHIP_VERSION(adapter->params.chip)) {
-   case CHELSIO_T4:
-   fw_config_file = FW4_CFNAME;
-   break;
-   case CHELSIO_T5:
-   fw_config_file = FW5_CFNAME;
-   break;
-   default:
-   dev_err(adapter->pdev_dev, "Device %d is not supported\n",
-  adapter->pdev->device);
-   ret = -EINVAL;
-   goto bye;
-   }
-
-   ret = request_firmware(&cf, fw_config_file, adapter->pdev_dev);
-   if (ret < 0) {
+   if (!cf) {
config_name = "On FLASH";
mtype = FW_MEMTYPE_CF_FLASH;
maddr = t4_flash_cfg_addr(adapter);
@@ -4879,7 +4846,7 @@ static int adap_init0_config(struct adapter *adapter, int reset)
u32 params[7], val[7];
 
sprintf(fw_config_file_path,
-   "/lib/firmware/%s", fw_config_file);
+   "/lib/firmware/%s", adapter->fw_config_file);
config_name = fw_config_file_path;
 
if (cf->size >= FLASH_CFG_MAX_SIZE)
@@ -4898,7 +4865,7 @@ static int adap_init0_config(struct adapter *adapter, int reset)
 * to write that out separately since we can't
 * guarantee that the bytes following the
 * residual byte in the buffer returned by
-* request_firmware() are zeroed out ...
+* request_firmware_nowait() are zeroed out ...
 */
size_t resid = cf->size & 0x3;
size_t size = cf->size & ~0x3;
@@ -5018,7 +4985,8 @@ static int adap_init0_config(struct adapter *adapter, int reset)
dev_info(adapter->pdev_dev, "Successfully configured using Firmware "\
 "Configuration File \"%s\", version %#x, computed checksum %#x\n",
 config_name, finiver, cfcsum);
-   return 0;
+   complete(&adapter->config_comp);
+   return;
 
/*
 * Something bad happened.  Return the error ...  (If the "error"
@@ -5026,10 +4994,67 @@ static int adap_init0_config(struct adapter *adapter, int reset)
 * want to issue a warning since this is fairly common.)
 */
 bye:
+   adapter->flash_comp_status = ret;
if (config_issued && ret != -ENOENT)
  

Re: [PATCH 1/4] cfq: Increase default value of target_latency

2014-06-20 Thread Dave Chinner
On Fri, Jun 20, 2014 at 12:30:25PM +0100, Mel Gorman wrote:
> On Fri, Jun 20, 2014 at 07:42:14AM +1000, Dave Chinner wrote:
> > On Thu, Jun 19, 2014 at 02:38:44PM -0400, Jeff Moyer wrote:
> > > Mel Gorman  writes:
> > > 
> > > > The existing CFQ default target_latency results in very poor performance
> > > > for larger numbers of threads doing sequential reads.  While this can be
> > > > easily described as a tuning problem for users, it is one that is tricky
> > > > > to detect. This patch increases the default on the assumption that
> > > > > people with access to expensive fast storage also know how to tune
> > > > > their IO scheduler.
> > > >
> > > > The following is from tiobench run on a mid-range desktop with a single
> > > > spinning disk.
> > > >
> > > > >                                 3.16.0-rc1          3.16.0-rc1               3.0.0
> > > > >                                    vanilla              cfq600             vanilla
> > > > > Mean   SeqRead-MB/sec-1   121.88 (  0.00%)    121.60 ( -0.23%)    134.59 ( 10.42%)
> > > > > Mean   SeqRead-MB/sec-2   101.99 (  0.00%)    102.35 (  0.36%)    122.59 ( 20.20%)
> > > > > Mean   SeqRead-MB/sec-4    97.42 (  0.00%)     99.71 (  2.35%)    114.78 ( 17.82%)
> > > > > Mean   SeqRead-MB/sec-8    83.39 (  0.00%)     90.39 (  8.39%)    100.14 ( 20.09%)
> > > > > Mean   SeqRead-MB/sec-16   68.90 (  0.00%)     77.29 ( 12.18%)     81.64 ( 18.50%)
> > > 
> > > Did you test any workloads other than this?  Also, what normal workload
> > > has 8 or more threads doing sequential reads?  (That's an honest
> > > question.)
> > 
> > I'd also suggest that making changes based on the assumption that
> > people affected by the change know how to tune CFQ is a bad idea.
> > When CFQ misbehaves, most people just switch to deadline or no-op
> > because they don't understand how CFQ works, nor what all the
> > knobs do or which ones to tweak to solve their problem.
> 
> Ok, that's fair enough. Tuning CFQ is tricky but as it is, the default
> performance is not great in comparison to older kernels and it's something
> that has varied considerably over time. I'm surprised there have not been
> more complaints but maybe I just missed them on the lists.

That's because there are widespread recommendations not to use CFQ
if you have any sort of significant storage or IO workload. We
specifically recommend that you don't use CFQ with XFS
because it does not play nicely with correlated multi-process
IO. This is something that happens a lot, even with single threaded
workloads.

e.g. a single fsync can issue dependent IOs from multiple
process contexts - the syscall process for data IO, the allocation
workqueue kworker for btree blocks, the xfsaild to push metadata to
disk to make space available for the allocation transaction, and
then the journal IO from the xfs log workqueue kworker.

There are 4 IOs, all from different process contexts, all of which
need to be dispatched and completed with the minimum of latency.
With CFQ adding scheduling and idling delays in the middle of this,
it tends to leave disks idle when they really should be doing work.

We also don't recommend using CFQ when you have hardware raid with
caches, because the HW RAID does a much, much better job of
optimising and prioritising IO through its cache. Idling is
wrong if the cache has hardware readahead, because most subsequent
read IOs will hit the hardware cache. Hence you could be dispatching
other IO instead of idling, yet still get minimal IO latency across
multiple streams of different read workloads.

Hence people search on CFQ problems, see the "use deadline"
recommendations, change to deadline and see their IO workload going
faster. So they shrug their shoulders, set deadline as the
default, and move on to the next problem...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH tip/core/rcu 0/5] Fix for cond_resched performance regression

2014-06-20 Thread Paul E. McKenney
On Fri, Jun 20, 2014 at 05:14:18PM -0700, Paul E. McKenney wrote:
> On Fri, Jun 20, 2014 at 04:52:15PM -0700, j...@joshtriplett.org wrote:
> > On Fri, Jun 20, 2014 at 04:30:33PM -0700, Paul E. McKenney wrote:
> > > On Fri, Jun 20, 2014 at 03:39:51PM -0700, j...@joshtriplett.org wrote:
> > > > On Fri, Jun 20, 2014 at 03:11:20PM -0700, Paul E. McKenney wrote:
> > > > > On Fri, Jun 20, 2014 at 02:24:23PM -0700, j...@joshtriplett.org wrote:
> > > > > > On Fri, Jun 20, 2014 at 12:12:36PM -0700, Paul E. McKenney wrote:
> > > > > > > o Make cond_resched() a no-op for PREEMPT=y.  This might well turn
> > > > > > >   out to be a good thing, but it doesn't help give RCU the
> > > > > > >   quiescent states that it needs.
> > > > > > 
> > > > > > What about doing this, together with letting the fqs logic poke
> > > > > > un-quiesced kernel code as needed?  That way, rather than having
> > > > > > cond_resched do any work, you have the fqs logic recognize that a
> > > > > > particular CPU has gone too long without quiescing, without
> > > > > > disturbing that CPU at all if it hasn't gone too long.
> > > > > 
> > > > > My next stop is to post the previous series, but with a couple of
> > > > > exports and one bug fix uncovered by testing thus far, but after
> > > > > another round of testing.  Then I am going to take a close look at
> > > > > this one:
> > > > > 
> > > > > o Push the checks further into cond_resched(), so that the
> > > > >   fastpath does the same sequence of instructions that the
> > > > >   original did.  This might work well, but requires IPIs, which
> > > > >   are not so good for latencies on the remote CPU.  It
> > > > >   nevertheless might be a decent long-term solution given that
> > > > >   if your CPU is spending many jiffies looping in the kernel,
> > > > >   you aren't getting good latencies anyway.  It also has the
> > > > >   benefit of allowing RCU to take advantage of the implicit
> > > > >   quiescent states of all cond_resched() calls, and of
> > > > >   eliminating the need for a separate cond_resched_rcu_qs()
> > > > >   and for RCU_COND_RESCHED_QS.
> > > > > 
> > > > > The one you call out is of course interesting as well.  But there are
> > > > > a couple of questions:
> > > > > 
> > > > > 1.  Why wasn't cond_resched() a no-op in CONFIG_PREEMPT to start
> > > > >   with?  It just seems too obvious a thing to do for it to possibly
> > > > >   be an oversight.  (What, me paranoid?)
> > > > > 
> > > > > 2.  When RCU recognizes that a particular CPU has gone too long,
> > > > >   exactly what are you suggesting that RCU do about it?  When
> > > > >   formulating your answer, please give due consideration to the
> > > > >   implications of that CPU being a NO_HZ_FULL CPU.  ;-)
> > > > 
> > > > Send it an IPI that either causes it to flag a quiescent state
> > > > immediately if currently quiesced or causes it to quiesce at the next
> > > > opportunity if not.
> > > 
> > > OK.  But if we are in a !PREEMPT kernel,
> > 
> > That's not the case I was suggesting.
> 
> Fair enough, but we still need to support !PREEMPT kernels.
> 
> > *If* the kernel is fully
> > preemptible, then it makes little sense to put any code in cond_resched,
> > when instead another thread can simply cause a preemption if it needs a
> > quiescent state.  That has the advantage of not imposing any unnecessary
> > polling on code running in the kernel.
> 
> OK.  Exactly which thread are you suggesting should cause the preemption?
> 
> > In a !PREEMPT kernel, it makes a bit more sense to have cond_resched as
> > a voluntary preemption point.  But voluntary preemption points don't
> > make as much sense in a kernel prepared to preempt a thread anywhere.
> 
> That does sound intuitive, but I am not yet prepared to believe that
> the scheduler guys missed this trick.  There might well be some good
> reason for cond_resched() doing something, though I cannot think what it
> might be (something to do with preempt_enable_no_resched(), perhaps?).
> We should at least ask them, although if you want to do some testing
> before asking them, I of course have no objection to your doing so.

Oh, and it turns out to be possible to drive RCU's need-a-qs check much
farther down the cond_resched() rabbit hole than I expected.  Looks like
it can be driven all the way down to rcu_note_context_switch().

Thanx, Paul



Re: [patch 13/13] mm: memcontrol: rewrite uncharge API

2014-06-20 Thread Sasha Levin
On 06/18/2014 04:40 PM, Johannes Weiner wrote:
> The memcg uncharging code that is involved towards the end of a page's
> lifetime - truncation, reclaim, swapout, migration - is impressively
> complicated and fragile.
> 
> Because anonymous and file pages were always charged before they had
> their page->mapping established, uncharges had to happen when the page
> type could still be known from the context; as in unmap for anonymous,
> page cache removal for file and shmem pages, and swap cache truncation
> for swap pages.  However, these operations happen well before the page
> is actually freed, and so a lot of synchronization is necessary:
> 
> - Charging, uncharging, page migration, and charge migration all need
>   to take a per-page bit spinlock as they could race with uncharging.
> 
> - Swap cache truncation happens during both swap-in and swap-out, and
>   possibly repeatedly before the page is actually freed.  This means
>   that the memcg swapout code is called from many contexts that make
>   no sense and it has to figure out the direction from page state to
>   make sure memory and memory+swap are always correctly charged.
> 
> - On page migration, the old page might be unmapped but then reused,
>   so memcg code has to prevent untimely uncharging in that case.
>   Because this code - which should be a simple charge transfer - is so
>   special-cased, it is not reusable for replace_page_cache().
> 
> But now that charged pages always have a page->mapping, introduce
> mem_cgroup_uncharge(), which is called after the final put_page(),
> when we know for sure that nobody is looking at the page anymore.
> 
> For page migration, introduce mem_cgroup_migrate(), which is called
> after the migration is successful and the new page is fully rmapped.
> Because the old page is no longer uncharged after migration, prevent
> double charges by decoupling the page's memcg association (PCG_USED
> and pc->mem_cgroup) from the page holding an actual charge.  The new
> bits PCG_MEM and PCG_MEMSW represent the respective charges and are
> transferred to the new page during migration.
> 
> mem_cgroup_migrate() is suitable for replace_page_cache() as well,
> which gets rid of mem_cgroup_replace_page_cache().
> 
> Swap accounting is massively simplified: because the page is no longer
> uncharged as early as swap cache deletion, a new mem_cgroup_swapout()
> can transfer the page's memory+swap charge (PCG_MEMSW) to the swap
> entry before the final put_page() in page reclaim.
> 
> Finally, page_cgroup changes are now protected by whatever protection
> the page itself offers: anonymous pages are charged under the page
> table lock, whereas page cache insertions, swapin, and migration hold
> the page lock.  Uncharging happens under full exclusion with no
> outstanding references.  Charging and uncharging also ensure that the
> page is off-LRU, which serializes against charge migration.  Remove
> the very costly page_cgroup lock and set pc->flags non-atomically.
> 
> Signed-off-by: Johannes Weiner 

Hi Johannes,

I'm seeing the following when booting a VM, bisection pointed me to this
patch.

[   32.830823] BUG: using __this_cpu_add() in preemptible [] code: mkdir/8677
[   32.831522] caller is __this_cpu_preempt_check+0x13/0x20
[   32.832079] CPU: 35 PID: 8677 Comm: mkdir Not tainted 3.16.0-rc1-next-20140620-sasha-00023-g8fc12ed #700
[   32.832898]  b27ea69d 8800cb91b618 b151820b 
0002
[   32.833607]  0023 8800cb91b648 aeb4c799 
88006efa5b60
[   32.834318]  ea0007cff9c0 0001 0001 
8800cb91b658
[   32.835030] Call Trace:
[   32.835257] dump_stack (lib/dump_stack.c:52)
[   32.835755] check_preemption_disabled (./arch/x86/include/asm/preempt.h:80 lib/smp_processor_id.c:49)
[   32.836336] __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[   32.836991] mem_cgroup_charge_statistics.isra.23 (mm/memcontrol.c:930)
[   32.837682] commit_charge (mm/memcontrol.c:2761)
[   32.838187] ? _raw_spin_unlock_irq (./arch/x86/include/asm/paravirt.h:819 include/linux/spinlock_api_smp.h:168 kernel/locking/spinlock.c:199)
[   32.838735] ? get_parent_ip (kernel/sched/core.c:2546)
[   32.839230] mem_cgroup_commit_charge (mm/memcontrol.c:6519)
[   32.839807] __add_to_page_cache_locked (mm/filemap.c:588 include/linux/jump_label.h:115 include/trace/events/filemap.h:50 mm/filemap.c:589)
[   32.840479] add_to_page_cache_lru (mm/filemap.c:627)
[   32.841048] read_cache_pages (mm/readahead.c:92)
[   32.841560] ? v9fs_cache_session_get_key (fs/9p/cache.c:306)
[   32.842145] ? v9fs_write_begin (fs/9p/vfs_addr.c:99)
[   32.842694] v9fs_vfs_readpages (fs/9p/vfs_addr.c:127)
[   32.843251] __do_page_cache_readahead (mm/readahead.c:123 mm/readahead.c:200)
[   32.843848] ? __do_

[PATCH v2] selinux: no recursive read_lock of policy_rwlock in security_genfs_sid()

2014-06-20 Thread Waiman Long
v1->v2:
 - Add an internal helper to switch on/off lock acquisition instead
   of modifying the external API.

With the introduction of the fair queued rwlock, a recursive read_lock()
may hang the offending process if there is a write_lock() somewhere in
between.

With recursive read_lock checking enabled, the following error was
reported:

=
[ INFO: possible recursive locking detected ]
3.16.0-rc1 #2 Tainted: GE
-
load_policy/708 is trying to acquire lock:
 (policy_rwlock){.+.+..}, at: []
security_genfs_sid+0x3a/0x170

but task is already holding lock:
 (policy_rwlock){.+.+..}, at: []
security_fs_use+0x2c/0x110

other info that might help us debug this:
 Possible unsafe locking scenario:

   CPU0
   
  lock(policy_rwlock);
  lock(policy_rwlock);

This patch fixes the occurrence of recursive read_lock() of
policy_rwlock in security_genfs_sid() by adding a helper function
which has a 5th argument to indicate if the rwlock has been taken.

Signed-off-by: Waiman Long 
---
 security/selinux/ss/services.c |   36 
 1 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index 4bca494..5f4c1f3 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -2277,20 +2277,22 @@ out:
 }
 
 /**
- * security_genfs_sid - Obtain a SID for a file in a filesystem
+ * __security_genfs_sid - Helper to obtain a SID for a file in a filesystem
  * @fstype: filesystem type
  * @path: path from root of mount
  * @sclass: file security class
  * @sid: SID for path
+ * @locked: true if policy_rwlock taken
  *
  * Obtain a SID to use for a file in a filesystem that
  * cannot support xattr or use a fixed labeling behavior like
  * transition SIDs or task SIDs.
  */
-int security_genfs_sid(const char *fstype,
-  char *path,
-  u16 orig_sclass,
-  u32 *sid)
+static inline int __security_genfs_sid(const char *fstype,
+  char *path,
+  u16 orig_sclass,
+  u32 *sid,
+  int locked)
 {
int len;
u16 sclass;
@@ -2301,7 +2303,8 @@ int security_genfs_sid(const char *fstype,
while (path[0] == '/' && path[1] == '/')
path++;
 
-   read_lock(&policy_rwlock);
+   if (!locked)
+   read_lock(&policy_rwlock);
 
sclass = unmap_class(orig_sclass);
*sid = SECINITSID_UNLABELED;
@@ -2336,11 +2339,27 @@ int security_genfs_sid(const char *fstype,
*sid = c->sid[0];
rc = 0;
 out:
-   read_unlock(&policy_rwlock);
+   if (!locked)
+   read_unlock(&policy_rwlock);
return rc;
 }
 
 /**
+ * security_genfs_sid - Obtain a SID for a file in a filesystem
+ * @fstype: filesystem type
+ * @path: path from root of mount
+ * @sclass: file security class
+ * @sid: SID for path
+ */
+int security_genfs_sid(const char *fstype,
+  char *path,
+  u16 orig_sclass,
+  u32 *sid)
+{
+   return __security_genfs_sid(fstype, path, orig_sclass, sid, false);
+}
+
+/**
  * security_fs_use - Determine how to handle labeling for a filesystem.
  * @sb: superblock in question
  */
@@ -2370,7 +2389,8 @@ int security_fs_use(struct super_block *sb)
}
sbsec->sid = c->sid[0];
} else {
-   rc = security_genfs_sid(fstype, "/", SECCLASS_DIR, &sbsec->sid);
+   rc = __security_genfs_sid(fstype, "/", SECCLASS_DIR,
+ &sbsec->sid, true);
if (rc) {
sbsec->behavior = SECURITY_FS_USE_NONE;
rc = 0;
-- 
1.7.1
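
The shape of the fix (an internal helper with a "locked" flag, plus
an unchanged external wrapper) can be sketched in userspace C.  This
is a minimal model under stated assumptions: struct rwlock_model only
counts readers so that nesting can be observed, the lookup result is
a placeholder, and all names ending in _model are hypothetical rather
than SELinux code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock model that records how deeply it is held. */
struct rwlock_model {
	int readers;	/* current read-side nesting depth */
	int max_depth;	/* deepest nesting ever observed */
};

static struct rwlock_model policy_lock_model;

static void read_lock_model(struct rwlock_model *l)
{
	l->readers++;
	if (l->readers > l->max_depth)
		l->max_depth = l->readers;
}

static void read_unlock_model(struct rwlock_model *l)
{
	assert(l->readers > 0);
	l->readers--;
}

/* Internal helper: takes the lock only if the caller does not hold it. */
static int genfs_sid_helper_model(unsigned int *sid, bool locked)
{
	if (!locked)
		read_lock_model(&policy_lock_model);

	*sid = 42;	/* placeholder for the real lookup result */

	if (!locked)
		read_unlock_model(&policy_lock_model);
	return 0;
}

/* External entry point: signature unchanged, always takes the lock. */
static int genfs_sid_model(unsigned int *sid)
{
	return genfs_sid_helper_model(sid, false);
}

/* A caller that already holds the lock uses the helper directly. */
static int fs_use_model(unsigned int *sid)
{
	int rc;

	read_lock_model(&policy_lock_model);
	rc = genfs_sid_helper_model(sid, true);
	read_unlock_model(&policy_lock_model);
	return rc;
}
```

With this arrangement the read side is never nested in the model,
which is the property the patch is after.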



Re: [ 059/143] sysctl net: Keep tcp_syn_retries inside the boundary

2014-06-20 Thread Eric W. Biederman
Willy Tarreau  writes:

> Hi Eric,
>
> On Fri, Jun 20, 2014 at 03:16:07PM -0700, Eric W. Biederman wrote:
>> Willy Tarreau  writes:
>> 
>> > Hi Luis,
>> >
>> > On Thu, Jun 12, 2014 at 01:55:53PM +0100, Luis Henriques wrote:
>> >> I was finally able to spend some more time with this and tried (a
>> >> modified) Tyler's patch on top of 2.6.32.62, and it seems to work.
>> >> Although I haven't done any extended testing, I don't see the two
>> >> stack traces and the /proc/sys/net/ipv4/ directory seems to be
>> >> correctly populated.
>> >> 
>> >> I'm attaching the patch I've used, based on Tyler's.
>> >
>> > Would any of you or Tyler please kindly pass me a signed-off-by with
>> > a commit message ? That would be great. Alternately I'd do it myself
>> > and mention you authored them.
>> 
>> If my memory serves it is possible in 2.6.32 to set 
>> .ctl_name = CTL_UNNEEDED
>> 
>> and not need to implement a .strategy routine at all.
>
> Ah that's quite interesting, thanks for the tip!
>
>> Given the fact that most people got the strategy routines
>> slightly wrong and that sys_sysctl is effectively unused,
>> a strategy where you don't implement code that no-one
>> will use in a backport would be preferable.
>
> OK.
>
>> Since you have mentioned this has come up a couple of times, if nothing
>> else this will be something to think about for next time.
>
> I'm keeping your e-mail where I manage patches, hoping to recognize
> this case next time.
>
>> I am puzzled why .ctl_name was populated in a backport at all.
>
> Oh it's simply because I didn't know it did not have to be there,
> and among the few reviewers, I guess that it's not common to know
> what version uses what semantics.

I guess what I meant is that the field .ctl_name does not even exist
anymore for the same reasons .strategy does not exist anymore.  So I
was just surprised that someone picked a randomish number and stuck
it in there.

If anyone actually were to use those randomish numbers in the binary
sys_sysctl call their applications would break when they eventually
moved to a more recent kernel.

Which is one of the motivations it was decided there would be no more
binary sysctls allocated around the 2.6.32 timeframe.

> Thank you for the explanation, it's really helpful. We're not used
> to backporting sysctl changes but here I got caught a few times and have
> found some sysctl.conf with bogus values in the field a few times, so it
> was really important to backport this one.

Sysctls do have their uses, and at least 2.6.32 has runtime sysctl checks
to keep the insanity to a dull roar.

Eric


Re: [PATCH tip/core/rcu 0/5] Fix for cond_resched performance regression

2014-06-20 Thread Paul E. McKenney
On Fri, Jun 20, 2014 at 04:52:15PM -0700, j...@joshtriplett.org wrote:
> On Fri, Jun 20, 2014 at 04:30:33PM -0700, Paul E. McKenney wrote:
> > On Fri, Jun 20, 2014 at 03:39:51PM -0700, j...@joshtriplett.org wrote:
> > > On Fri, Jun 20, 2014 at 03:11:20PM -0700, Paul E. McKenney wrote:
> > > > On Fri, Jun 20, 2014 at 02:24:23PM -0700, j...@joshtriplett.org wrote:
> > > > > On Fri, Jun 20, 2014 at 12:12:36PM -0700, Paul E. McKenney wrote:
> > > > > > o   Make cond_resched() a no-op for PREEMPT=y.  This might well turn
> > > > > >     out to be a good thing, but it doesn't help give RCU the
> > > > > >     quiescent states that it needs.
> > > > > 
> > > > > What about doing this, together with letting the fqs logic poke
> > > > > un-quiesced kernel code as needed?  That way, rather than having
> > > > > cond_resched do any work, you have the fqs logic recognize that a
> > > > > particular CPU has gone too long without quiescing, without disturbing
> > > > > that CPU at all if it hasn't gone too long.
> > > > 
> > > > My next stop is to post the previous series, but with a couple of
> > > > exports and one bug fix uncovered by testing thus far, but after
> > > > another round of testing.  Then I am going to take a close look at
> > > > this one:
> > > > 
> > > > o   Push the checks further into cond_resched(), so that the
> > > >     fastpath does the same sequence of instructions that the original
> > > >     did.  This might work well, but requires IPIs, which are not so
> > > >     good for latencies on the remote CPU.  It nevertheless might be a
> > > >     decent long-term solution given that if your CPU is spending many
> > > >     jiffies looping in the kernel, you aren't getting good latencies
> > > >     anyway.  It also has the benefit of allowing RCU to take advantage
> > > >     of the implicit quiescent states of all cond_resched() calls,
> > > >     and of eliminating the need for a separate cond_resched_rcu_qs()
> > > >     and for RCU_COND_RESCHED_QS.
> > > > 
> > > > The one you call out is of course interesting as well.  But there are
> > > > a couple of questions:
> > > > 
> > > > 1.  Why wasn't cond_resched() a no-op in CONFIG_PREEMPT to start
> > > >     with?  It just seems too obvious a thing to do for it to possibly
> > > >     be an oversight.  (What, me paranoid?)
> > > > 
> > > > 2.  When RCU recognizes that a particular CPU has gone too long,
> > > >     exactly what are you suggesting that RCU do about it?  When
> > > >     formulating your answer, please give due consideration to the
> > > >     implications of that CPU being a NO_HZ_FULL CPU.  ;-)
> > > 
> > > Send it an IPI that either causes it to flag a quiescent state
> > > immediately if currently quiesced or causes it to quiesce at the next
> > > opportunity if not.
> > 
> > OK.  But if we are in a !PREEMPT kernel,
> 
> That's not the case I was suggesting.

Fair enough, but we still need to support !PREEMPT kernels.

> *If* the kernel is fully
> preemptible, then it makes little sense to put any code in cond_resched,
> when instead another thread can simply cause a preemption if it needs a
> quiescent state.  That has the advantage of not imposing any unnecessary
> polling on code running in the kernel.

OK.  Exactly which thread are you suggesting should cause the preemption?

> In a !PREEMPT kernel, it makes a bit more sense to have cond_resched as
> a voluntary preemption point.  But voluntary preemption points don't
> make as much sense in a kernel prepared to preempt a thread anywhere.

That does sound intuitive, but I am not yet prepared to believe that
the scheduler guys missed this trick.  There might well be some good
reason for cond_resched() doing something, though I cannot think what it
might be (something to do with preempt_enable_no_resched(), perhaps?).
We should at least ask them, although if you want to do some testing
before asking them, I of course have no objection to your doing so.

Thanx, Paul
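
For readers following the thread, the idea under discussion (keep the
cond_resched() fastpath cheap, and only ask a CPU for a quiescent state
once it has gone too long without one) can be modeled in user space.
What follows is a minimal sketch under stated assumptions, not the
kernel's implementation: the names cpu_model, request_qs() and
model_cond_resched() are hypothetical, and an atomic flag stands in for
the IPI mentioned above.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical per-CPU state; not the kernel's data structures. */
struct cpu_model {
	atomic_bool qs_requested;  /* set remotely when the CPU has run too long */
	unsigned long qs_reported; /* quiescent states this CPU has reported */
};

static struct cpu_model cpus[MODEL_NR_CPUS];

/* Remote side: stands in for the IPI sent by the force-quiescent-state logic. */
static void request_qs(int cpu)
{
	atomic_store(&cpus[cpu].qs_requested, true);
}

/*
 * Local side: the fastpath is a single flag test.  Only when a request is
 * pending does the CPU do extra work and report a quiescent state.
 * Returns true if a quiescent state was reported.
 */
static bool model_cond_resched(int cpu)
{
	if (!atomic_load_explicit(&cpus[cpu].qs_requested, memory_order_relaxed))
		return false;
	atomic_store(&cpus[cpu].qs_requested, false);
	cpus[cpu].qs_reported++;
	return true;
}
```

In this model a CPU that is never flagged pays only for the flag load,
which is the property the thread is trying to preserve; whether a real
IPI is acceptable on a NO_HZ_FULL CPU is the open question raised above
and is not addressed by the sketch.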



[PATCH] arm64,ia64,ppc,s390,sh,tile,um,x86,mm: Remove default gate area

2014-06-20 Thread Andy Lutomirski
The core mm code will provide a default gate area based on
FIXADDR_USER_START and FIXADDR_USER_END if
!defined(__HAVE_ARCH_GATE_AREA) && defined(AT_SYSINFO_EHDR).

This default is only useful for ia64.  arm64, ppc, s390, sh, tile,
64-bit UML, and x86_32 have their own code just to disable it.  arm,
32-bit UML, and x86_64 have gate areas, but they have their own
implementations.

This gets rid of the default and moves the code into ia64.

This should save some code on architectures without a gate area: it's
now possible to inline the gate_area functions in the default case.

Signed-off-by: Andy Lutomirski 
---
 arch/arm64/include/asm/page.h      |  3 ---
 arch/arm64/kernel/vdso.c           | 19 ---
 arch/ia64/include/asm/page.h       |  2 ++
 arch/ia64/mm/init.c                | 26 ++
 arch/powerpc/include/asm/page.h    |  3 ---
 arch/powerpc/kernel/vdso.c         | 16 
 arch/s390/include/asm/page.h       |  2 --
 arch/s390/kernel/vdso.c            | 15 ---
 arch/sh/include/asm/page.h         |  5 -
 arch/sh/kernel/vsyscall/vsyscall.c | 15 ---
 arch/tile/include/asm/page.h       |  6 --
 arch/tile/kernel/vdso.c            | 15 ---
 arch/um/include/asm/page.h         |  5 +
 arch/x86/include/asm/page.h        |  1 -
 arch/x86/include/asm/page_64.h     |  2 ++
 arch/x86/um/asm/elf.h              |  1 -
 arch/x86/um/mem_64.c               | 15 ---
 arch/x86/vdso/vdso32-setup.c       | 19 +--
 include/linux/mm.h                 | 17 -
 mm/memory.c                        | 38 --
 mm/nommu.c                         |  5 -
 21 files changed, 48 insertions(+), 182 deletions(-)

diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 46bf666..992710f 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -28,9 +28,6 @@
 #define PAGE_SIZE  (_AC(1,UL) << PAGE_SHIFT)
 #define PAGE_MASK  (~(PAGE_SIZE-1))
 
-/* We do define AT_SYSINFO_EHDR but don't use the gate mechanism */
-#define __HAVE_ARCH_GATE_AREA  1
-
 #ifndef __ASSEMBLY__
 
 #ifdef CONFIG_ARM64_64K_PAGES
diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index 50384fe..f630626 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -187,25 +187,6 @@ const char *arch_vma_name(struct vm_area_struct *vma)
 }
 
 /*
- * We define AT_SYSINFO_EHDR, so we need these function stubs to keep
- * Linux happy.
- */
-int in_gate_area_no_mm(unsigned long addr)
-{
-   return 0;
-}
-
-int in_gate_area(struct mm_struct *mm, unsigned long addr)
-{
-   return 0;
-}
-
-struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
-{
-   return NULL;
-}
-
-/*
  * Update the vDSO data page to keep in sync with kernel timekeeping.
  */
 void update_vsyscall(struct timekeeper *tk)
diff --git a/arch/ia64/include/asm/page.h b/arch/ia64/include/asm/page.h
index f1e1b2e..1f1bf14 100644
--- a/arch/ia64/include/asm/page.h
+++ b/arch/ia64/include/asm/page.h
@@ -231,4 +231,6 @@ get_order (unsigned long size)
 #define PERCPU_ADDR(-PERCPU_PAGE_SIZE)
 #define LOAD_OFFSET(KERNEL_START - KERNEL_TR_PAGE_SIZE)
 
+#define __HAVE_ARCH_GATE_AREA  1
+
 #endif /* _ASM_IA64_PAGE_H */
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 25c3502..35efaa3 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -278,6 +278,32 @@ setup_gate (void)
ia64_patch_gate();
 }
 
+static struct vm_area_struct gate_vma;
+
+static int __init gate_vma_init(void)
+{
+   gate_vma.vm_mm = NULL;
+   gate_vma.vm_start = FIXADDR_USER_START;
+   gate_vma.vm_end = FIXADDR_USER_END;
+   gate_vma.vm_flags = VM_READ | VM_MAYREAD | VM_EXEC | VM_MAYEXEC;
+   gate_vma.vm_page_prot = __P101;
+
+   return 0;
+}
+__initcall(gate_vma_init);
+
+struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
+{
+   return &gate_vma;
+}
+
+int in_gate_area_no_mm(unsigned long addr)
+{
+   if ((addr >= FIXADDR_USER_START) && (addr < FIXADDR_USER_END))
+   return 1;
+   return 0;
+}
+
 void ia64_mmu_init(void *my_cpu_data)
 {
unsigned long pta, impl_va_bits;
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 32e4e21..26fe1ae 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -48,9 +48,6 @@ extern unsigned int HPAGE_SHIFT;
 #define HUGE_MAX_HSTATE(MMU_PAGE_COUNT-1)
 #endif
 
-/* We do define AT_SYSINFO_EHDR but don't use the gate mechanism */
-#define __HAVE_ARCH_GATE_AREA  1
-
 /*
  * Subtle: (1 << PAGE_SHIFT) is an int, not an unsigned long. So if we
  * assign PAGE_MASK to a larger type it gets extended the way we want
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index ce74c33..f174351 100644
--- a/arch/powerpc/kernel/vdso.c
+++ 
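
The effect described in the commit message (architectures without a
gate area get trivially inlinable stubs, while an architecture that has
one supplies its own range check) can be illustrated outside the kernel.
This is a rough sketch only: MODEL_HAVE_ARCH_GATE_AREA, the MODEL_GATE_*
addresses and the model_* function names are hypothetical stand-ins, not
the definitions used by the patch.

```c
#include <stdbool.h>

/* Hypothetical fixed range; stands in for FIXADDR_USER_START/END. */
#define MODEL_GATE_START 0xffffe000UL
#define MODEL_GATE_END   0xfffff000UL

/* Half-open range check: start is inside the range, end is not. */
static inline bool model_addr_in_range(unsigned long addr,
				       unsigned long start, unsigned long end)
{
	return addr >= start && addr < end;
}

#ifdef MODEL_HAVE_ARCH_GATE_AREA
/* An architecture with a gate area supplies a real check. */
static inline bool model_in_gate_area_no_mm(unsigned long addr)
{
	return model_addr_in_range(addr, MODEL_GATE_START, MODEL_GATE_END);
}
#else
/* Default case: no gate area, so the result is a constant the compiler can fold. */
static inline bool model_in_gate_area_no_mm(unsigned long addr)
{
	(void)addr;
	return false;
}
#endif
```

With the macro left undefined, every call to model_in_gate_area_no_mm()
collapses to a constant false, which is the code-size saving the commit
message refers to.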

Re: [PATCH 2/2] drivers/net/usb/asix_devices.c: inline ax88772_unbind

2014-06-20 Thread Sergei Shtylyov

Hello.

On 06/21/2014 12:40 AM, Fabian Frederick wrote:


> inline this one line function used in driver_info structure
>
> Cc: "David S. Miller"
> Cc: Emil Goode
> Cc: linux-...@vger.kernel.org
> Signed-off-by: Fabian Frederick
> ---
>   drivers/net/usb/asix_devices.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c
> index 8a7582b..a41926a 100644
> --- a/drivers/net/usb/asix_devices.c
> +++ b/drivers/net/usb/asix_devices.c
> @@ -497,7 +497,7 @@ static int ax88772_bind(struct usbnet *dev, struct usb_interface *intf)
>         return 0;
>   }
>
> -static void ax88772_unbind(struct usbnet *dev, struct usb_interface *intf)
> +static inline void ax88772_unbind(struct usbnet *dev, struct usb_interface *intf)
>   {
>         kfree(dev->driver_priv);
>   }

gcc is perfectly capable of figuring that out. No need to use *inline*
outside the *.h files.


WBR, Sergei



Re: [PATCH] i2c: exynos5: Properly use the "noirq" variants of suspend/resume

2014-06-20 Thread Tomasz Figa
On 21.06.2014 01:53, Doug Anderson wrote:
> Kevin,
> 
> On Fri, Jun 20, 2014 at 4:13 PM, Kevin Hilman  wrote:
>> Doug Anderson  writes:
>>
>>> Kevin,
>>>
>>> On Fri, Jun 20, 2014 at 2:48 PM, Kevin Hilman  wrote:
>>>> Hi Doug,
>>>>
>>>> Doug Anderson  writes:
>>>>
>>>>> On Thu, Jun 19, 2014 at 11:43 AM, Kevin Hilman  wrote:
>>>>>> Doug Anderson  writes:
>>>>>>
>>>>>>> The original code for the exynos i2c controller registered for the
>>>>>>> "noirq" variants.  However during review feedback it was moved to
>>>>>>> SIMPLE_DEV_PM_OPS without anyone noticing that it meant we were no
>>>>>>> longer actually "noirq" (despite functions named
>>>>>>> exynos5_i2c_suspend_noirq and exynos5_i2c_resume_noirq).
>>>>>>>
>>>>>>> i2c controllers that might have wakeup sources on them seem to need to
>>>>>>> resume at noirq time so that the individual drivers can actually read
>>>>>>> the i2c bus to handle their wakeup.
>>>>>>
>>>>>> I suspect usage of the noirq variants pre-dates the existence of the
>>>>>> late/early callbacks in the PM core, but based on the description above,
>>>>>> I suspect what you actually want is the late/early callbacks.
>>>>>
>>>>> I think it actually really needs noirq.  ;)
>>>>
>>>> Yes, it appears it does.   Objection withdrawn.
>>>>
>>>> I just wanted to be sure because since the introduction of late/early,
>>>> the need for noirq should be pretty rare, but there certainly are needs.
>>>>
>>>>
>>>> In this case though, the need for it has more to do with the
>>>> lack of a way for us to describe non parent-child device dependencies
>>>> than whether or not IRQs are enabled or not.
>>>>
>>>
>>> Actually, I'm not sure that's true, but I'll talk through it and you
>>> can point to where I'm wrong (I often am!)
>>>
>>> If you're a wakeup device then you need to be ready to handle
>>> interrupts as soon as the "noirq" phase of resume is done, right?
>>
>> As soon as the noirq phase of your own driver is done, correct.
>>
>>> Said another way: you need to be ready to handle interrupts _before_
>>> the normal resume code is called and be ready to handle interrupts
>>> even _before_ the early resume code is called.
>>
>> Correct.
>>
>>> That means if you are implementing a bus that's needed by any devices
>>> with wakeup interrupts then it's your responsibility to also be
>>> prepared to run this early.
>>>
>>> In this particular case the max77686 driver doesn't need to do
>>> anything at all to be ready to handle interrupts.  Its suspend and
>>> resume code is just boilerplate "enable wakeups / disable wakeups" and
>>> it has no "noirq" code.  The max77686 driver doesn't have any "noirq"
>>> wake call because it would just be empty.
>>>
>>> Said another way: the problem isn't that the max77686 wakeup gets
>>> called before the i2c wakeup.  The problem is that i2c is needed ASAP
>>> once IRQs are enabled and thus needs to be run noirq.
>>>
>>> Does that sound semi-correct?
>>
>> Yes that's correct.
>>
>> My point above was (trying to be) that ultimately this is an ordering
>> issue.  e.g. the bus device needs to be "ready" before wakeup devices on
>> that bus can handle wakeup interrupts etc.  The way we're handling that
>> ordering is by the implied ordering of noirq, late/early and "normal"
>> callbacks.  That's convenient, but not exactly obvious.
>>
>> It works because we don't typically need too many layers here, but it
>> would be much more understandable if we could describe this kind of
>> dependency in a way that the suspend/resume code would suspend/resume
>> things in the right order rather than by tinkering with callback levels
>> (since otherwise suspend/resume ordering just depends on probe order.)
>>
>> This issue then usually gets me headed down my usual rant path about how
>> I think runtime PM is much better suited for handling ordering and
>> dependencies because it automatically handles parent/child dependencies
>> and non parent/child dependencies can be handled by taking advantage of
>> the get/put APIs which are refcounted, etc. etc. but that's another can
>> of worms.
> 
> Ah, I gotcha.  Yes, I'm a fan of having explicit dependency orderings too.
> 
> So I guess in this case the truly correct way to handle it is:
> 
> 1. i2c controller should have Runtime PM even though (as per the code
> now) there's nothing you can do to it to save power under normal
> circumstances.  So the runtime "suspend" code would be a no-op.
> 
> 2. When the i2c controller is told to runtime resume, it should
> double-check if a full SoC poweroff has happened since the last time
> it checked.  In this case it should reinit its hardware.
> 
> 3. If the i2c controller gets a full "resume" callback then it should
> also reinit the hardware just so it's not sitting in a half-configured
> state until the first peripheral uses it.
> 
> If later someone finds a way to power gate the i2c controller when no
> active transfers are going (and we actually save non-trivial power
> doing this) then we've got a 

Re: [PATCH] Check for Null return of function of affs_bread in function affs_truncate

2014-06-20 Thread Thomas Gleixner
On Fri, 20 Jun 2014, Nick Krause wrote:

> Ok that's fine I would return as if it's a NULL the other parts of the
> function can't continue.
> Nick
> 
> On Thu, Jun 19, 2014 at 1:21 AM, Dan Carpenter  wrote:
> > On Wed, Jun 18, 2014 at 06:08:05PM -0400, Nicholas Krause wrote:
> >> Signed-off-by: Nicholas Krause 
> >> ---
> >>  fs/affs/file.c | 2 ++
> >>  1 file changed, 2 insertions(+)
> >>
> >> diff --git a/fs/affs/file.c b/fs/affs/file.c
> >> index a7fe57d..f26482d 100644
> >> --- a/fs/affs/file.c
> >> +++ b/fs/affs/file.c
> >> @@ -923,6 +923,8 @@ affs_truncate(struct inode *inode)
> >>
> >>   while (ext_key) {
> >>   ext_bh = affs_bread(sb, ext_key);
> >> + if (!ext_bh)
> >> + return;
> >
> > The problem is that we don't know if we should return here or break
> > here.  If you don't understand the code, then it's best to just leave it
> > alone.

Dan, what kind of attitude is that?

Nick certainly found an issue where a possible NULL return from
affs_bread() can cause havoc.

Do YOU understand that code?

If yes, you better explain, WHY Nick's finding is a false positive
instead of just telling him off in a very impolite way.

If not, you better refrain from telling a reporter that he does not
understand the code and should stay away.

You clearly stated that you do not understand it either:

> > The problem is that we don't know if we should return here or break
> > here.

The problem here is that proceeding with a known NULL pointer is wrong
to begin with. It does not matter at all whether break or return is
the proper thing to do. What matters is that proceeding with a NULL
pointer is wrong to begin with, no matter what.

So either explain why this is a non-issue and the NULL pointer return
cannot happen or shut up and try to find a proper solution for that
"return" vs. "break" issue.

Thanks,

tglx
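
To make the "return" vs. "break" question above concrete, here is a
small user-space model of a loop that reads a chain of blocks where a
read may fail. It is only a sketch under stated assumptions: the
model_* names and the in-memory "disk" are hypothetical and are not the
affs code, and the sketch does not settle which choice is right for
affs_truncate(). It only shows that a NULL result must not be
dereferenced and that "break" keeps whatever cleanup follows the loop.

```c
#include <stddef.h>

#define MODEL_NBLOCKS 4

struct model_block {
	int next_key; /* key of the next block in the chain, 0 ends the chain */
};

static struct model_block model_disk[MODEL_NBLOCKS + 1];
static int model_bad_key; /* reading this key fails (returns NULL) */

/* Hypothetical stand-in for a block read that can fail. */
static struct model_block *model_bread(int key)
{
	if (key <= 0 || key > MODEL_NBLOCKS || key == model_bad_key)
		return NULL;
	return &model_disk[key];
}

/*
 * Walk the chain starting at key.  On a failed read, stop without touching
 * the NULL pointer, but still fall through to the common cleanup.
 * Returns the number of blocks processed; *cleaned records that cleanup ran.
 */
static int model_walk_chain(int key, int *cleaned)
{
	int processed = 0;

	while (key) {
		struct model_block *bh = model_bread(key);

		if (!bh)
			break; /* "break" reaches the cleanup below; "return" would skip it */
		key = bh->next_key;
		processed++;
	}
	*cleaned = 1; /* stands in for whatever must run after the loop */
	return processed;
}
```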


Re: [PATCH v2] ARM: mvebu: Fix missing binding documentation for Armada 38x

2014-06-20 Thread Jason Cooper
On Fri, Jun 20, 2014 at 05:33:06PM -0500, Rob Herring wrote:
> On Fri, Jun 20, 2014 at 1:52 PM, Jason Cooper  wrote:
> > On Thu, Jun 19, 2014 at 06:40:43PM +0200, Gregory CLEMENT wrote:
> >> For the Armada 380 and Armada 385 SoCs, the common binding for those
> >> 2 SoCs was forgotten. This patch adds the documentation for the
> >> marvell,armada38x property.
> >>
> >> Signed-off-by: Gregory CLEMENT 
> >> --
> >> Hi,
> >>
> >> This fix should be merged in 3.16. For 3.15 I am not sure as it is not
> >> a regression.
> >>
> >> Changelog:
> >> v1->v2
> >>
> >> - Reformulate to make clear that we will need marvell,armada38x _and_ a
> >> SoC specific string. For consistency I duplicated what we have done in
> >> armada-370-xp.txt
> >>
> >>
> >> Thanks,
> >> Gregory
> >>
> >>
> >>  Documentation/devicetree/bindings/arm/armada-38x.txt | 17 +++--
> >>  1 file changed, 15 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/Documentation/devicetree/bindings/arm/armada-38x.txt b/Documentation/devicetree/bindings/arm/armada-38x.txt
> >> index 11f2330a6554..fa08760046df 100644
> >> --- a/Documentation/devicetree/bindings/arm/armada-38x.txt
> >> +++ b/Documentation/devicetree/bindings/arm/armada-38x.txt
> >> @@ -6,5 +6,18 @@ following property:
> >>
> >>  Required root node property:
> >>
> >> - - compatible: must contain either "marvell,armada380" or
> >> -   "marvell,armada385" depending on the variant of the SoC being used.
> >> +compatible: must contain "marvell,armada38x"
> >
> > I agree with Sergei on this one.  We generally avoid wildcards in
> > compatible strings.  Is there a use case where specifying one of the
> > below wouldn't be sufficient?
> 
> Isn't this a case of just documenting what is already in use?

Technically, yes.  However, there are no products shipping with this SoC
yet.  So there aren't any _real_ users other than the developers
bringing in mainline support.

> I agree wildcards alone are not good, but along with a specific
> compatible is okay. But also there should be some need to have the
> common property.

I'm curious what you would consider to be a sufficient need?  This can
be easily handled by a match table, but a match table could also be
considered rather heavy for this task.

I think any implementation-based justification is prone to opening a can
of worms.  And I'm struggling to see a DT-only justification...

thx,

Jason.


Re: [PATCH] i2c: exynos5: Properly use the "noirq" variants of suspend/resume

2014-06-20 Thread Doug Anderson
Kevin,

On Fri, Jun 20, 2014 at 4:13 PM, Kevin Hilman  wrote:
> Doug Anderson  writes:
>
>> Kevin,
>>
>> On Fri, Jun 20, 2014 at 2:48 PM, Kevin Hilman  wrote:
>>> Hi Doug,
>>>
>>> Doug Anderson  writes:
>>>
>>>> On Thu, Jun 19, 2014 at 11:43 AM, Kevin Hilman  wrote:
>>>>> Doug Anderson  writes:
>>>>>
>>>>>> The original code for the exynos i2c controller registered for the
>>>>>> "noirq" variants.  However during review feedback it was moved to
>>>>>> SIMPLE_DEV_PM_OPS without anyone noticing that it meant we were no
>>>>>> longer actually "noirq" (despite functions named
>>>>>> exynos5_i2c_suspend_noirq and exynos5_i2c_resume_noirq).
>>>>>>
>>>>>> i2c controllers that might have wakeup sources on them seem to need to
>>>>>> resume at noirq time so that the individual drivers can actually read
>>>>>> the i2c bus to handle their wakeup.
>>>>>
>>>>> I suspect usage of the noirq variants pre-dates the existence of the
>>>>> late/early callbacks in the PM core, but based on the description above,
>>>>> I suspect what you actually want is the late/early callbacks.
>>>>
>>>> I think it actually really needs noirq.  ;)
>>>
>>> Yes, it appears it does.   Objection withdrawn.
>>>
>>> I just wanted to be sure because since the introduction of late/early,
>>> the need for noirq should be pretty rare, but there certainly are needs.
>>>
>>> 
>>> In this case though, the need for it has more to do with the
>>> lack of a way for us to describe non parent-child device dependencies
>>> than whether or not IRQs are enabled or not.
>>> 
>>
>> Actually, I'm not sure that's true, but I'll talk through it and you
>> can point to where I'm wrong (I often am!)
>>
>> If you're a wakeup device then you need to be ready to handle
>> interrupts as soon as the "noirq" phase of resume is done, right?
>
> As soon as the noirq phase of your own driver is done, correct.
>
>> Said another way: you need to be ready to handle interrupts _before_
>> the normal resume code is called and be ready to handle interrupts
>> even _before_ the early resume code is called.
>
> Correct.
>
>> That means if you are implementing a bus that's needed by any devices
>> with wakeup interrupts then it's your responsibility to also be
>> prepared to run this early.
>>
>> In this particular case the max77686 driver doesn't need to do
>> anything at all to be ready to handle interrupts.  Its suspend and
>> resume code is just boilerplate "enable wakeups / disable wakeups" and
>> it has no "noirq" code.  The max77686 driver doesn't have any "noirq"
>> wake call because it would just be empty.
>>
>> Said another way: the problem isn't that the max77686 wakeup gets
>> called before the i2c wakeup.  The problem is that i2c is needed ASAP
>> once IRQs are enabled and thus needs to be run noirq.
>>
>> Does that sound semi-correct?
>
> Yes that's correct.
>
> My point above was (trying to be) that ultimately this is an ordering
> issue.  e.g. the bus device needs to be "ready" before wakeup devices on
> that bus can handle wakeup interrupts etc.  The way we're handling that
> ordering is by the implied ordering of noirq, late/early and "normal"
> callbacks.  That's convenient, but not exactly obvious.
>
> It works because we don't typically need too many layers here, but it
> would be much more understandable if we could describe this kind of
> dependency in a way that the suspend/resume code would suspend/resume
> things in the right order rather than by tinkering with callback levels
> (since otherwise suspend/resume ordering just depends on probe order.)
>
> This issue then usually gets me headed down my usual rant path about how
> I think runtime PM is much better suited for handling ordering and
> dependencies because it automatically handles parent/child dependencies
> and non parent/child dependencies can be handled by taking advantage of
> the get/put APIs which are refcounted, etc. etc. but that's another can
> of worms.

Ah, I gotcha.  Yes, I'm a fan of having explicit dependency orderings too.

So I guess in this case the truly correct way to handle it is:

1. i2c controller should have Runtime PM even though (as per the code
now) there's nothing you can do to it to save power under normal
circumstances.  So the runtime "suspend" code would be a no-op.

2. When the i2c controller is told to runtime resume, it should
double-check if a full SoC poweroff has happened since the last time
it checked.  In this case it should reinit its hardware.

3. If the i2c controller gets a full "resume" callback then it should
also reinit the hardware just so it's not sitting in a half-configured
state until the first peripheral uses it.

If later someone finds a way to power gate the i2c controller when no
active transfers are going (and we actually save non-trivial power
doing this) then we've got a nice place to put that code.
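
The three steps above can be sketched as a small user-space model. This
is only an illustration under stated assumptions: struct model_i2c, the
model_power_gen counter and the model_* functions are hypothetical, and
none of this is the exynos5 driver or the kernel's runtime PM API. A
generation counter stands in for "has a full SoC poweroff happened
since the last time we checked".

```c
#include <stdbool.h>

/* Hypothetical controller state; not the exynos5 driver's structures. */
struct model_i2c {
	unsigned int seen_power_gen; /* power-cycle generation at last init */
	unsigned int init_count;     /* how many times hardware was (re)initialized */
};

/* Incremented whenever the modeled SoC loses power completely. */
static unsigned int model_power_gen;

static void model_hw_init(struct model_i2c *i2c)
{
	i2c->seen_power_gen = model_power_gen;
	i2c->init_count++;
}

/* Step 1: nothing to save under normal circumstances, so this is a no-op. */
static void model_runtime_suspend(struct model_i2c *i2c)
{
	(void)i2c;
}

/* Step 2: reinit only if a full power-off happened since the last check. */
static bool model_runtime_resume(struct model_i2c *i2c)
{
	if (i2c->seen_power_gen == model_power_gen)
		return false;
	model_hw_init(i2c);
	return true;
}

/* Step 3: a full system resume goes through the same check. */
static void model_system_resume(struct model_i2c *i2c)
{
	model_runtime_resume(i2c);
}
```

Routing step 3 through the same check as step 2 is a design choice of
the sketch: it keeps the hardware from being initialized twice when a
peripheral happens to wake the controller before the system resume
callback runs.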

NOTE: Unless we can actually save power by power gating the i2c
peripheral when there are no active transfers, we would also just have
the i2c_xfer() 

Re: [PATCH tip/core/rcu 0/5] Fix for cond_resched performance regression

2014-06-20 Thread josh
On Fri, Jun 20, 2014 at 04:30:33PM -0700, Paul E. McKenney wrote:
> On Fri, Jun 20, 2014 at 03:39:51PM -0700, j...@joshtriplett.org wrote:
> > On Fri, Jun 20, 2014 at 03:11:20PM -0700, Paul E. McKenney wrote:
> > > On Fri, Jun 20, 2014 at 02:24:23PM -0700, j...@joshtriplett.org wrote:
> > > > On Fri, Jun 20, 2014 at 12:12:36PM -0700, Paul E. McKenney wrote:
> > > > > o Make cond_resched() a no-op for PREEMPT=y.  This might well turn
> > > > >   out to be a good thing, but it doesn't help give RCU the
> > > > >   quiescent states that it needs.
> > > > 
> > > > What about doing this, together with letting the fqs logic poke
> > > > un-quiesced kernel code as needed?  That way, rather than having
> > > > cond_resched do any work, you have the fqs logic recognize that a
> > > > particular CPU has gone too long without quiescing, without disturbing
> > > > that CPU at all if it hasn't gone too long.
> > > 
> > > My next stop is to post the previous series, but with a couple of
> > > exports and one bug fix uncovered by testing thus far, but after
> > > another round of testing.  Then I am going to take a close look at
> > > this one:
> > > 
> > > o Push the checks further into cond_resched(), so that the
> > >   fastpath does the same sequence of instructions that the original
> > >   did.  This might work well, but requires IPIs, which are not so
> > >   good for latencies on the remote CPU.  It nevertheless might be a
> > >   decent long-term solution given that if your CPU is spending many
> > >   jiffies looping in the kernel, you aren't getting good latencies
> > >   anyway.  It also has the benefit of allowing RCU to take advantage
> > >   of the implicit quiescent states of all cond_resched() calls,
> > >   and of eliminating the need for a separate cond_resched_rcu_qs()
> > >   and for RCU_COND_RESCHED_QS.
> > > 
> > > The one you call out is of course interesting as well.  But there are
> > > a couple of questions:
> > > 
> > > > 1. Why wasn't cond_resched() a no-op in CONFIG_PREEMPT to start
> > > >    with?  It just seems too obvious a thing to do for it to possibly
> > > >    be an oversight.  (What, me paranoid?)
> > > > 
> > > > 2. When RCU recognizes that a particular CPU has gone too long,
> > > >    exactly what are you suggesting that RCU do about it?  When
> > > >    formulating your answer, please give due consideration to the
> > > >    implications of that CPU being a NO_HZ_FULL CPU.  ;-)
> > 
> > Send it an IPI that either causes it to flag a quiescent state
> > immediately if currently quiesced or causes it to quiesce at the next
> > opportunity if not.
> 
> OK.  But if we are in a !PREEMPT kernel,

That's not the case I was suggesting.  *If* the kernel is fully
preemptible, then it makes little sense to put any code in cond_resched,
when instead another thread can simply cause a preemption if it needs a
quiescent state.  That has the advantage of not imposing any unnecessary
polling on code running in the kernel.

In a !PREEMPT kernel, it makes a bit more sense to have cond_resched as
a voluntary preemption point.  But voluntary preemption points don't
make as much sense in a kernel prepared to preempt a thread anywhere.

- Josh Triplett


[PATCH 1/2] lib: list_sort_test(): Return -ENOMEM when allocation fails

2014-06-20 Thread Rasmus Villemoes
Signed-off-by: Rasmus Villemoes 
---
 lib/list_sort.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/list_sort.c b/lib/list_sort.c
index 1183fa7..291412a 100644
--- a/lib/list_sort.c
+++ b/lib/list_sort.c
@@ -207,7 +207,7 @@ static int __init cmp(void *priv, struct list_head *a, struct list_head *b)
 
 static int __init list_sort_test(void)
 {
-   int i, count = 1, err = -EINVAL;
+   int i, count = 1, err = -ENOMEM;
struct debug_el *el;
struct list_head *cur, *tmp;
LIST_HEAD(head);
@@ -239,6 +239,7 @@ static int __init list_sort_test(void)
 
list_sort(NULL, &head, cmp);
 
+   err = -EINVAL;
for (cur = head.next; cur->next != &head; cur = cur->next) {
struct debug_el *el1;
int cmp_result;
-- 
1.9.2
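
The error-code pattern this patch introduces (start with -ENOMEM while
only allocations can fail, then switch to -EINVAL once the checks
begin) can be shown in a self-contained user-space sketch. The function
model_self_test() and its fail_alloc/corrupt switches are hypothetical
and exist only to exercise both failure paths; this is not the
list_sort_test() code.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical self-test: allocate n ints, then validate them.
 * fail_alloc forces the allocation-failure path; corrupt forces a check failure.
 */
static int model_self_test(size_t n, int fail_alloc, int corrupt)
{
	int err = -ENOMEM; /* any early exit before the checks is an allocation failure */
	int *els;
	size_t i;

	els = fail_alloc ? NULL : calloc(n, sizeof(*els));
	if (!els)
		goto exit;

	for (i = 0; i < n; i++)
		els[i] = (int)i;
	if (corrupt && n > 0)
		els[n - 1] = -1;

	err = -EINVAL; /* from here on, a failure means the data is wrong */
	for (i = 0; i < n; i++)
		if (els[i] != (int)i)
			goto exit;

	err = 0;
exit:
	free(els); /* free(NULL) is a no-op, so the early exit is safe */
	return err;
}
```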



[PATCH 2/2] lib: list_sort_test(): Add extra corruption check

2014-06-20 Thread Rasmus Villemoes
Add a check to make sure that the prev pointer of the list head points
to the last element on the list.

Signed-off-by: Rasmus Villemoes 
---
 lib/list_sort.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/lib/list_sort.c b/lib/list_sort.c
index 291412a..832f525 100644
--- a/lib/list_sort.c
+++ b/lib/list_sort.c
@@ -272,6 +272,11 @@ static int __init list_sort_test(void)
}
count++;
}
+   if (head.prev != cur) {
+   printk(KERN_ERR "list_sort_test: error: list is corrupted\n");
+   goto exit;
+   }
+
 
if (count != TEST_LIST_LEN) {
printk(KERN_ERR "list_sort_test: error: bad list length %d",
-- 
1.9.2
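
The extra check (after walking the next pointers to the last element,
the head's prev pointer should point at that same element) can be tried
on a minimal circular doubly linked list in user space. This is a
sketch under stated assumptions: struct model_list and the model_*
helpers are hypothetical and only loosely modeled on struct list_head;
they are not the kernel's list API.

```c
#include <stdbool.h>

/* Minimal circular doubly linked list, loosely modeled on struct list_head. */
struct model_list {
	struct model_list *next, *prev;
};

static void model_list_init(struct model_list *head)
{
	head->next = head;
	head->prev = head;
}

static void model_list_add_tail(struct model_list *node, struct model_list *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/*
 * Follow the next pointers to the last element, then confirm that the
 * head's prev pointer agrees.  For an empty list the walk stays on the
 * head itself.  Assumes the next chain does lead back to the head.
 */
static bool model_tail_consistent(const struct model_list *head)
{
	const struct model_list *cur = head;

	while (cur->next != head)
		cur = cur->next;
	return head->prev == cur;
}
```

A forward-only walk cannot notice a stale prev pointer on the head,
which is why the check is worth having in addition to the existing
per-element comparisons.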



[PATCH 7/9] tools, perf: Make get_srcline fall back to sym+offset

2014-06-20 Thread Andi Kleen
From: Andi Kleen 

When the source line is not found fall back to sym + offset.
This is generally much more useful than a raw address.
For this we need to pass in the symbol from the caller.
For some callers it's awkward to compute, so we stay
at the old behaviour.

Signed-off-by: Andi Kleen 
---
 tools/perf/util/annotate.c  |  2 +-
 tools/perf/util/callchain.c |  3 ++-
 tools/perf/util/map.c   |  2 +-
 tools/perf/util/sort.c  |  6 --
 tools/perf/util/srcline.c   | 12 +---
 tools/perf/util/util.h  |  4 +++-
 6 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 12997ff..363b0c1 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -1187,7 +1187,7 @@ static int symbol__get_source_line(struct symbol *sym, 
struct map *map,
goto next;
 
offset = start + i;
-   src_line->path = get_srcline(map->dso, offset);
+   src_line->path = get_srcline(map->dso, offset, NULL, false);
insert_source_line(&tmp_root, src_line);
 
next:
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 2ca3655..ad4d7cb 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -690,7 +690,8 @@ char *callchain_list__sym_name(struct callchain_list *cl,
cl->ms.map && !cl->srcline)
cl->srcline = get_srcline(cl->ms.map->dso,
  map__rip_2objdump(cl->ms.map,
-   cl->ip));
+   cl->ip),
+ cl->ms.sym, false);
if (cl->srcline)
printed = scnprintf(bf, bfsize, "%s %s",
cl->ms.sym->name, cl->srcline);
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 8ccbb32..57cdc33 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -355,7 +355,7 @@ int map__fprintf_srcline(struct map *map, u64 addr, const 
char *prefix,
 
if (map && map->dso) {
srcline = get_srcline(map->dso,
- map__rip_2objdump(map, addr));
+ map__rip_2objdump(map, addr), NULL, true);
if (srcline != SRCLINE_UNKNOWN)
ret = fprintf(fp, "%s%s", prefix, srcline);
free_srcline(srcline);
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 901f44b..fee07ca 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -285,7 +285,8 @@ sort__srcline_cmp(struct hist_entry *left, struct 
hist_entry *right)
else {
struct map *map = left->ms.map;
left->srcline = get_srcline(map->dso,
-   map__rip_2objdump(map, left->ip));
+  map__rip_2objdump(map, left->ip),
+   left->ms.sym, true);
}
}
if (!right->srcline) {
@@ -294,7 +295,8 @@ sort__srcline_cmp(struct hist_entry *left, struct 
hist_entry *right)
else {
struct map *map = right->ms.map;
right->srcline = get_srcline(map->dso,
-   map__rip_2objdump(map, right->ip));
+map__rip_2objdump(map, right->ip),
+right->ms.sym, true);
}
}
return strcmp(right->srcline, left->srcline);
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index ac877f9..36a7aff 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -8,12 +8,13 @@
 #include "util/util.h"
 #include "util/debug.h"
 
+#include "symbol.h"
+
 #ifdef HAVE_LIBBFD_SUPPORT
 
 /*
  * Implement addr2line using libbfd.
  */
-#define PACKAGE "perf"
 #include 
 
 struct a2l_data {
@@ -250,7 +251,8 @@ void dso__free_a2l(struct dso *dso __maybe_unused)
  */
 #define A2L_FAIL_LIMIT 123
 
-char *get_srcline(struct dso *dso, unsigned long addr)
+char *get_srcline(struct dso *dso, unsigned long addr, struct symbol *sym,
+ bool show_sym)
 {
char *file = NULL;
unsigned line = 0;
@@ -289,7 +291,11 @@ out:
dso->has_srcline = 0;
dso__free_a2l(dso);
}
-   if (asprintf(&srcline, "%s[%lx]", dso->short_name, addr) < 0)
+   if (sym) {
+   if (asprintf(&srcline, "%s+%ld", show_sym ? sym->name : "",
+   addr - sym->start) < 0)
+   return SRCLINE_UNKNOWN;
+   } else if (asprintf(&srcline, "%s[%lx]", dso->short_name, addr) < 0)
return 
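
The fallback described in this patch can be sketched in isolation. This is a simplified illustration under stated assumptions, not the perf implementation: struct sym_info and format_fallback() are hypothetical names, the result goes into a caller-supplied buffer instead of an allocated string, and the offset is printed with %lu because it is an unsigned difference.

```c
#include <stdio.h>

/* Hypothetical, simplified stand-in for perf's struct symbol. */
struct sym_info {
	unsigned long start;
	const char *name;
};

/*
 * Format a fallback label into 'buf':
 *   - "name+offset" (or "+offset" when show_sym is 0) if a symbol is known,
 *   - "dso[hexaddr]" otherwise.
 * Returns whatever snprintf() returns.
 */
static int format_fallback(char *buf, size_t size, const char *dso_name,
			   unsigned long addr, const struct sym_info *sym,
			   int show_sym)
{
	if (sym)
		return snprintf(buf, size, "%s+%lu",
				show_sym ? sym->name : "", addr - sym->start);
	return snprintf(buf, size, "%s[%lx]", dso_name, addr);
}
```

Callers that already print the symbol name next to the srcline would pass show_sym as 0 to avoid showing it twice.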

[PATCH 9/9] tools, perf: Add asprintf replacement

2014-06-20 Thread Andi Kleen
From: Andi Kleen 

asprintf corrupts memory on some older glibc versions.
Provide a replacement. This fixes various segfaults
with --branch-history on older Fedoras.

Signed-off-by: Andi Kleen 
---
 tools/perf/Makefile.perf   |  1 +
 tools/perf/util/asprintf.c | 28 
 2 files changed, 29 insertions(+)
 create mode 100644 tools/perf/util/asprintf.c

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index ae20edf..57be4b7 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -372,6 +372,7 @@ LIB_OBJS += $(OUTPUT)util/vdso.o
 LIB_OBJS += $(OUTPUT)util/stat.o
 LIB_OBJS += $(OUTPUT)util/record.o
 LIB_OBJS += $(OUTPUT)util/srcline.o
+LIB_OBJS += $(OUTPUT)util/asprintf.o
 LIB_OBJS += $(OUTPUT)util/data.o
 
 LIB_OBJS += $(OUTPUT)ui/setup.o
diff --git a/tools/perf/util/asprintf.c b/tools/perf/util/asprintf.c
new file mode 100644
index 000..9aafaca
--- /dev/null
+++ b/tools/perf/util/asprintf.c
@@ -0,0 +1,28 @@
+/* Replacement for asprintf as it's buggy in older glibc versions */
+#include 
+#include 
+#include 
+#include 
+
+int vasprintf(char **str, const char *fmt, va_list ap)
+{
+   char buf[1024];
+   int len = vsnprintf(buf, sizeof buf, fmt, ap);
+
+   *str = malloc(len + 1);
+   if (!*str)
+   return -1;
+   strcpy(*str, buf);
+   return len;
+}
+
+int asprintf(char **str, const char *fmt, ...)
+{
+   va_list ap;
+   int ret;
+
+   va_start(ap, fmt);
+   ret = vasprintf(str, fmt, ap);
+   va_end(ap);
+   return ret;
+}
-- 
1.9.3
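
Note that the replacement as posted formats into a 1024-byte stack buffer, so a longer result is truncated while the untruncated length is still returned, and a negative vsnprintf() result is not checked. One possible alternative, shown here only as a sketch and not as part of the patch, is the usual two-pass approach: measure with vsnprintf(NULL, 0, ...) first, then allocate exactly. The names my_vasprintf() and my_asprintf() are hypothetical, chosen so that they do not clash with libc.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical name, chosen so it does not clash with libc's vasprintf(). */
static int my_vasprintf(char **str, const char *fmt, va_list ap)
{
	va_list ap2;
	int len, ret;

	/* First pass: measure only. The copy keeps 'ap' usable afterwards. */
	va_copy(ap2, ap);
	len = vsnprintf(NULL, 0, fmt, ap2);
	va_end(ap2);
	if (len < 0)
		return -1;

	*str = malloc((size_t)len + 1);
	if (!*str)
		return -1;

	/* Second pass: format into the exactly sized buffer. */
	ret = vsnprintf(*str, (size_t)len + 1, fmt, ap);
	if (ret < 0) {
		free(*str);
		*str = NULL;
	}
	return ret;
}

static int my_asprintf(char **str, const char *fmt, ...)
{
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = my_vasprintf(str, fmt, ap);
	va_end(ap);
	return ret;
}
```

The sketch relies on C99 vsnprintf() semantics for the measuring call.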



[PATCH 3/9] perf, tools: Enable printing the srcline in the history v4

2014-06-20 Thread Andi Kleen
From: Andi Kleen 

For lbr-as-callgraph we need to see the line number in the history,
because many LBR entries can be in a single function, and just
showing the same function name many times is not useful.

When the history code is configured to sort by address, also try to
resolve the address to a file:srcline and display this in the browser.
If that doesn't work still display the address.

This can be also useful without LBRs for understanding which call in a large
function (or in which inlined function) called something else.

Contains fixes from Namhyung Kim

v2: Refactor code into common function
v3: Fix GTK build
v4: Rebase
Signed-off-by: Andi Kleen 
---
 tools/perf/ui/browsers/hists.c | 17 -
 tools/perf/ui/gtk/hists.c  | 11 +--
 tools/perf/ui/stdio/hist.c | 23 +--
 tools/perf/util/callchain.c| 29 +
 tools/perf/util/callchain.h|  5 +
 tools/perf/util/machine.c  |  2 +-
 tools/perf/util/srcline.c  |  6 --
 7 files changed, 49 insertions(+), 44 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 52c03fb..e0f32eb 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -422,23 +422,6 @@ out:
return key;
 }
 
-static char *callchain_list__sym_name(struct callchain_list *cl,
- char *bf, size_t bfsize, bool show_dso)
-{
-   int printed;
-
-   if (cl->ms.sym)
-   printed = scnprintf(bf, bfsize, "%s", cl->ms.sym->name);
-   else
-   printed = scnprintf(bf, bfsize, "%#" PRIx64, cl->ip);
-
-   if (show_dso)
-   scnprintf(bf + printed, bfsize - printed, " %s",
- cl->ms.map ? cl->ms.map->dso->short_name : "unknown");
-
-   return bf;
-}
-
 #define LEVEL_OFFSET_STEP 3
 
 static int hist_browser__show_callchain_node_rb_tree(struct hist_browser 
*browser,
diff --git a/tools/perf/ui/gtk/hists.c b/tools/perf/ui/gtk/hists.c
index 6ca60e4..a21b77e 100644
--- a/tools/perf/ui/gtk/hists.c
+++ b/tools/perf/ui/gtk/hists.c
@@ -87,15 +87,6 @@ void perf_gtk__init_hpp(void)
perf_gtk__hpp_color_overhead_acc;
 }
 
-static void callchain_list__sym_name(struct callchain_list *cl,
-char *bf, size_t bfsize)
-{
-   if (cl->ms.sym)
-   scnprintf(bf, bfsize, "%s", cl->ms.sym->name);
-   else
-   scnprintf(bf, bfsize, "%#" PRIx64, cl->ip);
-}
-
 static void perf_gtk__add_callchain(struct rb_root *root, GtkTreeStore *store,
GtkTreeIter *parent, int col, u64 total)
 {
@@ -126,7 +117,7 @@ static void perf_gtk__add_callchain(struct rb_root *root, 
GtkTreeStore *store,
scnprintf(buf, sizeof(buf), "%5.2f%%", percent);
gtk_tree_store_set(store, &iter, 0, buf, -1);
 
-   callchain_list__sym_name(chain, buf, sizeof(buf));
+   callchain_list__sym_name(chain, buf, sizeof(buf), 
false);
gtk_tree_store_set(store, &iter, col, buf, -1);
 
if (need_new_parent) {
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 90122ab..570d79d 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -41,6 +41,7 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct 
callchain_list *chain,
 {
int i;
size_t ret = 0;
+   char bf[1024];
 
ret += callchain__fprintf_left_margin(fp, left_margin);
for (i = 0; i < depth; i++) {
@@ -56,11 +57,8 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct 
callchain_list *chain,
} else
ret += fprintf(fp, "%s", "  ");
}
-   if (chain->ms.sym)
-   ret += fprintf(fp, "%s\n", chain->ms.sym->name);
-   else
-   ret += fprintf(fp, "0x%0" PRIx64 "\n", chain->ip);
-
+   fputs(callchain_list__sym_name(chain, bf, sizeof(bf), false), fp);
+   fputc('\n', fp);
return ret;
 }
 
@@ -168,6 +166,7 @@ static size_t callchain__fprintf_graph(FILE *fp, struct 
rb_root *root,
struct rb_node *node;
int i = 0;
int ret = 0;
+   char bf[1024];
 
/*
 * If have one single callchain root, don't bother printing
@@ -196,10 +195,8 @@ static size_t callchain__fprintf_graph(FILE *fp, struct 
rb_root *root,
} else
ret += callchain__fprintf_left_margin(fp, 
left_margin);
 
-   if (chain->ms.sym)
-   ret += fprintf(fp, " %s\n", 
chain->ms.sym->name);
-   else
-   ret += fprintf(fp, " %p\n", (void 
*)(long)chain->ip);
+   ret += fprintf(fp, "%s\n", 
callchain_list__sym_name(chain, bf, sizeof(bf),
+  

[PATCH 2/9] perf, tools: Add --branch-history option to report v3

2014-06-20 Thread Andi Kleen
From: Andi Kleen 

Add a --branch-history option to perf report that changes all
the settings necessary for using the branches in callstacks.

This is just a short cut to make this nicer to use, it does
not enable any functionality by itself.

v2: Change sort order. Rename option to --branch-history to
be less confusing.
v3: Updates
Signed-off-by: Andi Kleen 
---
 tools/perf/Documentation/perf-report.txt |  5 +
 tools/perf/builtin-report.c  | 34 +++-
 tools/perf/util/machine.c| 12 +--
 3 files changed, 40 insertions(+), 11 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt 
b/tools/perf/Documentation/perf-report.txt
index 29a21b0..45f73c9 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -255,6 +255,11 @@ OPTIONS
branch stacks and it will automatically switch to the branch view mode,
unless --no-branch-stack is used.
 
+--branch-history::
+   Add the addresses of sampled taken branches to the callstack.
+   This allows to examine the path the program took to each sample.
+   The data collection must have used -b (or -j) and -g.
+
 --objdump=::
 Path to objdump binary.
 
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 4dcb4db..c2dc8f27 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -220,8 +220,9 @@ static int report__setup_sample_type(struct report *rep)
return -EINVAL;
}
if (symbol_conf.use_callchain) {
-   ui__error("Selected -g but no callchain data. Did "
-   "you call 'perf record' without -g?\n");
+   ui__error("Selected -g or --branch-history but no "
+ "callchain data. Did\n"
+ "you call 'perf record' without -g?\n");
return -1;
}
} else if (!rep->dont_use_callchains &&
@@ -544,6 +545,16 @@ parse_branch_mode(const struct option *opt __maybe_unused,
 }
 
 static int
+parse_branch_call_mode(const struct option *opt __maybe_unused,
+ const char *str __maybe_unused, int unset)
+{
+   int *branch_mode = opt->value;
+
+   *branch_mode = !unset;
+   return 0;
+}
+
+static int
 parse_percent_limit(const struct option *opt, const char *str,
int unset __maybe_unused)
 {
@@ -558,7 +569,7 @@ int cmd_report(int argc, const char **argv, const char 
*prefix __maybe_unused)
struct perf_session *session;
struct stat st;
bool has_br_stack = false;
-   int branch_mode = -1;
+   int branch_mode = -1, branch_call_mode = -1;
int ret = -1;
char callchain_default_opt[] = "fractal,0.5,callee";
const char * const report_usage[] = {
@@ -669,7 +680,11 @@ int cmd_report(int argc, const char **argv, const char 
*prefix __maybe_unused)
OPT_BOOLEAN(0, "group", &symbol_conf.event_group,
"Show event group information together"),
OPT_CALLBACK_NOOPT('b', "branch-stack", &branch_mode, "",
-   "use branch records for histogram filling", parse_branch_mode),
+   "use branch records for per branch histogram filling",
+   parse_branch_mode),
+   OPT_CALLBACK_NOOPT(0, "branch-history", &branch_call_mode, "",
+   "add last branch records to call history",
+   parse_branch_call_mode),
OPT_STRING(0, "objdump", &objdump_path, "path",
   "objdump binary to use for disassembly and annotations"),
OPT_BOOLEAN(0, "demangle", &symbol_conf.demangle,
@@ -719,10 +734,19 @@ repeat:
has_br_stack = perf_header__has_feat(&session->header,
 HEADER_BRANCH_STACK);
 
-   if (branch_mode == -1 && has_br_stack) {
+   if (branch_mode == -1 && has_br_stack && branch_call_mode == -1) {
sort__mode = SORT_MODE__BRANCH;
symbol_conf.cumulate_callchain = false;
}
+   if (branch_call_mode != -1) {
+   callchain_param.branch_callstack = 1;
+   callchain_param.key = CCKEY_ADDRESS;
+   symbol_conf.use_callchain = true;
+   callchain_register_param(&callchain_param);
+   if (sort_order == default_sort_order)
+   sort_order = "srcline,symbol,dso";
+   branch_mode = 0;
+   }
 
if (report.mem_mode) {
if (sort__mode == SORT_MODE__BRANCH) {
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index dee1695..ab04045 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1379,15 +1379,15 @@ static int machine__resolve_callchain_sample(struct 
machine *machine,
 * - No annotations (should annotate somehow)
 */
 
-   if (branch->nr > 
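
The interaction between the two options in cmd_report() can be summarised with a small sketch. This is a reduced illustration of the hunk above, not the perf code: struct report_modes and resolve_modes() are hypothetical names, and only the settings relevant to mode selection are modelled. As in the patch, -1 means that the option was not given on the command line.

```c
#include <stdbool.h>

/* Hypothetical, reduced view of the settings touched by the patch. */
struct report_modes {
	bool branch_sort;	/* per-branch histogram mode */
	bool use_callchain;	/* callchains enabled */
	bool branch_callstack;	/* branches added to the callchain */
};

/* -1 means "option not given on the command line". */
static struct report_modes resolve_modes(int branch_mode, int branch_call_mode,
					 bool has_br_stack)
{
	struct report_modes m = { false, false, false };

	/* Default to the branch view only if neither option was given. */
	if (branch_mode == -1 && has_br_stack && branch_call_mode == -1)
		m.branch_sort = true;

	/* --branch-history: route the branches through the callchain code. */
	if (branch_call_mode != -1) {
		m.branch_callstack = true;
		m.use_callchain = true;
	}
	return m;
}
```

With branch data present and no options, the sketch selects the per-branch view; once --branch-history is given, the callchain settings are enabled instead.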

[PATCH 1/9] perf, tools: Support handling complete branch stacks as histograms v7

2014-06-20 Thread Andi Kleen
From: Andi Kleen 

Currently branch stacks can be only shown as edge histograms for
individual branches. I never found this display particularly useful.

This implements an alternative mode that creates histograms over complete
branch traces, instead of individual branches, similar to how normal
callgraphs are handled. This is done by putting it in
front of the normal callgraph and then using the normal callgraph
histogram infrastructure to unify them.

This way in complex functions we can understand the control flow
that led to a particular sample, and may even see some control
flow in the caller for short functions.

Example (simplified, of course for such simple code this
is usually not needed):

tcall.c:

volatile a = 1, b = 10, c;

__attribute__((noinline)) f2()
{
c = a / b;
}

__attribute__((noinline)) f1()
{
f2();
f2();
}
main()
{
int i;
for (i = 0; i < 100; i++)
f1();
}

% perf record -b -g ./tsrc/tcall
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
% perf report --branch-history
...
54.91%  tcall.c:6  [.] f2  tcall
|
|--65.53%-- f2 tcall.c:5
|  |
|  |--70.83%-- f1 tcall.c:11
|  |  f1 tcall.c:10
|  |  main tcall.c:18
|  |  main tcall.c:18
|  |  main tcall.c:17
|  |  main tcall.c:17
|  |  f1 tcall.c:13
|  |  f1 tcall.c:13
|  |  f2 tcall.c:7
|  |  f2 tcall.c:5
|  |  f1 tcall.c:12
|  |  f1 tcall.c:12
|  |  f2 tcall.c:7
|  |  f2 tcall.c:5
|  |  f1 tcall.c:11
|  |
|   --29.17%-- f1 tcall.c:12
| f1 tcall.c:12
| f2 tcall.c:7
| f2 tcall.c:5
| f1 tcall.c:11
| f1 tcall.c:10
| main tcall.c:18
| main tcall.c:18
| main tcall.c:17
| main tcall.c:17
| f1 tcall.c:13
| f1 tcall.c:13
| f2 tcall.c:7
| f2 tcall.c:5
| f1 tcall.c:12

The default output is unchanged.

This is only implemented in perf report, no change to record
or anywhere else.

This adds the basic code to report:
- add a new "branch" option to the -g option parser to enable this mode
- when the flag is set include the LBR into the callstack in machine.c.
The rest of the history code is unchanged and doesn't know the difference
between LBR entry and normal call entry.
- detect overlaps with the callchain
- remove small loop duplicates in the LBR

Current limitations:
- The LBR flags (mispredict etc.) are not shown in the history
and LBR entries have no special marker.
- It would be nice if annotate marked the LBR entries somehow
(e.g. with arrows)

v2: Various fixes.
v3: Merge further patches into this one. Fix white space.
v4: Improve manpage. Address review feedback.
v5: Rename functions. Better error message without -g. Fix crash without
-b.
v6: Rebase
v7: Rebase. Use NO_ENTRY in memset.
Signed-off-by: Andi Kleen 
---
 tools/perf/Documentation/perf-report.txt |   7 +-
 tools/perf/builtin-report.c  |   4 +-
 tools/perf/util/callchain.c  |  11 ++-
 tools/perf/util/callchain.h  |   1 +
 tools/perf/util/machine.c| 159 +++
 tools/perf/util/symbol.h |   3 +-
 6 files changed, 158 insertions(+), 27 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt 
b/tools/perf/Documentation/perf-report.txt
index cefdf43..29a21b0 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -143,7 +143,7 @@ OPTIONS
 --dump-raw-trace::
 Dump raw trace in ASCII.
 
--g [type,min[,limit],order[,key]]::
+-g [type,min[,limit],order[,key][,branch]]::
 --call-graph::
 Display call chains using type, min percent threshold, optional print
limit and order.
@@ -161,6 +161,11 @@ OPTIONS
- function: compare on functions
- address: compare on individual code addresses
 
+   branch can be:
+   - branch: include last branch information in callgraph
+   when available. Usually more convenient to use --branch-history
+   for this.
+
Default: fractal,0.5,callee,function.
 
 

[PATCH 4/9] perf, tools: Only print base source file for srcline

2014-06-20 Thread Andi Kleen
From: Andi Kleen 

For perf report with --sort srcline only print the base source file
name. This makes the results generally fit much better to the
screen. The path is usually not that useful anyways because it is
often from different systems.

Signed-off-by: Andi Kleen 
---
 tools/perf/util/srcline.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index c6a7cdc..ac877f9 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -274,7 +274,7 @@ char *get_srcline(struct dso *dso, unsigned long addr)
if (!addr2line(dso_name, addr, &file, &line, dso))
goto out;
 
-   if (asprintf(&srcline, "%s:%u", file, line) < 0) {
+   if (asprintf(&srcline, "%s:%u", basename(file), line) < 0) {
free(file);
goto out;
}
-- 
1.9.3
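
The effect of the change can be shown with a small standalone sketch. To stay independent of which basename() variant (GNU or POSIX) a given build picks up, it uses a hypothetical helper, base_name(), that simply returns the part after the last slash; this is an illustration, not the function the patch calls.

```c
#include <string.h>

/* Hypothetical helper: return the part of 'path' after the last '/'. */
static const char *base_name(const char *path)
{
	const char *slash = strrchr(path, '/');

	return slash ? slash + 1 : path;
}
```

For example, a srcline built from "/home/user/tsrc/tcall.c" and line 6 would be shown as "tcall.c:6" rather than with the full build path.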



perf: Implement lbr-as-callgraph v8

2014-06-20 Thread Andi Kleen
[Even more review feedback and some bugs addressed.]
[Only port to changes in perf/core. No other changes.]
[Rebase to latest perf/core]
[Another rebase. No changes]

This patchkit implements lbr-as-callgraphs in perf report,
as an alternative way to present LBR information.

Current perf report does a histogram over the branch edges,
which is useful to look at basic blocks, but doesn't tell
you anything about the larger control flow behaviour.

This patchkit adds a new option --branch-history that
adds the branch paths to the callgraph history instead.

This allows reasoning about individual branch paths leading
to specific samples.

Also available at
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/lbr-callgraph5

v2:
- rebased on perf/core
- fix various issues
- rename the option to --branch-history
- various fixes to display the information more concisely
v3:
- White space changes
- Consolidate some patches
- Update some descriptions
v4:
- Fix various display problems
- Unknown srcline is now printed as symbol+offset
- Refactor some code to address review feedback
- Merge with latest tip
- Fix missing srcline display in stdio hist output.
v5:
- Rename functions
- Fix gtk build problem
- Fix crash without -g
- Improve error messages
- Improve srcline display in various ways
v6:
- Port to latest perf/core
v7:
- Really port to latest perf/core
v8:
- Rebased on 3.16-rc1


Example output:

% perf record -b -g ./tsrc/tcall
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
% perf report --branch-history
...
54.91%  tcall.c:6  [.] f2  tcall
|
|--65.53%-- f2 tcall.c:5
|  |
|  |--70.83%-- f1 tcall.c:11
|  |  f1 tcall.c:10
|  |  main tcall.c:18
|  |  main tcall.c:18
|  |  main tcall.c:17
|  |  main tcall.c:17
|  |  f1 tcall.c:13
|  |  f1 tcall.c:13
|  |  f2 tcall.c:7
|  |  f2 tcall.c:5
|  |  f1 tcall.c:12
|  |  f1 tcall.c:12
|  |  f2 tcall.c:7
|  |  f2 tcall.c:5
|  |  f1 tcall.c:11




[PATCH 5/9] perf, tools: Support source line numbers in annotate

2014-06-20 Thread Andi Kleen
From: Andi Kleen 

With srcline key/sort'ing it's useful to have line numbers
in the annotate window. This patch implements this.

Use objdump -l to request the line numbers and
save them in the line structure. Then the browser
displays them for source lines.

The line numbers are not displayed by default, but can be
toggled on with 'k'

There is one unfortunate problem with this setup. For
lines not containing source and which are outside functions
objdump -l reports line numbers off by a few: it always reports
the first line number in the next function even for lines
that are outside the function.
I haven't found a nice way to detect/correct this. Probably objdump
has to be fixed.
See https://sourceware.org/bugzilla/show_bug.cgi?id=16433

The line numbers are still useful even with these problems,
as most are correct and the ones which are not are nearby.

Signed-off-by: Andi Kleen 
---
 tools/perf/ui/browsers/annotate.c | 13 -
 tools/perf/util/annotate.c| 30 +-
 tools/perf/util/annotate.h|  1 +
 3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c 
b/tools/perf/ui/browsers/annotate.c
index f0697a3..8df6787 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -27,6 +27,7 @@ static struct annotate_browser_opt {
bool hide_src_code,
 use_offset,
 jump_arrows,
+show_linenr,
 show_nr_jumps;
 } annotate_browser__opts = {
.use_offset = true,
@@ -128,7 +129,11 @@ static void annotate_browser__write(struct ui_browser 
*browser, void *entry, int
if (!*dl->line)
slsmg_write_nstring(" ", width - pcnt_width);
else if (dl->offset == -1) {
-   printed = scnprintf(bf, sizeof(bf), "%*s  ",
+   if (dl->line_nr && annotate_browser__opts.show_linenr)
+   printed = scnprintf(bf, sizeof(bf), "%*s %-5d ",
+   ab->addr_width, " ", dl->line_nr);
+   else
+   printed = scnprintf(bf, sizeof(bf), "%*s  ",
ab->addr_width, " ");
slsmg_write_nstring(bf, printed);
slsmg_write_nstring(dl->line, width - printed - pcnt_width + 1);
@@ -733,6 +738,7 @@ static int annotate_browser__run(struct annotate_browser 
*browser,
"o Toggle disassembler output/simplified view\n"
"s Toggle source code view\n"
"/ Search string\n"
+   "k Toggle line numbers\n"
"r Run available scripts\n"
"? Search string backwards\n");
continue;
@@ -741,6 +747,10 @@ static int annotate_browser__run(struct annotate_browser 
*browser,
script_browse(NULL);
continue;
}
+   case 'k':
+   annotate_browser__opts.show_linenr =
+   !annotate_browser__opts.show_linenr;
+   break;
case 'H':
nd = browser->curr_hot;
break;
@@ -984,6 +994,7 @@ static struct annotate_config {
 } annotate__configs[] = {
ANNOTATE_CFG(hide_src_code),
ANNOTATE_CFG(jump_arrows),
+   ANNOTATE_CFG(show_linenr),
ANNOTATE_CFG(show_nr_jumps),
ANNOTATE_CFG(use_offset),
 };
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 809b4c5..12997ff 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -17,11 +17,13 @@
 #include "debug.h"
 #include "annotate.h"
 #include "evsel.h"
+#include 
 #include 
 #include 
 
 const char *disassembler_style;
 const char *objdump_path;
+static regex_t  file_lineno;
 
 static struct ins *ins__find(const char *name);
 static int disasm_line__parse(char *line, char **namep, char **rawp);
@@ -564,13 +566,15 @@ out_free_name:
return -1;
 }
 
-static struct disasm_line *disasm_line__new(s64 offset, char *line, size_t 
privsize)
+static struct disasm_line *disasm_line__new(s64 offset, char *line,
+   size_t privsize, int line_nr)
 {
struct disasm_line *dl = zalloc(sizeof(*dl) + privsize);
 
if (dl != NULL) {
dl->offset = offset;
dl->line = strdup(line);
+   dl->line_nr = line_nr;
if (dl->line == NULL)
goto out_delete;
 
@@ -782,13 +786,15 @@ static int disasm_line__print(struct disasm_line *dl, 
struct symbol *sym, u64 st
  * The ops.raw part will be parsed further according to type of the 
instruction.
  */
 static int symbol__parse_objdump_line(struct symbol *sym, struct map *map,
- FILE *file, size_t 

[PATCH 6/9] perf, tools: Fix srcline sort key output to use width

2014-06-20 Thread Andi Kleen
From: Andi Kleen 

The srcline sort output ignored the width, which caused
various problems with displaying srcline in the tui
browser. Just cut it off at width.

Signed-off-by: Andi Kleen 
---
 tools/perf/util/sort.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 45512ba..901f44b 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -304,7 +304,7 @@ static int hist_entry__srcline_snprintf(struct hist_entry 
*he, char *bf,
size_t size,
unsigned int width __maybe_unused)
 {
-   return repsep_snprintf(bf, size, "%s", he->srcline);
+   return repsep_snprintf(bf, size, "%.*s", width, he->srcline);
 }
 
 struct sort_entry sort_srcline = {
-- 
1.9.3
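
The "%.*s" conversion used in the fix takes its precision from an int argument and copies at most that many characters of the string. A minimal sketch of the idea follows; print_srcline() is a hypothetical helper, plain snprintf() stands in for repsep_snprintf(), and the cast to int is an addition of this sketch because the precision argument of "%.*s" must have type int.

```c
#include <stdio.h>

/* Hypothetical helper: copy at most 'width' characters of 'srcline' to 'bf'. */
static int print_srcline(char *bf, size_t size, unsigned int width,
			 const char *srcline)
{
	/* The precision of %.*s must be an int, hence the cast. */
	return snprintf(bf, size, "%.*s", (int)width, srcline);
}
```

Strings shorter than the width are copied unchanged; longer ones are cut off at the width, which is the behaviour the commit message describes.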



[PATCH 8/9] tools, perf: Make srcline output address with -v

2014-06-20 Thread Andi Kleen
From: Andi Kleen 

When -v is specified always print the hex address for the srcline.

Signed-off-by: Andi Kleen 
---
 tools/perf/util/srcline.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index 36a7aff..a22be7c 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -258,6 +258,12 @@ char *get_srcline(struct dso *dso, unsigned long addr, 
struct symbol *sym,
unsigned line = 0;
char *srcline;
const char *dso_name;
+   char astr[50];
+
+   if (verbose)
+   snprintf(astr, sizeof astr, " %#lx", addr);
+   else
+   astr[0] = 0;
 
if (!dso->has_srcline)
goto out;
@@ -276,7 +282,12 @@ char *get_srcline(struct dso *dso, unsigned long addr, 
struct symbol *sym,
if (!addr2line(dso_name, addr, &file, &line, dso))
goto out;
 
-   if (asprintf(&srcline, "%s:%u", basename(file), line) < 0) {
+   if (line == 0) {
+   free(file);
+   goto fallback;
+   }
+
+   if (asprintf(&srcline, "%s:%u%s", basename(file), line, astr) < 0) {
free(file);
goto out;
}
@@ -291,9 +302,10 @@ out:
dso->has_srcline = 0;
dso__free_a2l(dso);
}
+fallback:
if (sym) {
-   if (asprintf(&srcline, "%s+%ld", show_sym ? sym->name : "",
-   addr - sym->start) < 0)
+   if (asprintf(&srcline, "%s+%ld%s", show_sym ? sym->name : "",
+   addr - sym->start, astr) < 0)
return SRCLINE_UNKNOWN;
} else if (asprintf(&srcline, "%s[%lx]", dso->short_name, addr) < 0)
return SRCLINE_UNKNOWN;
-- 
1.9.3



Re: [PATCH] selinux: no recursive read_lock of policy_rwlock in security_genfs_sid()

2014-06-20 Thread Waiman Long

On 06/20/2014 01:49 PM, Stephen Smalley wrote:
> On 06/20/2014 01:45 PM, Waiman Long wrote:
> > With introduction of fair queued rwlock, recursive read_lock() may hang
> > the offending process if there is a write_lock() somewhere in between.
> >
> > With recursive read_lock checking enabled, the following error was
> > reported:
> >
> > =
> > [ INFO: possible recursive locking detected ]
> > 3.16.0-rc1 #2 Tainted: GE
> > -
> > load_policy/708 is trying to acquire lock:
> >   (policy_rwlock){.+.+..}, at: [] security_genfs_sid+0x3a/0x170
> >
> > but task is already holding lock:
> >   (policy_rwlock){.+.+..}, at: [] security_fs_use+0x2c/0x110
> >
> > other info that might help us debug this:
> >   Possible unsafe locking scenario:
> >
> > CPU0
> >
> >    lock(policy_rwlock);
> >    lock(policy_rwlock);
> >
> > This patch fixes the occurrence of recursive read_lock() of
> > policy_rwlock in security_genfs_sid() by adding a 5th argument to
> > indicate if the rwlock has been taken.
> >
> > Signed-off-by: Waiman Long
>
> Thanks, but I'd prefer to instead create a static helper function in
> services.c that does not take the lock at all, use that function from
> security_fs_use, and leave the extern function unmodified.

On second thought, this is exactly how I want to change the patch. I
will send out a new one later today.


-Longman
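
The helper-function pattern suggested in this thread can be sketched as follows. This is a toy illustration only: the lock is replaced by a counter that records the nesting depth, and toy_read_lock(), toy_read_unlock(), genfs_sid_locked(), genfs_sid() and fs_use() are hypothetical names loosely modelled on the functions discussed, not the SELinux code.

```c
/* Toy stand-in for a read lock: it only tracks the nesting depth. */
static int read_depth;
static int max_read_depth;

static void toy_read_lock(void)
{
	if (++read_depth > max_read_depth)
		max_read_depth = read_depth;
}

static void toy_read_unlock(void)
{
	read_depth--;
}

/* Helper that does the real work; the caller must hold the lock. */
static int genfs_sid_locked(int key)
{
	return key + 1;
}

/* External entry point: takes the lock, then calls the helper. */
static int genfs_sid(int key)
{
	int sid;

	toy_read_lock();
	sid = genfs_sid_locked(key);
	toy_read_unlock();
	return sid;
}

/* A caller that already holds the lock uses the helper, so nothing nests. */
static int fs_use(int key)
{
	int sid;

	toy_read_lock();
	sid = genfs_sid_locked(key);
	toy_read_unlock();
	return sid;
}
```

Because fs_use() never calls the locking entry point while it holds the lock, the recorded nesting depth never exceeds one, and the external function keeps its signature.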


[PATCHv6 1/3] devicetree: Addition of the Altera SDRAM controller

2014-06-20 Thread tthayer
From: Thor Thayer 

Addition of the Altera SDRAM Controller bindings and device tree changes.

v2: Changes to SoC SDRAM EDAC code.

v3: Implement code suggestions for SDRAM EDAC code.

v4: Remove syscon from SDRAM controller bindings.

v5: No Change, bump version for consistency.

v6: Only map the ctrlcfg register as syscon.

Signed-off-by: Thor Thayer 
---
 .../bindings/arm/altera/socfpga-sdram.txt  |   11 +++
 arch/arm/boot/dts/socfpga.dtsi |5 +
 2 files changed, 16 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/arm/altera/socfpga-sdram.txt

diff --git a/Documentation/devicetree/bindings/arm/altera/socfpga-sdram.txt 
b/Documentation/devicetree/bindings/arm/altera/socfpga-sdram.txt
new file mode 100644
index 000..5027026
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/altera/socfpga-sdram.txt
@@ -0,0 +1,11 @@
+Altera SOCFPGA SDRAM Controller
+
+Required properties:
+- compatible : "altr,sdr-ctl";
+- reg : Should contain 1 register ranges(address and length)
+
+Example:
+   sdrctl@ffc25000 {
+   compatible = "altr,sdr-ctl";
+   reg = <0xffc25000 0x4>;
+   };
diff --git a/arch/arm/boot/dts/socfpga.dtsi b/arch/arm/boot/dts/socfpga.dtsi
index 4676f25..310292e 100644
--- a/arch/arm/boot/dts/socfpga.dtsi
+++ b/arch/arm/boot/dts/socfpga.dtsi
@@ -682,6 +682,11 @@
clocks = <_sp_clk>;
};
 
+   sdrctl@ffc25000 {
+   compatible = "altr,sdr-ctl", "syscon";
+   reg = <0xffc25000 0x4>;
+   };
+
rst: rstmgr@ffd05000 {
compatible = "altr,rst-mgr";
reg = <0xffd05000 0x1000>;
-- 
1.7.9.5



Re: [PATCH tip/core/rcu 0/5] Fix for cond_resched performance regression

2014-06-20 Thread Paul E. McKenney
On Fri, Jun 20, 2014 at 03:39:51PM -0700, j...@joshtriplett.org wrote:
> On Fri, Jun 20, 2014 at 03:11:20PM -0700, Paul E. McKenney wrote:
> > On Fri, Jun 20, 2014 at 02:24:23PM -0700, j...@joshtriplett.org wrote:
> > > On Fri, Jun 20, 2014 at 12:12:36PM -0700, Paul E. McKenney wrote:
> > > > o   Make cond_resched() a no-op for PREEMPT=y.  This might well turn
> > > > out to be a good thing, but it doesn't help give RCU the quiescent
> > > > states that it needs.
> > > 
> > > What about doing this, together with letting the fqs logic poke
> > > un-quiesced kernel code as needed?  That way, rather than having
> > > cond_resched do any work, you have the fqs logic recognize that a
> > > particular CPU has gone too long without quiescing, without disturbing
> > > that CPU at all if it hasn't gone too long.
> > 
> > My next stop is to post the previous series, but with a couple of
> > exports and one bug fix uncovered by testing thus far, but after
> > another round of testing.  Then I am going to take a close look at
> > this one:
> > 
> > o   Push the checks further into cond_resched(), so that the
> > fastpath does the same sequence of instructions that the original
> > did.  This might work well, but requires IPIs, which are not so
> > good for latencies on the remote CPU.  It nevertheless might be a
> > decent long-term solution given that if your CPU is spending many
> > jiffies looping in the kernel, you aren't getting good latencies
> > anyway.  It also has the benefit of allowing RCU to take advantage
> > of the implicit quiescent states of all cond_resched() calls,
> > and of eliminating the need for a separate cond_resched_rcu_qs()
> > and for RCU_COND_RESCHED_QS.
> > 
> > The one you call out is of course interesting as well.  But there are
> > a couple of questions:
> > 
> > 1.  Why wasn't cond_resched() a no-op in CONFIG_PREEMPT to start
> > > with?  It just seems too obvious a thing to do for it to possibly
> > be an oversight.  (What, me paranoid?)
> > 
> > 2.  When RCU recognizes that a particular CPU has gone too long,
> > exactly what are you suggesting that RCU do about it?  When
> > formulating your answer, please give due consideration to the
> > implications of that CPU being a NO_HZ_FULL CPU.  ;-)
> 
> Send it an IPI that either causes it to flag a quiescent state
> immediately if currently quiesced or causes it to quiesce at the next
> opportunity if not.

OK.  But if we are in a !PREEMPT kernel, we have to assume that any point
in the kernel is not a quiescent state, at least for the rcu_read_lock()
flavor of RCU.  So in that case, what constitutes the set of next
opportunities, and what is the time bound on when the next opportunity
will arrive?

Thanx, Paul



[PATCH v8 2/4] Documentation: dts: Add bindings for APM X-Gene SoC ethernet driver

2014-06-20 Thread Iyappan Subramanian
This patch adds documentation for APM X-Gene SoC ethernet DTS binding.

Signed-off-by: Iyappan Subramanian 
Signed-off-by: Ravi Patel 
Signed-off-by: Keyur Chudgar 
---
 .../devicetree/bindings/net/apm-xgene-enet.txt | 72 ++
 1 file changed, 72 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/apm-xgene-enet.txt

diff --git a/Documentation/devicetree/bindings/net/apm-xgene-enet.txt b/Documentation/devicetree/bindings/net/apm-xgene-enet.txt
new file mode 100644
index 000..3e2a295
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/apm-xgene-enet.txt
@@ -0,0 +1,72 @@
+APM X-Gene SoC Ethernet nodes
+
+Ethernet nodes are defined to describe on-chip ethernet interfaces in
+APM X-Gene SoC.
+
+Required properties:
+- compatible:  Should be "apm,xgene-enet"
+- reg: Address and length of the register set for the device. It contains the
+   information of registers in the same order as described by reg-names
+- reg-names: Should contain the register set names
+  "enet_csr":  Ethernet control and status register address space
+  "ring_csr":  Descriptor ring control and status register address space
+  "ring_cmd":  Descriptor ring command register address space
+- interrupts:  Ethernet main interrupt
+- clocks:  Reference to the clock entry.
+- local-mac-address:   MAC address assigned to this device
+- phy-connection-type: Interface type between ethernet device and PHY device
+- phy-handle:  Reference to a PHY node connected to this device
+
+- mdio:Device tree subnode with the following required
+   properties:
+
+   - compatible: Must be "apm,xgene-mdio".
+   - #address-cells: Must be <1>.
+   - #size-cells: Must be <0>.
+
+   For the phy on the mdio bus, there must be a node with the following
+   fields:
+
+   - compatible: PHY identifier.  Please refer ./phy.txt for the format.
+   - reg: The ID number for the phy.
+
+Optional properties:
+- status   : Should be "ok" or "disabled" for enabled/disabled.
+ Default is "ok".
+
+
+Example:
+   menetclk: menetclk {
+   compatible = "apm,xgene-device-clock";
+   clock-output-names = "menetclk";
+   status = "ok";
+   };
+
+   menet: ethernet@1702 {
+   compatible = "apm,xgene-enet";
+   status = "disabled";
+   reg = <0x0 0x1702 0x0 0xd100>,
+ <0x0 0X1703 0x0 0X400>,
+ <0x0 0X1000 0x0 0X200>;
+   reg-names = "enet_csr", "ring_csr", "ring_cmd";
+   interrupts = <0x0 0x3c 0x4>;
+   clocks = < 0>;
+   local-mac-address = [00 01 73 00 00 01];
+   phy-connection-type = "rgmii";
+   phy-handle = <>;
+   mdio {
+   compatible = "apm,xgene-mdio";
+   #address-cells = <1>;
+   #size-cells = <0>;
+   menetphy: menetphy@3 {
+   compatible = "ethernet-phy-id001c.c915";
+   reg = <0x3>;
+   };
+
+   };
+   };
+
+/* Board-specific peripheral configurations */
+ {
+status = "ok";
+};
-- 
1.9.1



[PATCH v8 0/4] net: Add APM X-Gene SoC Ethernet driver support

2014-06-20 Thread Iyappan Subramanian
Adding APM X-Gene SoC Ethernet driver.

v8: Address comments from v7 review
* changed angle bracket to double quotes in header file include.

v7: Address comments from v6 review
* fixed skb memory leak when dma_map_single fails in xmit.

v6: Address comments from v5 review
* added basic ethtool support
* added ndo_get_stats64 call back
* deleted printing of Rx error messages
* renamed set_bits to xgene_set_bits to fix kbuild error (make ARCH=powerpc)

v5: Address comments from v4 review
* Documentation: Added phy-handle, reg-names and changed mdio part
* dtb: Added reg-names supplemental property
* changed platform_get_resource to platform_get_resource_byname
* added separate tx/rx set_desc/get_desc functions to do raw_write/raw_read
* removed set_desc/get_desc table lookup logic
* added error handling logic based on per packet descriptor bits
* added software managed Rx packet and error counters
* added busy wait for register read/writes
* changed mdio_bus->id to avoid conflict
* fixed mdio_bus leak in case of mdio_config error
* changed phy reg hard coded value to MII_BMSR
* changed phy addr hard coded value to phy_device->addr
* added parentheses around macro arguments
* converted helper macros to inline functions
* restricted use of gotos to common work such as cleanup

v4: Address comments from v3 review
* MAINTAINERS: changed status to supported
* Kconfig: made default to no
* changed to bool data type wherever applicable
* cleaned up single bit set and masking code
* removed statistics counters masking
* removed unnecessary OOM message printing
* fixed dma_map_single and dma_unmap_single size parameter
* changed set bits macro body using new set_bits function

v3: Address comments from v2 review
* cleaned up set_desc and get_desc functions
* added dtb mdio node and phy-handle subnode
* renamed dtb phy-mode to phy-connection-type
* added of_phy_connect call to connect to PHY
* added empty line after last local variable declaration
* removed type casting when not required
* removed inline keyword from source files
* removed CONFIG_CPU_BIG_ENDIAN ifdef

v2
* Completely redesigned ethernet driver
* Added support to work with big endian kernel
* Renamed dtb phyid entry to phy_addr
* Changed dtb local-mac-address entry to byte string format
* Renamed dtb eth8clk entry to menetclk

v1
* Initial version

Signed-off-by: Iyappan Subramanian 
Signed-off-by: Ravi Patel 
Signed-off-by: Keyur Chudgar 
---

Iyappan Subramanian (4):
  MAINTAINERS: Add entry for APM X-Gene SoC ethernet driver
  Documentation: dts: Add bindings for APM X-Gene SoC ethernet driver
  dts: Add bindings for APM X-Gene SoC ethernet driver
  drivers: net: Add APM X-Gene SoC ethernet driver support.

 .../devicetree/bindings/net/apm-xgene-enet.txt |  72 ++
 MAINTAINERS|   8 +
 arch/arm64/boot/dts/apm-mustang.dts|   4 +
 arch/arm64/boot/dts/apm-storm.dtsi |  30 +-
 drivers/net/ethernet/Kconfig   |   1 +
 drivers/net/ethernet/Makefile  |   1 +
 drivers/net/ethernet/apm/Kconfig   |   1 +
 drivers/net/ethernet/apm/Makefile  |   5 +
 drivers/net/ethernet/apm/xgene/Kconfig |   9 +
 drivers/net/ethernet/apm/xgene/Makefile|   6 +
 .../net/ethernet/apm/xgene/xgene_enet_ethtool.c| 125 +++
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c | 848 +++
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.h | 394 +
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c   | 939 +
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h   | 109 +++
 15 files changed, 2549 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/apm-xgene-enet.txt
 create mode 100644 drivers/net/ethernet/apm/Kconfig
 create mode 100644 drivers/net/ethernet/apm/Makefile
 create mode 100644 drivers/net/ethernet/apm/xgene/Kconfig
 create mode 100644 drivers/net/ethernet/apm/xgene/Makefile
 create mode 100644 drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
 create mode 100644 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
 create mode 100644 drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
 create mode 100644 drivers/net/ethernet/apm/xgene/xgene_enet_main.c
 create mode 100644 drivers/net/ethernet/apm/xgene/xgene_enet_main.h

-- 
1.9.1



[PATCH v8 1/4] MAINTAINERS: Add entry for APM X-Gene SoC ethernet driver

2014-06-20 Thread Iyappan Subramanian
This patch adds a MAINTAINERS entry for APM X-Gene SoC
ethernet driver.

Signed-off-by: Iyappan Subramanian 
Signed-off-by: Ravi Patel 
Signed-off-by: Keyur Chudgar 
---
 MAINTAINERS | 8 
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 134483f..d65a3be 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -700,6 +700,14 @@ S: Maintained
 F: drivers/net/appletalk/
 F: net/appletalk/
 
+APPLIED MICRO (APM) X-GENE SOC ETHERNET DRIVER
+M: Iyappan Subramanian 
+M: Keyur Chudgar 
+M: Ravi Patel 
+S: Supported
+F: drivers/net/ethernet/apm/xgene/
+F: Documentation/devicetree/bindings/net/apm-xgene-enet.txt
+
 APTINA CAMERA SENSOR PLL
 M: Laurent Pinchart 
 L: linux-me...@vger.kernel.org
-- 
1.9.1



[PATCH v8 3/4] dts: Add bindings for APM X-Gene SoC ethernet driver

2014-06-20 Thread Iyappan Subramanian
This patch adds bindings for APM X-Gene SoC ethernet driver.

Signed-off-by: Iyappan Subramanian 
Signed-off-by: Ravi Patel 
Signed-off-by: Keyur Chudgar 
---
 arch/arm64/boot/dts/apm-mustang.dts |  4 
 arch/arm64/boot/dts/apm-storm.dtsi  | 30 +++---
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/boot/dts/apm-mustang.dts b/arch/arm64/boot/dts/apm-mustang.dts
index 1247ca1..e2fb1ef 100644
--- a/arch/arm64/boot/dts/apm-mustang.dts
+++ b/arch/arm64/boot/dts/apm-mustang.dts
@@ -24,3 +24,7 @@
reg = < 0x1 0x 0x0 0x8000 >; /* Updated by bootloader */
};
 };
+
+ {
+   status = "ok";
+};
diff --git a/arch/arm64/boot/dts/apm-storm.dtsi b/arch/arm64/boot/dts/apm-storm.dtsi
index c5f0a47..bd7a614 100644
--- a/arch/arm64/boot/dts/apm-storm.dtsi
+++ b/arch/arm64/boot/dts/apm-storm.dtsi
@@ -167,14 +167,13 @@
clock-output-names = "ethclk";
};
 
-   eth8clk: eth8clk {
+   menetclk: menetclk {
compatible = "apm,xgene-device-clock";
#clock-cells = <1>;
clocks = < 0>;
-   clock-names = "eth8clk";
reg = <0x0 0x1702C000 0x0 0x1000>;
reg-names = "csr-reg";
-   clock-output-names = "eth8clk";
+   clock-output-names = "menetclk";
};
 
sataphy1clk: sataphy1clk@1f21c000 {
@@ -363,5 +362,30 @@
#clock-cells = <1>;
clocks = < 0>;
};
+
+   menet: ethernet@1702 {
+   compatible = "apm,xgene-enet";
+   status = "disabled";
+   reg = <0x0 0x1702 0x0 0xd100>,
+ <0x0 0X1703 0x0 0X400>,
+ <0x0 0X1000 0x0 0X200>;
+   reg-names = "enet_csr", "ring_csr", "ring_cmd";
+   interrupts = <0x0 0x3c 0x4>;
+   dma-coherent;
+   clocks = < 0>;
+   local-mac-address = [00 01 73 00 00 01];
+   phy-connection-type = "rgmii";
+   phy-handle = <>;
+   mdio {
+   compatible = "apm,xgene-mdio";
+   #address-cells = <1>;
+   #size-cells = <0>;
+   menetphy: menetphy@3 {
+   compatible = "ethernet-phy-id001c.c915";
+   reg = <0x3>;
+   };
+
+   };
+   };
};
 };
-- 
1.9.1



Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-20 Thread Benjamin Herrenschmidt
On Sat, 2014-06-21 at 00:14 +1000, Alexey Kardashevskiy wrote:

> We can still use __raw_writel, would that be ok?

No, unless you understand precisely what kind of memory barriers each
platform requires for these.

Cheers,
Ben.




[PATCHv6 3/3] edac: altera: Add EDAC support for SDRAM Ctlr

2014-06-20 Thread tthayer
From: Thor Thayer 

Addition of the driver to support the Altera SDRAM Controller. 
This patch adds support for the CycloneV and ArriaV SDRAM controllers. 
Correction and reporting of SBEs, Panic on DBEs.

v2: Use the SDRAM controller registers to calculate memory size
instead of the Device Tree. Update To & Cc list. Add maintainer
information.

v3: EDAC driver cleanup based on comments from Mailing list.

v4: Panic on DBE. Add macro around inject-error reads to prevent
them from being optimized out. Remove of_match_ptr since this
will always use Device Tree.

v5: Addition of printk to trigger function to ensure read vars
are not optimized out.

v6: Changes to split out shared SDRAM controller reg (offset 0x00)
as a syscon device and allocate ECC specific SDRAM registers
to EDAC.

Signed-off-by: Thor Thayer 
---
 drivers/edac/Kconfig   |9 +
 drivers/edac/Makefile  |2 +
 drivers/edac/altera_edac.c |  448 
 3 files changed, 459 insertions(+)
 create mode 100644 drivers/edac/altera_edac.c

diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index 878f090..4f4d379 100644
--- a/drivers/edac/Kconfig
+++ b/drivers/edac/Kconfig
@@ -368,4 +368,13 @@ config EDAC_OCTEON_PCI
  Support for error detection and correction on the
  Cavium Octeon family of SOCs.
 
+config EDAC_ALTERA_MC
+   bool "Altera SDRAM Memory Controller EDAC"
+   depends on EDAC_MM_EDAC && ARCH_SOCFPGA
+   help
+ Support for error detection and correction on the
+ Altera SDRAM memory controller. Note that the
+ preloader must initialize the SDRAM before loading
+ the kernel.
+
 endif # EDAC
diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
index 4154ed6..9741336 100644
--- a/drivers/edac/Makefile
+++ b/drivers/edac/Makefile
@@ -64,3 +64,5 @@ obj-$(CONFIG_EDAC_OCTEON_PC)  += octeon_edac-pc.o
 obj-$(CONFIG_EDAC_OCTEON_L2C)  += octeon_edac-l2c.o
 obj-$(CONFIG_EDAC_OCTEON_LMC)  += octeon_edac-lmc.o
 obj-$(CONFIG_EDAC_OCTEON_PCI)  += octeon_edac-pci.o
+
+obj-$(CONFIG_EDAC_ALTERA_MC)   += altera_edac.o
diff --git a/drivers/edac/altera_edac.c b/drivers/edac/altera_edac.c
new file mode 100644
index 000..e3fcd27
--- /dev/null
+++ b/drivers/edac/altera_edac.c
@@ -0,0 +1,448 @@
+/*
+ *  Copyright Altera Corporation (C) 2014. All rights reserved.
+ *  Copyright 2011-2012 Calxeda, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+
+ *
+ * Adapted from the highbank_mc_edac driver
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "edac_core.h"
+#include "edac_module.h"
+
+#define EDAC_MOD_STR   "altera_edac"
+#define EDAC_VERSION   "1"
+
+/* SDRAM Controller CtrlCfg Register */
+#define CTLCFG 0x00
+
+/* SDRAM Controller CtrlCfg Register Bit Masks */
+#define CTLCFG_ECC_EN  0x400
+#define CTLCFG_ECC_CORR_EN 0x800
+#define CTLCFG_GEN_SB_ERR  0x2000
+#define CTLCFG_GEN_DB_ERR  0x4000
+
+#define CTLCFG_ECC_AUTO_EN (CTLCFG_ECC_EN | \
+CTLCFG_ECC_CORR_EN)
+
+/* SDRAM Controller ECC Register Offset */
+#define ECC_REG_OFFSET 0x2C
+
+/* SDRAM Controller Address Width Register */
+#define DRAMADDRW  (0x2C-ECC_REG_OFFSET)
+
+/* SDRAM Controller Address Widths Field Register */
+#define DRAMADDRW_COLBIT_MASK  0x001F
+#define DRAMADDRW_COLBIT_LSB   0
+#define DRAMADDRW_ROWBIT_MASK  0x03E0
+#define DRAMADDRW_ROWBIT_LSB   5
+#define DRAMADDRW_BANKBIT_MASK 0x1C00
+#define DRAMADDRW_BANKBIT_LSB  10
+#define DRAMADDRW_CSBIT_MASK   0xE000
+#define DRAMADDRW_CSBIT_LSB13
+
+/* SDRAM Controller Interface Data Width Register */
+#define DRAMIFWIDTH(0x30-ECC_REG_OFFSET)
+
+/* SDRAM Controller Interface Data Width Defines */
+#define DRAMIFWIDTH_16B_ECC24
+#define DRAMIFWIDTH_32B_ECC40
+
+/* SDRAM Controller DRAM Status Register */
+#define DRAMSTS(0x38-ECC_REG_OFFSET)
+
+/* SDRAM Controller DRAM Status Register Bit Masks */
+#define DRAMSTS_SBEERR 0x04
+#define DRAMSTS_DBEERR 0x08
+#define DRAMSTS_CORR_DROP  0x10
+
+/* SDRAM Controller DRAM IRQ Register */
+#define DRAMINTR   (0x3C-ECC_REG_OFFSET)
+
+/* SDRAM Controller DRAM IRQ 

Re: [PATCH v2] devicetree: Add generic IOMMU device tree bindings

2014-06-20 Thread Olav Haugan
On 5/30/2014 12:06 PM, Arnd Bergmann wrote:
> On Friday 30 May 2014 08:16:05 Rob Herring wrote:
>> On Fri, May 23, 2014 at 3:33 PM, Thierry Reding
>>  wrote:
>>> From: Thierry Reding 
>>> +IOMMU master node:
>>> +==
>>> +
> >>> +Devices that access memory through an IOMMU are called masters. A device can
> >>> +have multiple master interfaces (to one or more IOMMU devices).
> >>> +
> >>> +Required properties:
> >>> +
> >>> +- iommus: A list of phandle and IOMMU specifier pairs that describe the IOMMU
> >>> +  master interfaces of the device. One entry in the list describes one master
> >>> +  interface of the device.
> >>> +
> >>> +When an "iommus" property is specified in a device tree node, the IOMMU will
> >>> +be used for address translation. If a "dma-ranges" property exists in the
> >>> +device's parent node it will be ignored. An exception to this rule is if the
> >>> +referenced IOMMU is disabled, in which case the "dma-ranges" property of the
> >>> +parent shall take effect.
>>
>> Just thinking out loud, could you have dma-ranges in the iommu node
>> for the case when the iommu is enabled rather than putting the DMA
>> window information into the iommus property?
>>
>> This would probably mean that you need both #iommu-cells and #address-cells.
> 
> The reason for doing like this was that you may need a different window
> for each device, while there can only be one dma-ranges property in 
> an iommu node.
> 
>>> +
>>> +Optional properties:
>>> +
>>> +- iommu-names: A list of names identifying each entry in the "iommus"
>>> +  property.
>>
>> Do we really need a name here? I would not expect that you have
>> clearly documented names here from the datasheet like you would for
>> interrupts or clocks, so you'd just be making up names. Sorry, but I'm
>> not a fan of names properties in general.
> 
> Good point, this was really overdesign by modeling it after other
> subsystems that can have a use for names.
>  
>>> +Multiple-master IOMMU:
>>> +--
>>> +
>>> +   iommu {
>>> +   /* the specifier represents the ID of the master */
>>> +   #address-cells = <1>;
>>> +   #size-cells = <0>;
>>> +   };
>>> +
>>> +   master {
>>> +   /* device has master ID 42 in the IOMMU */
>>> +   iommus = <&/iommu 42>;
>>> +   };
>>
>> Presumably the ID would be the streamID on ARM's SMMU. How would a
>> master with 8 streamIDs be described? This is what Calxeda midway has
>> for SATA and I would expect that to be somewhat common. Either you
>> need some ID masking or you'll have lots of duplication when you have
>> windows.
> 
> I don't understand the problem. If you have stream IDs 0 through 7,
> you would have
> 
>   master@a {
>   ...
>   iommus = < 0>;
>   };
> 
>   master@b {
>   ...
>   iommus = < 1;
>   };
> 
>   ...
> 
>   master@12 {
>   ...
>   iommus = < 7;
>   };
> 
> and you don't need a window at all. Why would you need a mask of
> some sort?

We have multiple-master SMMUs and each master emits a variable number of
StreamIDs. However, we have to apply a mask (the ARM SMMU spec allows
for this) to the StreamIDs due to the limited number of
StreamID-to-Context-Bank entries in the SMMU. If my understanding is
correct, we would represent this in the DT like this:

iommu {
#address-cells = <2>;
#size-cells = <0>;
};

master@a {
...
iommus = < StreamID0 MASK0>,
 < StreamID1 MASK1>,
 < StreamID2 MASK2>;
};

master@b {
...
iommus = < StreamID3 MASK3>,
 < StreamID4 MASK4>;
};
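
To make the ID/mask pairs above concrete, here is a minimal sketch in plain
C of how one masked entry could cover several StreamIDs. It assumes the
stream-match convention in which a set mask bit means "ignore this ID bit";
the struct, the function names and the numeric values are hypothetical and
are not taken from any driver or binding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical (ID, mask) pair as it might appear in one "iommus" entry. */
struct stream_match {
	uint16_t id;
	uint16_t mask;	/* assumption: set bits are ignored when comparing */
};

/* Return true if the StreamID `sid` is covered by the entry `m`. */
static bool stream_matches(struct stream_match m, uint16_t sid)
{
	/* Compare only the bits that are NOT masked out. */
	return ((sid ^ m.id) & (uint16_t)~m.mask) == 0;
}

/* Small self-check: ID 0x40 with mask 0x7 covers 0x40..0x47 only. */
static void stream_match_selftest(void)
{
	struct stream_match m = { .id = 0x40, .mask = 0x7 };

	assert(stream_matches(m, 0x40));
	assert(stream_matches(m, 0x47));
	assert(!stream_matches(m, 0x48));
	assert(!stream_matches(m, 0x3f));
}
```

Under that assumption a master emitting 8 consecutive StreamIDs needs a
single entry rather than 8, which is the duplication concern raised earlier
in the thread.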


Thanks,

Olav Haugan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


[PATCHv6 2/3] devicetree: Addition of the Altera SDRAM EDAC

2014-06-20 Thread tthayer
From: Thor Thayer 

Addition of the Altera SDRAM EDAC bindings and device tree changes

v2: Changes to SoC EDAC source code.

v3: Fix typo in device tree documentation.

v4,v5: No changes - bump version for consistency.

v6: Assign ECC registers in SDRAM controller to EDAC

Signed-off-by: Thor Thayer 
---
 .../bindings/arm/altera/socfpga-sdram-edac.txt |   15 +++
 arch/arm/boot/dts/socfpga.dtsi |6 ++
 2 files changed, 21 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/altera/socfpga-sdram-edac.txt

diff --git a/Documentation/devicetree/bindings/arm/altera/socfpga-sdram-edac.txt b/Documentation/devicetree/bindings/arm/altera/socfpga-sdram-edac.txt
new file mode 100644
index 000..540c9cf
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/altera/socfpga-sdram-edac.txt
@@ -0,0 +1,15 @@
+Altera SOCFPGA SDRAM Error Detection & Correction [EDAC]
+
+Required properties:
+- compatible : should contain "altr,sdram-edac";
+- reg : should contain the ECC register range in sdram
+controller (address and length).
+- interrupts : Should contain the SDRAM ECC IRQ in the
+   appropriate format for the IRQ controller.
+
+Example:
+   sdramedac@0 {
+   compatible = "altr,sdram-edac";
+   reg = <0xffc2502C 0x28>;
+   interrupts = <0 39 4>;
+   };
diff --git a/arch/arm/boot/dts/socfpga.dtsi b/arch/arm/boot/dts/socfpga.dtsi
index 310292e..fe9832e 100644
--- a/arch/arm/boot/dts/socfpga.dtsi
+++ b/arch/arm/boot/dts/socfpga.dtsi
@@ -687,6 +687,12 @@
reg = <0xffc25000 0x4>;
};
 
+   sdramedac@0 {
+   compatible = "altr,sdram-edac";
+   reg = <0xffc2502C 0x28>;
+   interrupts = <0 39 4>;
+   };
+
rst: rstmgr@ffd05000 {
compatible = "altr,rst-mgr";
reg = <0xffd05000 0x1000>;
-- 
1.7.9.5



Re: [PATCH] i2c: exynos5: Properly use the "noirq" variants of suspend/resume

2014-06-20 Thread Kevin Hilman
Doug Anderson  writes:

> Kevin,
>
> On Fri, Jun 20, 2014 at 2:48 PM, Kevin Hilman  wrote:
>> Hi Doug,
>>
>> Doug Anderson  writes:
>>
>>> On Thu, Jun 19, 2014 at 11:43 AM, Kevin Hilman  wrote:
>>>> Doug Anderson  writes:
>>>>
>>>>> The original code for the exynos i2c controller registered for the
>>>>> "noirq" variants.  However during review feedback it was moved to
>>>>> SIMPLE_DEV_PM_OPS without anyone noticing that it meant we were no
>>>>> longer actually "noirq" (despite functions named
>>>>> exynos5_i2c_suspend_noirq and exynos5_i2c_resume_noirq).
>>>>>
>>>>> i2c controllers that might have wakeup sources on them seem to need to
>>>>> resume at noirq time so that the individual drivers can actually read
>>>>> the i2c bus to handle their wakeup.
>>>>
>>>> I suspect usage of the noirq variants pre-dates the existence of the
>>>> late/early callbacks in the PM core, but based on the description above,
>>>> I suspect what you actually want is the late/early callbacks.
>>>
>>> I think it actually really needs noirq.  ;)
>>
>> Yes, it appears it does.   Objection withdrawn.
>>
>> I just wanted to be sure because since the introduction of late/early,
>> the need for noirq should be pretty rare, but there certainly are needs.
>>
>> 
>> In this case though, the need for it has more to do with the
>> lack of a way for us to describe non parent-child device dependencies
>> than whether or not IRQs are enabled or not.
>> 
>
> Actually, I'm not sure that's true, but I'll talk through it and you
> can point to where I'm wrong (I often am!)
>
> If you're a wakeup device then you need to be ready to handle
> interrupts as soon as the "noirq" phase of resume is done, right?

As soon as the noirq phase of your own driver is done, correct.

> Said another way: you need to be ready to handle interrupts _before_
> the normal resume code is called and be ready to handle interrupts
> even _before_ the early resume code is called.

Correct.

> That means if you are implementing a bus that's needed by any devices
> with wakeup interrupts then it's your responsibility to also be
> prepared to run this early.
>
> In this particular case the max77686 driver doesn't need to do
> anything at all to be ready to handle interrupts.  Its suspend and
> resume code is just boilerplate "enable wakeups / disable wakeups" and
> it has no "noirq" code.  The max77686 driver doesn't have any "noirq"
> wake call because it would just be empty.
>
> Said another way: the problem isn't that the max77686 wakeup gets
> called before the i2c wakeup.  The problem is that i2c is needed ASAP
> once IRQs are enabled and thus needs to be run noirq.
>
> Does that sound semi-correct?

Yes that's correct.

My point above was (trying to be) that ultimately this is an ordering
issue.  e.g. the bus device needs to be "ready" before wakeup devices on
that bus can handle wakeup interrupts etc.  The way we're handling that
ordering is by the implied ordering of noirq, late/early and "normal"
callbacks.  That's convenient, but not exactly obvious.   

It works because we don't typically need too many layers here, but it
would be much more understandable if we could describe this kind of
dependency in a way that the suspend/resume code would suspend/resume
things in the right order rather than by tinkering with callback levels
(since otherwise suspend/resume ordering just depends on probe order.)
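
As a rough illustration of the implied ordering described above, the
following user-space C sketch models the three system-resume phases
(noirq, then early, then normal) and checks whether a bus is already
resumed at the point where device interrupts come back, which is taken
here to be right after the noirq phase. It is a simplified model, not
kernel code; all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Resume phases in the order the PM core runs them. */
enum resume_phase { RESUME_NOIRQ, RESUME_EARLY, RESUME_NORMAL, RESUME_PHASES };

/* Hypothetical model of a device and the phase its resume callback uses. */
struct model_dev {
	const char *name;
	enum resume_phase resumes_in;
	bool ready;
};

/*
 * Walk the phases in order and report whether `bus` had been resumed
 * by the time the noirq phase finished (the point where wakeup
 * interrupt handlers may start running in this model).
 */
static bool bus_ready_at_irq_enable(struct model_dev *devs, size_t n,
				    const struct model_dev *bus)
{
	bool ready_at_irq = false;

	for (int phase = RESUME_NOIRQ; phase < RESUME_PHASES; phase++) {
		for (size_t i = 0; i < n; i++)
			if (devs[i].resumes_in == (enum resume_phase)phase)
				devs[i].ready = true;
		if (phase == RESUME_NOIRQ)
			ready_at_irq = bus->ready;
	}
	return ready_at_irq;
}

/* Self-check: a noirq bus is ready in time, a "normal" one is not. */
static void resume_order_selftest(void)
{
	struct model_dev noirq_case[] = {
		{ "i2c", RESUME_NOIRQ, false },
		{ "max77686", RESUME_NORMAL, false },
	};
	struct model_dev normal_case[] = {
		{ "i2c", RESUME_NORMAL, false },
		{ "max77686", RESUME_NORMAL, false },
	};

	assert(bus_ready_at_irq_enable(noirq_case, 2, &noirq_case[0]));
	assert(!bus_ready_at_irq_enable(normal_case, 2, &normal_case[0]));
}
```

The second case corresponds to the situation the patch describes, where
the i2c callbacks ended up in the normal phase despite their names.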

This issue then usually gets me headed down my usual rant path about how
I think runtime PM is much better suited for handling ordering and
dependencies because it automatically handles parent/child dependencies,
and non parent/child dependencies can be handled by taking advantage of
the get/put APIs, which are refcounted, etc. etc., but that's another can
of worms.

Kevin









Re: [PATCH] vfio: Fix endianness handling for emulated BARs

2014-06-20 Thread Benjamin Herrenschmidt
On Thu, 2014-06-19 at 21:21 -0600, Alex Williamson wrote:

> Working on big endian being an accident may be a matter of perspective

 :-)

> The comment remains that this patch doesn't actually fix anything except
> the overhead on big endian systems doing redundant byte swapping and
> maybe the philosophy that vfio regions are little endian.

Yes, that works by accident because technically VFIO is a transport and
thus shouldn't perform any endian swapping of any sort, which remains
the responsibility of the end driver which is the only one to know
whether a given BAR location is a register or some streaming data
and in the former case whether it's LE or BE (some PCI devices are BE
even ! :-)

But yes, in the end, it works with the dual "cancelling" swaps and the
overhead of those swaps is probably drowned in the noise of the syscall
overhead.
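
For readers following along, the "cancelling" swaps can be sketched in
plain user-space C. This is only a model under stated assumptions: the
host endianness is passed in as a flag instead of being detected, and the
function names are hypothetical stand-ins for what le32_to_cpu() and a
little-endian MMIO write do on each kind of host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Unconditional 32-bit byte swap. */
static uint32_t model_swab32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) |
	       ((v & 0x0000ff00u) << 8)  |
	       ((v & 0x00ff0000u) >> 8)  |
	       ((v & 0xff000000u) >> 24);
}

/*
 * Model of the old write path: le32_to_cpu() followed by a
 * little-endian accessor.  Returns the bit pattern handed to the
 * device, expressed in the same byte layout as the user buffer.
 */
static uint32_t model_old_write_path(uint32_t user_val, bool host_is_be)
{
	/* le32_to_cpu(): swaps only on a big-endian host. */
	uint32_t cpu_val = host_is_be ? model_swab32(user_val) : user_val;

	/* Little-endian accessor: swaps again on a big-endian host. */
	return host_is_be ? model_swab32(cpu_val) : cpu_val;
}

/* Self-check: the bytes pass through unchanged on either kind of host. */
static void double_swap_selftest(void)
{
	assert(model_swab32(0x11223344u) == 0x44332211u);
	assert(model_old_write_path(0x11223344u, true) == 0x11223344u);
	assert(model_old_write_path(0x11223344u, false) == 0x11223344u);
}
```

In this model the result is identical on both hosts, so the only cost on
big endian is the two redundant swaps.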

> I'm still not a fan of iowrite vs iowritebe, there must be something we
> can use that doesn't have an implicit swap.

Sadly there isn't ... In the old days we didn't even have the "be"
variant and readl/writel style accessors still don't have them either
for all archs.

There is __raw_readl/writel but here the semantics are much more than
just "don't swap", they also don't have memory barriers (which means
they are essentially useless to most drivers unless those are platform
specific drivers which know exactly what they are doing, or in the rare
cases such as accessing a framebuffer which we know never have side
effects). 

>  Calling it iowrite*_native is also an abuse of the namespace.


>  Next thing we know some common code
> will legitimately use that name. 

It might make sense to move those definitions into a common header. There have
been a handful of cases in the past that wanted that sort of "native
byte order" MMIOs iirc (though don't ask me for examples, I can't really
remember).

>  If we do need to define an alias
> (which I'd like to avoid) it should be something like vfio_iowrite32.
> Thanks,

Cheers,
Ben.

> Alex
> 
> > > ===
> > > 
> > > any better?
> > > 
> > > 
> > > 
> > > 
> >  Suggested-by: Benjamin Herrenschmidt 
> >  Signed-off-by: Alexey Kardashevskiy 
> >  ---
> >   drivers/vfio/pci/vfio_pci_rdwr.c | 20 
> >   1 file changed, 16 insertions(+), 4 deletions(-)
> > 
> >  diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
> >  index 210db24..f363b5a 100644
> >  --- a/drivers/vfio/pci/vfio_pci_rdwr.c
> >  +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
> >  @@ -21,6 +21,18 @@
> >   
> >   #include "vfio_pci_private.h"
> >   
> >  +#ifdef __BIG_ENDIAN__
> >  +#define ioread16_native   ioread16be
> >  +#define ioread32_native   ioread32be
> >  +#define iowrite16_native  iowrite16be
> >  +#define iowrite32_native  iowrite32be
> >  +#else
> >  +#define ioread16_native   ioread16
> >  +#define ioread32_native   ioread32
> >  +#define iowrite16_native  iowrite16
> >  +#define iowrite32_native  iowrite32
> >  +#endif
> >  +
> >   /*
> >    * Read or write from an __iomem region (MMIO or I/O port) with an excluded
> >    * range which is inaccessible.  The excluded range drops writes and fills
> >  @@ -50,9 +62,9 @@ static ssize_t do_io_rw(void __iomem *io, char __user *buf,
> > > if (copy_from_user(&val, buf, 4))
> > return -EFAULT;
> >   
> >  -  iowrite32(le32_to_cpu(val), io + off);
> >  +  iowrite32_native(val, io + off);
> > } else {
> >  -  val = cpu_to_le32(ioread32(io + off));
> >  +  val = ioread32_native(io + off);
> >   
> > > if (copy_to_user(buf, &val, 4))
> > return -EFAULT;
> >  @@ -66,9 +78,9 @@ static ssize_t do_io_rw(void __iomem *io, char 
> >  __user *buf,
> > > if (copy_from_user(&val, buf, 2))
> > return -EFAULT;
> >   
> >  -  iowrite16(le16_to_cpu(val), io + off);
> >  +  iowrite16_native(val, io + off);
> > } else {
> >  -  val = cpu_to_le16(ioread16(io + off));
> >  +  val = ioread16_native(io + off);
> >   
> > > if (copy_to_user(buf, &val, 2))
> > return -EFAULT;
> > >>>
> > >>>
> > >>>
> > >>
> > >>
> > > 
> > > 
> > 
> > 
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to 

[PATCHv6 3/3] edac: altera: Add EDAC support for SDRAM Ctlr

2014-06-20 Thread tthayer
From: Thor Thayer 

v2: Use the SDRAM controller registers to calculate memory size
instead of the Device Tree. Update To & Cc list. Add maintainer
information.

v3: EDAC driver cleanup based on comments from Mailing list.

v4: Panic on DBE. Add macro around inject-error reads to prevent
them from being optimized out. Remove of_match_ptr since this
will always use Device Tree.

v5: Addition of printk to trigger function to ensure read vars
are not optimized out.

v6: Changes to split out shared SDRAM controller reg (offset 0x00)
as a syscon device and allocate ECC specific SDRAM registers
to EDAC.

Signed-off-by: Thor Thayer 
---
 drivers/edac/Kconfig   |9 +
 drivers/edac/Makefile  |2 +
 drivers/edac/altera_edac.c |  448 
 3 files changed, 459 insertions(+)
 create mode 100644 drivers/edac/altera_edac.c

diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index 878f090..4f4d379 100644
--- a/drivers/edac/Kconfig
+++ b/drivers/edac/Kconfig
@@ -368,4 +368,13 @@ config EDAC_OCTEON_PCI
  Support for error detection and correction on the
  Cavium Octeon family of SOCs.
 
+config EDAC_ALTERA_MC
+   bool "Altera SDRAM Memory Controller EDAC"
+   depends on EDAC_MM_EDAC && ARCH_SOCFPGA
+   help
+ Support for error detection and correction on the
+ Altera SDRAM memory controller. Note that the
+ preloader must initialize the SDRAM before loading
+ the kernel.
+
 endif # EDAC
diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
index 4154ed6..9741336 100644
--- a/drivers/edac/Makefile
+++ b/drivers/edac/Makefile
@@ -64,3 +64,5 @@ obj-$(CONFIG_EDAC_OCTEON_PC)  += octeon_edac-pc.o
 obj-$(CONFIG_EDAC_OCTEON_L2C)  += octeon_edac-l2c.o
 obj-$(CONFIG_EDAC_OCTEON_LMC)  += octeon_edac-lmc.o
 obj-$(CONFIG_EDAC_OCTEON_PCI)  += octeon_edac-pci.o
+
+obj-$(CONFIG_EDAC_ALTERA_MC)   += altera_edac.o
diff --git a/drivers/edac/altera_edac.c b/drivers/edac/altera_edac.c
new file mode 100644
index 000..e3fcd27
--- /dev/null
+++ b/drivers/edac/altera_edac.c
@@ -0,0 +1,448 @@
+/*
+ *  Copyright Altera Corporation (C) 2014. All rights reserved.
+ *  Copyright 2011-2012 Calxeda, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+
+ *
+ * Adapted from the highbank_mc_edac driver
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "edac_core.h"
+#include "edac_module.h"
+
+#define EDAC_MOD_STR   "altera_edac"
+#define EDAC_VERSION   "1"
+
+/* SDRAM Controller CtrlCfg Register */
+#define CTLCFG 0x00
+
+/* SDRAM Controller CtrlCfg Register Bit Masks */
+#define CTLCFG_ECC_EN  0x400
+#define CTLCFG_ECC_CORR_EN 0x800
+#define CTLCFG_GEN_SB_ERR  0x2000
+#define CTLCFG_GEN_DB_ERR  0x4000
+
+#define CTLCFG_ECC_AUTO_EN (CTLCFG_ECC_EN | \
+CTLCFG_ECC_CORR_EN)
+
+/* SDRAM Controller ECC Register Offset */
+#define ECC_REG_OFFSET 0x2C
+
+/* SDRAM Controller Address Width Register */
+#define DRAMADDRW  (0x2C-ECC_REG_OFFSET)
+
+/* SDRAM Controller Address Widths Field Register */
+#define DRAMADDRW_COLBIT_MASK  0x001F
+#define DRAMADDRW_COLBIT_LSB   0
+#define DRAMADDRW_ROWBIT_MASK  0x03E0
+#define DRAMADDRW_ROWBIT_LSB   5
+#define DRAMADDRW_BANKBIT_MASK 0x1C00
+#define DRAMADDRW_BANKBIT_LSB  10
+#define DRAMADDRW_CSBIT_MASK   0xE000
+#define DRAMADDRW_CSBIT_LSB    13
+
+/* SDRAM Controller Interface Data Width Register */
+#define DRAMIFWIDTH    (0x30-ECC_REG_OFFSET)
+
+/* SDRAM Controller Interface Data Width Defines */
+#define DRAMIFWIDTH_16B_ECC    24
+#define DRAMIFWIDTH_32B_ECC    40
+
+/* SDRAM Controller DRAM Status Register */
+#define DRAMSTS        (0x38-ECC_REG_OFFSET)
+
+/* SDRAM Controller DRAM Status Register Bit Masks */
+#define DRAMSTS_SBEERR 0x04
+#define DRAMSTS_DBEERR 0x08
+#define DRAMSTS_CORR_DROP  0x10
+
+/* SDRAM Controller DRAM IRQ Register */
+#define DRAMINTR   (0x3C-ECC_REG_OFFSET)
+
+/* SDRAM Controller DRAM IRQ Register Bit Masks */
+#define DRAMINTR_INTREN        0x01
+#define DRAMINTR_SBEMASK   0x02
+#define DRAMINTR_DBEMASK   0x04
+#define DRAMINTR_CORRDROPMASK  0x08
+#define 
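
The DRAMADDRW mask/LSB pairs in the patch above follow the usual
mask-then-shift pattern. A minimal user-space sketch of decoding such a
register follows; the helper names and the sample register value are
made up for illustration, and reading the fields as "number of address
bits" is inferred from the macro names rather than from a datasheet.

```c
#include <stdint.h>

/* Field definitions copied from the patch. */
#define DRAMADDRW_COLBIT_MASK	0x001F
#define DRAMADDRW_COLBIT_LSB	0
#define DRAMADDRW_ROWBIT_MASK	0x03E0
#define DRAMADDRW_ROWBIT_LSB	5
#define DRAMADDRW_BANKBIT_MASK	0x1C00
#define DRAMADDRW_BANKBIT_LSB	10
#define DRAMADDRW_CSBIT_MASK	0xE000
#define DRAMADDRW_CSBIT_LSB	13

/* Hypothetical container for the decoded fields. */
struct dram_addr_widths {
	uint32_t col;
	uint32_t row;
	uint32_t bank;
	uint32_t cs;
};

/* Isolate one field: mask off its bits, then shift it down to bit 0. */
static uint32_t dramaddrw_field(uint32_t reg, uint32_t mask, unsigned int lsb)
{
	return (reg & mask) >> lsb;
}

/* Decode all four fields of a raw DRAMADDRW value. */
struct dram_addr_widths decode_dramaddrw(uint32_t reg)
{
	struct dram_addr_widths w;

	w.col  = dramaddrw_field(reg, DRAMADDRW_COLBIT_MASK,
				 DRAMADDRW_COLBIT_LSB);
	w.row  = dramaddrw_field(reg, DRAMADDRW_ROWBIT_MASK,
				 DRAMADDRW_ROWBIT_LSB);
	w.bank = dramaddrw_field(reg, DRAMADDRW_BANKBIT_MASK,
				 DRAMADDRW_BANKBIT_LSB);
	w.cs   = dramaddrw_field(reg, DRAMADDRW_CSBIT_MASK,
				 DRAMADDRW_CSBIT_LSB);
	return w;
}
```

For example, a raw value of 0x2DEA would decode to col=10, row=15,
bank=3, cs=1 under these definitions.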

[PATCHv6 2/3] devicetree: Addition of the Altera SDRAM EDAC

2014-06-20 Thread tthayer
From: Thor Thayer 

v2: Changes to SoC EDAC source code.

v3: Fix typo in device tree documentation.

v4,v5: No changes - bump version for consistency.

v6: Assign ECC registers in SDRAM controller to EDAC

Signed-off-by: Thor Thayer 
---
 .../bindings/arm/altera/socfpga-sdram-edac.txt |   15 +++
 arch/arm/boot/dts/socfpga.dtsi |6 ++
 2 files changed, 21 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/arm/altera/socfpga-sdram-edac.txt

diff --git 
a/Documentation/devicetree/bindings/arm/altera/socfpga-sdram-edac.txt 
b/Documentation/devicetree/bindings/arm/altera/socfpga-sdram-edac.txt
new file mode 100644
index 000..540c9cf
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/altera/socfpga-sdram-edac.txt
@@ -0,0 +1,15 @@
+Altera SOCFPGA SDRAM Error Detection & Correction [EDAC]
+
+Required properties:
+- compatible : should contain "altr,sdram-edac";
+- reg : should contain the ECC register range in sdram
+controller (address and length).
+- interrupts : Should contain the SDRAM ECC IRQ in the
+   appropriate format for the IRQ controller.
+
+Example:
+   sdramedac@0 {
+   compatible = "altr,sdram-edac";
+   reg = <0xffc2502C 0x28>;
+   interrupts = <0 39 4>;
+   };
diff --git a/arch/arm/boot/dts/socfpga.dtsi b/arch/arm/boot/dts/socfpga.dtsi
index 310292e..fe9832e 100644
--- a/arch/arm/boot/dts/socfpga.dtsi
+++ b/arch/arm/boot/dts/socfpga.dtsi
@@ -687,6 +687,12 @@
reg = <0xffc25000 0x4>;
};
 
+   sdramedac@0 {
+   compatible = "altr,sdram-edac";
+   reg = <0xffc2502C 0x28>;
+   interrupts = <0 39 4>;
+   };
+
rst: rstmgr@ffd05000 {
compatible = "altr,rst-mgr";
reg = <0xffd05000 0x1000>;
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCHv6 1/3] devicetree: Addition of the Altera SDRAM controller

2014-06-20 Thread tthayer
From: Thor Thayer 

v2: Changes to SoC SDRAM EDAC code.

v3: Implement code suggestions for SDRAM EDAC code.

v4: Remove syscon from SDRAM controller bindings.

v5: No Change, bump version for consistency.

v6: Only map the ctrlcfg register as syscon.

Signed-off-by: Thor Thayer 
---
 .../bindings/arm/altera/socfpga-sdram.txt  |   11 +++
 arch/arm/boot/dts/socfpga.dtsi |5 +
 2 files changed, 16 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/arm/altera/socfpga-sdram.txt

diff --git a/Documentation/devicetree/bindings/arm/altera/socfpga-sdram.txt 
b/Documentation/devicetree/bindings/arm/altera/socfpga-sdram.txt
new file mode 100644
index 000..5027026
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/altera/socfpga-sdram.txt
@@ -0,0 +1,11 @@
+Altera SOCFPGA SDRAM Controller
+
+Required properties:
+- compatible : "altr,sdr-ctl";
+- reg : Should contain 1 register ranges(address and length)
+
+Example:
+   sdrctl@ffc25000 {
+   compatible = "altr,sdr-ctl";
+   reg = <0xffc25000 0x4>;
+   };
diff --git a/arch/arm/boot/dts/socfpga.dtsi b/arch/arm/boot/dts/socfpga.dtsi
index 4676f25..310292e 100644
--- a/arch/arm/boot/dts/socfpga.dtsi
+++ b/arch/arm/boot/dts/socfpga.dtsi
@@ -682,6 +682,11 @@
clocks = <_sp_clk>;
};
 
+   sdrctl@ffc25000 {
+   compatible = "altr,sdr-ctl", "syscon";
+   reg = <0xffc25000 0x4>;
+   };
+
rst: rstmgr@ffd05000 {
compatible = "altr,rst-mgr";
reg = <0xffd05000 0x1000>;
-- 
1.7.9.5



[PATCHv6 0/3] Addition of Altera SDRAM EDAC

2014-06-20 Thread tthayer
From: Thor Thayer 

Addition of the Altera SDRAM controller to the EDAC driver.

Thor Thayer (3):
  Addition of the Altera SDRAM controller bindings and device tree
changes to the Altera SoC project.
  Addition of the Altera SDRAM EDAC bindings and device tree
changes to the Altera SoC project.
  edac: altera: Add EDAC support for Altera SoC SDRAM Controller.
This patch adds support for the CycloneV and ArriaV SDRAM
controllers. Correction and reporting of SBEs, Panic on DBEs.

 .../bindings/arm/altera/socfpga-sdram-edac.txt |   15 +
 .../bindings/arm/altera/socfpga-sdram.txt  |   11 +
 arch/arm/boot/dts/socfpga.dtsi |   11 +
 drivers/edac/Kconfig   |9 +
 drivers/edac/Makefile  |2 +
 drivers/edac/altera_edac.c |  448 
 6 files changed, 496 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/arm/altera/socfpga-sdram-edac.txt
 create mode 100644 
Documentation/devicetree/bindings/arm/altera/socfpga-sdram.txt
 create mode 100644 drivers/edac/altera_edac.c

-- 
1.7.9.5



Add EDAC support for Altera SDRAM Controller

2014-06-20 Thread tthayer
[PATCHv6 1/3] dt: bindings: Addition of the Altera SDRAM controller
[PATCHv6 2/3] dt: bindings: Addition of the Altera SDRAM EDAC 
[PATCHv6 3/3] edac: altera: Add EDAC support for Altera SoC SDRAM


Re: [ 059/143] sysctl net: Keep tcp_syn_retries inside the boundary

2014-06-20 Thread Willy Tarreau
Hi Eric,

On Fri, Jun 20, 2014 at 03:16:07PM -0700, Eric W. Biederman wrote:
> Willy Tarreau  writes:
> 
> > Hi Luis,
> >
> > On Thu, Jun 12, 2014 at 01:55:53PM +0100, Luis Henriques wrote:
> >> I was finally able to spend some more time with this and tried (a
> >> modified) Tyler's patch on top of 2.6.32.62, and it seems to work.
> >> Although I haven't done any extended testing, I don't see the two
> >> stack traces and the /proc/sys/net/ipv4/ directory seems to be
> >> correctly populated.
> >> 
> >> I'm attaching the patch I've used, based on Tyler's.
> >
> > Would any of you or Tyler please kindly pass me a signed-off-by with
> > a commit message ? That would be great. Alternately I'd do it myself
> > and mention you authored them.
> 
> If my memory serves it is possible in 2.6.32 to set 
> .ctl_name = CTL_UNNEEDED
> 
> and not need to implement a .strategy routine at all.

Ah that's quite interesting, thanks for the tip!

> Given the fact that most people got the strategy routines
> slightly wrong and that sys_sysctl is effectively unused,
> a strategy where you don't implement code that no-one
> will use in a backport would be preferable.

OK.

> Since you have mentioned this has come up a couple of times, if nothing
> else this will be something to think about for next time.

I'm keeping your e-mail where I manage patches, hoping to recognize
this case next time.

> I am puzzled why .ctl_name was populated in a backport at all.

Oh it's simply because I didn't know it did not have to be there,
and among the few reviewers, I guess that it's not common to know
what version uses what semantics.

Thank you for the explanation, it's really helpful. We're not used
to backporting sysctl changes, but here I got caught a few times and have
found some sysctl.conf with bogus values in the field a few times, so it
was really important to backport this one.

Best regards,
Willy



Re: [rtc-linux] [PATCH] rtc: add support of nvram for maxim dallas rtc ds1343

2014-06-20 Thread Andrew Morton
On Sat, 24 May 2014 21:34:33 +0530 Raghavendra Ganiga  
wrote:

> This is a patch to add support of nvram for maxim dallas
> rtc ds1343
> 
> ...
>
> --- a/drivers/rtc/rtc-ds1343.c
> +++ b/drivers/rtc/rtc-ds1343.c
> @@ -4,6 +4,7 @@
>   * Real Time Clock
>   *
>   * Author : Raghavendra Chandra Ganiga 
> + *   Ankur Srivastava  : DS1343 Nvram Support
>   *
>   * This program is free software; you can redistribute it and/or modify
>   * it under the terms of the GNU General Public License version 2 as
> @@ -45,6 +46,9 @@
>  #define DS1343_CONTROL_REG   0x0F
>  #define DS1343_STATUS_REG0x10
>  #define DS1343_TRICKLE_REG   0x11
> +#define DS1343_NVRAM 0x20
> +
> +#define DS1343_NVRAM_LEN 96
>  
>  /* DS1343 Control Registers bits */
>  #define DS1343_EOSC  0x80
> @@ -149,6 +153,64 @@ static ssize_t ds1343_store_glitchfilter(struct device 
> *dev,
>  static DEVICE_ATTR(glitch_filter, S_IRUGO | S_IWUSR, 
> ds1343_show_glitchfilter,
>   ds1343_store_glitchfilter);
>  
> +static ssize_t ds1343_nvram_write(struct file *filp, struct kobject *kobj,
> + struct bin_attribute *attr,
> + char *buf, loff_t off, size_t count)
> +{
> + int ret;
> + unsigned char address;
> + struct device *dev = kobj_to_dev(kobj);
> + struct ds1343_priv *priv = dev_get_drvdata(dev);
> +
> + if (unlikely(!count))
> + return count;
> +
> + if ((count + off) > DS1343_NVRAM_LEN)

I worry about what happens if (count + off) wraps through zero.

> + count = DS1343_NVRAM_LEN - off;

We might end up with an enormous value in `count'?

> + address = DS1343_NVRAM + off;
> +
> + ret = regmap_bulk_write(priv->map, address, buf, count);
> + if (ret < 0)
> + dev_err(&priv->spi->dev, "Error in nvram write %d", ret);
> +
> + return (ret < 0) ? ret : count;
> +}
> +
> +
> +static ssize_t ds1343_nvram_read(struct file *filp, struct kobject *kobj,
> + struct bin_attribute *attr,
> + char *buf, loff_t off, size_t count)
> +{
> + int ret;
> + unsigned char address;
> + struct device *dev = kobj_to_dev(kobj);
> + struct ds1343_priv *priv = dev_get_drvdata(dev);
> +
> + if (unlikely(!count))
> + return count;
> +
> + if ((count + off) > DS1343_NVRAM_LEN)
> + count = DS1343_NVRAM_LEN - off;

Here too.

> + address = DS1343_NVRAM + off;
> +
> + ret = regmap_bulk_read(priv->map, address, buf, count);
> + if (ret < 0)
> + dev_err(&priv->spi->dev, "Error in nvram read %d\n", ret);
> +
> + return (ret < 0) ? ret : count;
> +}
> +
> +
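
One way to sidestep the wraparound concern is to never compute
(count + off) at all and compare against the remaining space instead.
A minimal sketch, assuming a signed 64-bit offset in place of loff_t;
nvram_clamp() and NVRAM_LEN are hypothetical names used only here:

```c
#include <stddef.h>
#include <stdint.h>

#define NVRAM_LEN 96	/* mirrors DS1343_NVRAM_LEN from the patch */

/*
 * Clamp a request to the NVRAM window without ever adding off and
 * count together, so no intermediate sum can wrap.
 * Returns the number of bytes that may be transferred, or 0 when the
 * offset lies outside the window.
 */
size_t nvram_clamp(int64_t off, size_t count)
{
	if (off < 0 || off >= NVRAM_LEN)
		return 0;

	/* off is now in [0, NVRAM_LEN), so the subtraction is safe. */
	if (count > (size_t)(NVRAM_LEN - off))
		count = (size_t)(NVRAM_LEN - off);

	return count;
}
```

With this shape a huge count is simply cut down to the remaining space
and an out-of-range offset yields a zero-length transfer.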



Re: [PATCH tip/core/rcu 0/5] Fix for cond_resched performance regression

2014-06-20 Thread josh
On Fri, Jun 20, 2014 at 03:11:20PM -0700, Paul E. McKenney wrote:
> On Fri, Jun 20, 2014 at 02:24:23PM -0700, j...@joshtriplett.org wrote:
> > On Fri, Jun 20, 2014 at 12:12:36PM -0700, Paul E. McKenney wrote:
> > > o Make cond_resched() a no-op for PREEMPT=y.  This might well turn
> > >   out to be a good thing, but it doesn't help give RCU the quiescent
> > >   states that it needs.
> > 
> > What about doing this, together with letting the fqs logic poke
> > un-quiesced kernel code as needed?  That way, rather than having
> > cond_resched do any work, you have the fqs logic recognize that a
> > particular CPU has gone too long without quiescing, without disturbing
> > that CPU at all if it hasn't gone too long.
> 
> My next stop is to post the previous series, but with a couple of
> exports and one bug fix uncovered by testing thus far, but after
> another round of testing.  Then I am going to take a close look at
> this one:
> 
> o Push the checks further into cond_resched(), so that the
>   fastpath does the same sequence of instructions that the original
>   did.  This might work well, but requires IPIs, which are not so
>   good for latencies on the remote CPU.  It nevertheless might be a
>   decent long-term solution given that if your CPU is spending many
>   jiffies looping in the kernel, you aren't getting good latencies
>   anyway.  It also has the benefit of allowing RCU to take advantage
>   of the implicit quiescent states of all cond_resched() calls,
>   and of eliminating the need for a separate cond_resched_rcu_qs()
>   and for RCU_COND_RESCHED_QS.
> 
> The one you call out is of course interesting as well.  But there are
> a couple of questions:
> 
> 1. Why wasn't cond_resched() a no-op in CONFIG_PREEMPT to start
>    with?  It just seems too obvious a thing to do for it to possibly
>    be an oversight.  (What, me paranoid?)
> 
> 2. When RCU recognizes that a particular CPU has gone too long,
>    exactly what are you suggesting that RCU do about it?  When
>    formulating your answer, please give due consideration to the
>    implications of that CPU being a NO_HZ_FULL CPU.  ;-)

Send it an IPI that either causes it to flag a quiescent state
immediately if currently quiesced or causes it to quiesce at the next
opportunity if not.

- Josh Triplett


Re: [PATCH v2] ARM: mvebu: Fix missing binding documentation for Armada 38x

2014-06-20 Thread Rob Herring
On Fri, Jun 20, 2014 at 1:52 PM, Jason Cooper  wrote:
> On Thu, Jun 19, 2014 at 06:40:43PM +0200, Gregory CLEMENT wrote:
>> For the Armada 380 and Armada 385 SoCs, the common binding for those
>> 2 SoCs was forgotten. This patch adds the documentation for the
>> marvell,armada38x property.
>>
>> Signed-off-by: Gregory CLEMENT 
>> --
>> Hi,
>>
>> This fix should be merged in 3.16. For 3.15 I am not sure as it is not
>> a regression.
>>
>> Changelog:
>> v1->v2
>>
>> - Reformulate to make clear that we will need marvell,armada38x _and_ a
>> SoC specific string. For consistency I duplicated what we have done in
>> armada-370-xp.txt
>>
>>
>> Thanks,
>> Gregory
>>
>>
>>  Documentation/devicetree/bindings/arm/armada-38x.txt | 17 +++--
>>  1 file changed, 15 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/devicetree/bindings/arm/armada-38x.txt 
>> b/Documentation/devicetree/bindings/arm/armada-38x.txt
>> index 11f2330a6554..fa08760046df 100644
>> --- a/Documentation/devicetree/bindings/arm/armada-38x.txt
>> +++ b/Documentation/devicetree/bindings/arm/armada-38x.txt
>> @@ -6,5 +6,18 @@ following property:
>>
>>  Required root node property:
>>
>> - - compatible: must contain either "marvell,armada380" or
>> -   "marvell,armada385" depending on the variant of the SoC being used.
>> +compatible: must contain "marvell,armada38x"
>
> I agree with Sergei on this one.  We generally avoid wildcards in
> compatible strings.  Is there a use case where specifying one of the
> below wouldn't be sufficient?

Isn't this a case of just documenting what is already in use?

I agree wildcards alone are not good, but along with a specific
compatible is okay. But also there should be some need to have the
common property.

Rob


Re: [Linux-kernel] [PATCH 2/4] drivers/base: devres.c: Add block copy func. for managed devices

2014-06-20 Thread Ben Hutchings
On Thu, 2014-06-19 at 16:46 +0100, Rob Jones wrote:
[...]
> --- a/drivers/base/devres.c
> +++ b/drivers/base/devres.c
> @@ -793,7 +793,7 @@ EXPORT_SYMBOL_GPL(devm_kmalloc);
>  /**
>   * devm_kstrdup - Allocate resource managed space and
>   *copy an existing string into that.
> - * @dev: Device to allocate memory for
> + * @dev:Device to allocate memory for

You shouldn't be changing this comment...

Ben.

>   * @s: the string to duplicate
>   * @gfp: the GFP mask used in the devm_kmalloc() call when
>   *   allocating memory
[...]




Re: [PATCH 1/3] pwm: add Rockchip SoC PWM support

2014-06-20 Thread Thierry Reding
On Sat, Jun 21, 2014 at 12:00:36AM +0200, Beniamino Galvani wrote:
> On Tue, Jun 17, 2014 at 11:42:58PM +0200, Thierry Reding wrote:
> > On Thu, May 08, 2014 at 01:08:33AM +0200, Beniamino Galvani wrote:
[...]
> > > diff --git a/drivers/pwm/pwm-rockchip.c b/drivers/pwm/pwm-rockchip.c
[...]
> > > +static int rockchip_pwm_config(struct pwm_chip *chip, struct pwm_device 
> > > *pwm,
> > > +int duty_ns, int period_ns)
> > > +{
> > > + struct rockchip_pwm_chip *pc = to_rockchip_pwm_chip(chip);
> > > + unsigned long clk_rate, period, duty;
> > > + u64 div;
> > > + int ret;
> > > +
> > > + clk_rate = clk_get_rate(pc->clk);
> > > +
> > > + /*
> > > +  * Since period and duty cycle registers have a width of 32
> > > +  * bits, every possible input period can be obtained using the
> > > +  * default prescaler value for all practical clock rate values.
> > > +  */
> > > + div = clk_rate;
> > > + div *= period_ns;
> > 
> > Perhaps shorten this to "div = clk_rate * period_ns;"?
> 
> I will change this, adding a cast to avoid the truncation of the
> result to 32 bits: "div = (u64)clk_rate * period_ns;"

Alternatively you could simply make clk_rate a u64 since it's only used
in this context anyway.
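
To make the truncation concrete, here is a small user-space sketch. It
assumes uint32_t as a stand-in for a 32-bit unsigned long, and the
function names are made up for illustration:

```c
#include <stdint.h>

/*
 * The multiply is carried out in 32 bits; the product wraps modulo
 * 2^32 before it is widened for the return value.
 */
uint64_t ticks_truncated(uint32_t clk_rate, uint32_t period_ns)
{
	return clk_rate * period_ns;
}

/*
 * Widening one operand first forces a 64-bit multiply, which is what
 * the cast (or a u64 clk_rate) achieves.
 */
uint64_t ticks_widened(uint32_t clk_rate, uint32_t period_ns)
{
	return (uint64_t)clk_rate * period_ns;
}
```

With a 24 MHz clock and a 1 ms period the true product is 2.4e13, which
does not fit in 32 bits, so the two functions disagree.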

Thierry




Re: [Linux-kernel] [PATCH 1/4] drivers/gpio: devres.c: allow gpio array requests for managed devices

2014-06-20 Thread Ben Hutchings
On Thu, 2014-06-19 at 16:46 +0100, Rob Jones wrote:
[...]
> +int devm_gpio_request_array(struct device *dev,
> + const struct gpio *array,
> + size_t num)
> +{
> + int i, err = 0;
> +
> + for (i = 0; i < num; i++, array++) {
> + err = devm_gpio_request_one(dev,
> + array->gpio,
> + array->flags,
> + array->label);
> + if (err) {
> + while (i--)
> + devm_gpio_free(dev, (--array)->gpio);

Missing break here.

> + }
> + }
> +
> + return err;
> +}
> +EXPORT_SYMBOL(devm_gpio_request_array);
[...]
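
For reference, a user-space sketch of the intended all-or-nothing
pattern with the break in place. request_one() and free_one() are
hypothetical stand-ins for devm_gpio_request_one() and devm_gpio_free(),
and the failure injection exists only to exercise the error path:

```c
#include <stddef.h>

#define MAX_RES 8

static int held[MAX_RES];	/* 1 while resource i is held */
static int fail_at = -1;	/* index whose request fails, or -1 */

/* Hypothetical stand-in for devm_gpio_request_one(). */
static int request_one(size_t i)
{
	if ((int)i == fail_at)
		return -1;
	held[i] = 1;
	return 0;
}

/* Hypothetical stand-in for devm_gpio_free(). */
static void free_one(size_t i)
{
	held[i] = 0;
}

/* Request resources 0..num-1; on failure release the ones already taken. */
int request_array(size_t num)
{
	size_t i;
	int err = 0;

	if (num > MAX_RES)
		return -1;

	for (i = 0; i < num; i++) {
		err = request_one(i);
		if (err) {
			/* Unwind everything acquired before index i. */
			while (i--)
				free_one(i);
			/*
			 * Without this break, the for loop's i++ brings
			 * i back to 0 and the requests start over.
			 */
			break;
		}
	}

	return err;
}

/* Number of resources currently held. */
size_t held_count(void)
{
	size_t i, n = 0;

	for (i = 0; i < MAX_RES; i++)
		n += held[i];
	return n;
}
```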




Re: [PATCH v7 1/2] video: ARM CLCD: Add DT support

2014-06-20 Thread Peter Maydell
On 17 June 2014 16:21, Pawel Moll  wrote:
> This patch adds basic DT bindings for the PL11x CLCD cells
> and makes their fbdev driver use them.

> +* ARM PrimeCell Color LCD Controller PL110/PL111
> +
> +See also Documentation/devicetree/bindings/arm/primecell.txt
> +
> +Required properties:
> +
> +- compatible: must be one of:
> +   "arm,pl110", "arm,primecell"
> +   "arm,pl111", "arm,primecell"
> +
> +- reg: base address and size of the control registers block
> +
> +- interrupt-names: either the single entry "combined" representing a
> +   combined interrupt output (CLCDINTR), or the four entries
> +   "mbe", "vcomp", "lnbu", "fuf" representing the individual
> +   CLCDMBEINTR, CLCDVCOMPINTR, CLCDLNBUINTR, CLCDFUFINTR interrupts
> +
> +- interrupts: contains an interrupt specifier for each entry in
> +   interrupt-names
> +
> +- clock-names: should contain "clcdclk" and "apb_pclk"
> +
> +- clocks: contains phandle and clock specifier pairs for the entries
> +   in the clock-names property. See
> +   Documentation/devicetree/bindings/clock/clock-bindings.txt
> +
> +Optional properties:
> +
> +- arm,pl11x,framebuffer-base: a pair of two 32-bit values, address and size,
> +   defining the framebuffer that must be used; if not present, the
> +   framebuffer may be located anywhere in the memory
> +
> +- max-memory-bandwidth: maximum bandwidth in bytes per second that the
> +   cell's memory interface can handle
> +
> +Required sub-nodes:
> +
> +- port: describes LCD panel signals, following the common binding
> +   for video transmitter interfaces; see
> +   Documentation/devicetree/bindings/media/video-interfaces.txt;
> +   when it is a TFT panel, the port's endpoint must define the
> +   following property:
> +
> +   - arm,pl11x,tft-r0g0b0-pads: an array of three 32-bit values,
> +   defining the way CLCD pads are wired up; this implicitly
> +   defines available color modes, for example:
> +   - PL111 TFT 4:4:4 panel:
> +   arm,pl11x,tft-r0g0b0-pads = <4 15 20>;
> +   - PL110 TFT (1:)5:5:5 panel:
> +   arm,pl11x,tft-r0g0b0-pads = <1 7 13>;
> +   - PL111 TFT (1:)5:5:5 panel:
> +   arm,pl11x,tft-r0g0b0-pads = <3 11 19>;
> +   - PL111 TFT 5:6:5 panel:
> +   arm,pl11x,tft-r0g0b0-pads = <3 10 19>;
> +   - PL110 and PL111 TFT 8:8:8 panel:
> +   arm,pl11x,tft-r0g0b0-pads = <0 8 16>;
> +   - PL110 and PL111 TFT 8:8:8 panel, R & B components swapped:
> +   arm,pl11x,tft-r0g0b0-pads = <16 8 0>;

How does this work for boards like the versatilepb which have a
mux between a PL110 and the TFT, allowing it to effectively
rewire the pads at runtime under control of the SYS_CLCD
sysreg (to give a wider range of colour modes than the
PL110 supports natively)?

thanks
-- PMM


Re: [PATCH 6/7] pwm: st: Add new driver for ST's PWM IP

2014-06-20 Thread Thierry Reding
On Thu, Jun 19, 2014 at 07:57:14PM +0530, Ajit Pal wrote:
> On Thursday 19 June 2014 02:14 PM, Lee Jones wrote:
> >On Thu, 19 Jun 2014, Thierry Reding wrote:
> >>On Wed, Jun 18, 2014 at 03:52:51PM +0100, Lee Jones wrote:
[...]
> >>>+   cdata->max_prescale + 1, sizeof(unsigned long),
> >>>+   st_pwm_cmp_periods);
> >>>+   if (!found) {
> >>>+   dev_err(dev, "failed to find matching period\n");
> >>>+   return -EINVAL;
> >>>+   }
> >>>+
> >>>+   prescale = found - >pwm_periods[0];
> >>
> >>This is somewhat unconventional. None of the other drivers precompute
> >>possible periods and I'm not convinced that it's an advantage. Setting
> >>the period (and configuring the PWM in general) is a fairly uncommon
> >>operation.
> >
> >Another one for Ajit I feel.
> 
> For the ST PWM IP, the PWM period is fixed to 256 local clock pulses. There
> is no register interface to select PWM periods. To change the period we have
> to change the prescaler.
> We precompute the possible periods, so as to avoid the calculations
> every time the .config function is called. Based upon a matching period we
> then select the prescaler.
> Sorry, but why do you think precomputing is not helpful?

Mostly I dislike it here because it sticks out as nobody else is doing
it. Secondly I'm not convinced that it gives you much of a performance
gain since the computations aren't that involved and typically the
period isn't changed all that often.

Also computing the value directly in .config() makes the code much
easier to follow.
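
Something along these lines, as a user-space sketch only. It takes the
256-pulse period from the description above and assumes that the input
clock is divided by (prescale + 1); that divider relation, the names and
the round-down policy are assumptions for illustration, not taken from
the driver:

```c
#include <stdint.h>

#define NSEC_PER_SEC_SKETCH	1000000000ULL
#define PWM_PERIOD_TICKS	256ULL	/* fixed period length per the thread */

/*
 * Pick the prescaler for a requested period directly, with no table.
 * Assumes (hypothetically) that the input clock is divided by
 * (prescale + 1). Rounds the divider down, so the resulting period is
 * never longer than requested.
 * Returns 0 and stores the prescaler on success, -1 if the period is
 * out of range for the given clock.
 */
int prescale_for_period(uint64_t clk_rate_hz, uint64_t period_ns,
			unsigned int max_prescale, unsigned int *prescale)
{
	/* Input clock ticks that fit in the requested period. */
	uint64_t ticks = clk_rate_hz * period_ns / NSEC_PER_SEC_SKETCH;
	/* Each PWM period consumes 256 divided clock pulses. */
	uint64_t div = ticks / PWM_PERIOD_TICKS;

	if (div == 0 || div - 1 > max_prescale)
		return -1;

	*prescale = (unsigned int)(div - 1);
	return 0;
}
```

That is one multiply and two divides per .config() call, which should be
negligible given how rarely the period changes.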

> >>>+static int st_pwm_enable(struct pwm_chip *chip, struct pwm_device *pwm)
> >>>+{
> >>>+   struct st_pwm_chip *pc = to_st_pwmchip(chip);
> >>>+   struct device *dev = pc->dev;
> >>>+   int ret;
> >>>+
> >>>+   ret = clk_enable(pc->clk);
> >>>+   if (ret)
> >>>+   return ret;
> >>>+
> >>>+   ret = regmap_field_write(pc->pwm_en, 1);
> >>>+   if (ret)
> >>>+   dev_err(dev, "%s,pwm_en write failed\n", __func__);
> 
> >>
> >>This error message is somewhat cryptic, perhaps:
> >>
> >>   "failed to enable PWM"
> >
> >Agreed.  I also can't believe I missed that nasty __func__ too.
> >
> >>? Also what implications does this have on controllers with multiple
> >>channels?
> >
> >I believe this enables both channels, but I'm sure Ajit will correct
> >me if I'm wrong.
> 
> Yes, it enables all channels. Unfortunately we do not have the facility to
> enable/disable individual channels on the ST PWM IP.

That's bad. If you can't control them separately then there's no way you
can guarantee the semantics of the PWM framework.

> >>>+   dev_dbg(dev, "pwm counter :%u\n", val);
> >>>+
> >>>+   clk_disable(pc->clk);
> >>>+}
> >>>+
> >>>+static const struct pwm_ops st_pwm_ops = {
> >>>+   .config = st_pwm_config,
> >>>+   .enable = st_pwm_enable,
> >>>+   .disable = st_pwm_disable,
> >>>+   .owner = THIS_MODULE,
> >>>+};
> >>>+
> >>>+static int st_pwm_probe_dt(struct st_pwm_chip *pc)
> >>>+{
> >>>+   struct device *dev = pc->dev;
> >>>+   const struct reg_field *reg_fields;
> >>>+   struct device_node *np = dev->of_node;
> >>>+   struct st_pwm_compat_data *cdata = pc->cdata;
> >>>+   u32 num_chan;
> >>>+
> >>>+   of_property_read_u32(np, "st,pwm-num-chan", &num_chan);
> >>>+   if (num_chan)
> >>>+   cdata->num_chan = num_chan;
> >>
> >>I don't like this very much. What influences the number of channels? Is
> >>it that specific SoC revisions have one and others have two?
> >
> >Ajit?
> >
> Depends on the board type on which the SoC is used.

I don't understand. How can the board influence the number of PWM
channels that the SoC supports? It does make sense for a board to define
how many of them are actually *used*, but that's nothing that DT should
contain nor that the driver should care about. The driver (and DT for
that matter) should expose the hardware block's full capabilities. The
use-case is what should determine what's used and what not.

Thierry




[PATCH] sched: Fix potential near-infinite distribute_cfs_runtime loop

2014-06-20 Thread Ben Segall
distribute_cfs_runtime intentionally only hands out enough runtime to
bring each cfs_rq to 1 ns of runtime, expecting the cfs_rqs to then take
the runtime they need only once they actually get to run. However, if
they get to run sufficiently quickly, the period timer is still in
distribute_cfs_runtime and no runtime is available, causing them to
throttle. Then distribute has to handle them again, and this can go on
until distribute has handed out all of the runtime 1ns at a time, which
takes far too long.

Instead allow access to the same runtime that distribute is handing out,
accepting that corner cases with very low quota may be able to spend the
entire cfs_b->runtime during distribute_cfs_runtime, meaning that the
runtime directly handed out by distribute_cfs_runtime was over quota. In
addition, if a cfs_rq does manage to throttle like this, make sure the
existing distribute_cfs_runtime no longer loops over it again.
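
The effect of adding at the head can be sketched in userspace, assuming a circular list with a sentinel in the style of list_head. This is only an illustration: there is no RCU or locking here, and all names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Circular doubly linked list with a sentinel, in the style of list_head. */
struct node {
	struct node *prev;
	struct node *next;
};

static void list_init(struct node *head)
{
	head->prev = head;
	head->next = head;
}

static void insert_between(struct node *n, struct node *prev, struct node *next)
{
	n->prev = prev;
	n->next = next;
	prev->next = n;
	next->prev = n;
}

/* Insert right after the sentinel (front of the list). */
static void add_head(struct node *n, struct node *head)
{
	insert_between(n, head, head->next);
}

/* Insert right before the sentinel (back of the list). */
static void add_tail(struct node *n, struct node *head)
{
	insert_between(n, head->prev, head);
}

/* Number of entries a walker parked on 'cur' has yet to visit. */
static unsigned int entries_after(const struct node *cur, const struct node *head)
{
	unsigned int count = 0;
	const struct node *p;

	assert(cur != NULL && head != NULL);
	for (p = cur->next; p != head; p = p->next)
		count++;
	return count;
}

/*
 * Two entries are queued and a walker is parked on the first one when a
 * late entry arrives. Returns how many entries the walker will still visit.
 */
static unsigned int visits_left(int late_goes_to_head)
{
	struct node head, first, second, late;

	list_init(&head);
	add_tail(&first, &head);
	add_tail(&second, &head);

	if (late_goes_to_head)
		add_head(&late, &head);
	else
		add_tail(&late, &head);

	return entries_after(&first, &head);
}
```

With tail insertion the walker still reaches the late entry (two visits left); with head insertion it does not (one visit left).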

Signed-off-by: Ben Segall 
---
 kernel/sched/fair.c |   41 -
 1 file changed, 20 insertions(+), 21 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1f9c457..ef5eac7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3361,7 +3361,11 @@ static void throttle_cfs_rq(struct cfs_rq *cfs_rq)
cfs_rq->throttled = 1;
cfs_rq->throttled_clock = rq_clock(rq);
raw_spin_lock(&cfs_b->lock);
-   list_add_tail_rcu(&cfs_rq->throttled_list, &cfs_b->throttled_cfs_rq);
+   /*
+* Add to the _head_ of the list, so that an already-started
+* distribute_cfs_runtime will not see us
+*/
+   list_add_rcu(&cfs_rq->throttled_list, &cfs_b->throttled_cfs_rq);
if (!cfs_b->timer_active)
__start_cfs_bandwidth(cfs_b, false);
raw_spin_unlock(&cfs_b->lock);
@@ -3418,7 +3422,8 @@ static u64 distribute_cfs_runtime(struct cfs_bandwidth *cfs_b,
u64 remaining, u64 expires)
 {
struct cfs_rq *cfs_rq;
-   u64 runtime = remaining;
+   u64 runtime;
+   u64 starting_runtime = remaining;
 
rcu_read_lock();
list_for_each_entry_rcu(cfs_rq, &cfs_b->throttled_cfs_rq,
@@ -3449,7 +3454,7 @@ next:
}
rcu_read_unlock();
 
-   return remaining;
+   return starting_runtime - remaining;
 }
 
 /*
@@ -3495,22 +3500,17 @@ static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
/* account preceding periods in which throttling occurred */
cfs_b->nr_throttled += overrun;
 
-   /*
-* There are throttled entities so we must first use the new bandwidth
-* to unthrottle them before making it generally available.  This
-* ensures that all existing debts will be paid before a new cfs_rq is
-* allowed to run.
-*/
-   runtime = cfs_b->runtime;
runtime_expires = cfs_b->runtime_expires;
-   cfs_b->runtime = 0;
 
/*
-* This check is repeated as we are holding onto the new bandwidth
-* while we unthrottle.  This can potentially race with an unthrottled
-* group trying to acquire new bandwidth from the global pool.
+* This check is repeated as we are holding onto the new bandwidth while
+* we unthrottle. This can potentially race with an unthrottled group
+* trying to acquire new bandwidth from the global pool. This can result
+* in us over-using our runtime if it is all used during this loop, but
+* only by limited amounts in that extreme case.
 */
-   while (throttled && runtime > 0) {
+   while (throttled && cfs_b->runtime > 0) {
+   runtime = cfs_b->runtime;
raw_spin_unlock(&cfs_b->lock);
/* we can't nest cfs_b->lock while distributing bandwidth */
runtime = distribute_cfs_runtime(cfs_b, runtime,
@@ -3518,10 +3518,10 @@ static int do_sched_cfs_period_timer(struct cfs_bandwidth *cfs_b, int overrun)
raw_spin_lock(&cfs_b->lock);
 
throttled = !list_empty(&cfs_b->throttled_cfs_rq);
+
+   cfs_b->runtime -= min(runtime, cfs_b->runtime);
}
 
-   /* return (any) remaining runtime */
-   cfs_b->runtime = runtime;
/*
 * While we are ensured activity in the period following an
 * unthrottle, this also covers the case in which the new bandwidth is
@@ -3632,10 +3632,9 @@ static void do_sched_cfs_slack_timer(struct cfs_bandwidth *cfs_b)
return;
}
 
-   if (cfs_b->quota != RUNTIME_INF && cfs_b->runtime > slice) {
+   if (cfs_b->quota != RUNTIME_INF && cfs_b->runtime > slice)
runtime = cfs_b->runtime;
-   cfs_b->runtime = 0;
-   }
+
expires = cfs_b->runtime_expires;
raw_spin_unlock(&cfs_b->lock);
 
@@ -3646,7 +3645,7 @@ static void do_sched_cfs_slack_timer(struct cfs_bandwidth *cfs_b)
 
raw_spin_lock(&cfs_b->lock);
if (expires == cfs_b->runtime_expires)
-

Re: [ 059/143] sysctl net: Keep tcp_syn_retries inside the boundary

2014-06-20 Thread Eric W. Biederman
Willy Tarreau  writes:

> Hi Luis,
>
> On Thu, Jun 12, 2014 at 01:55:53PM +0100, Luis Henriques wrote:
>> I was finally able to spend some more time with this and tried (a
>> modified) Tyler's patch on top of 2.6.32.62, and it seems to work.
>> Although I haven't done any extended testing, I don't see the two
>> stack traces and the /proc/sys/net/ipv4/ directory seems to be
>> correctly populated.
>> 
>> I'm attaching the patch I've used, based on Tyler's.
>
> Would any of you or Tyler please kindly pass me a signed-off-by with
> a commit message ? That would be great. Alternately I'd do it myself
> and mention you authored them.

If my memory serves it is possible in 2.6.32 to set
.ctl_name = CTL_UNNEEDED

and not need to implement a .strategy routine at all.

Given the fact that most people got the strategy routines
slightly wrong and that sys_sysctl is effectively unused,
a strategy where you don't implement code that no-one
will use in a backport would be preferable.

Since you have mentioned this has come up a couple of times, if nothing
else this will be something to think about for next time.

I am puzzled why .ctl_name was populated in a backport at all.

Eric


Re: [PATCH 6/7] pwm: st: Add new driver for ST's PWM IP

2014-06-20 Thread Thierry Reding
On Thu, Jun 19, 2014 at 09:44:04AM +0100, Lee Jones wrote:
> I'll comment on some of the more fluffy topics, I'll let Ajit reply to
> the more technical details of the patch.
> 
> On Thu, 19 Jun 2014, Thierry Reding wrote:
> > On Wed, Jun 18, 2014 at 03:52:51PM +0100, Lee Jones wrote:
> > > This driver supports all current STi platforms' PWM IPs.
> > > 
> > > Signed-off-by: Lee Jones 
> > > ---
> > >  drivers/pwm/Kconfig  |   9 ++
> > >  drivers/pwm/Makefile |   1 +
> > >  drivers/pwm/pwm-st.c | 378 
> > > +++
> > >  3 files changed, 388 insertions(+)
> > >  create mode 100644 drivers/pwm/pwm-st.c
> > > 
> > > diff --git a/drivers/pwm/Kconfig b/drivers/pwm/Kconfig
> > > index 4ad7b89..98a7bbc 100644
> > > --- a/drivers/pwm/Kconfig
> > > +++ b/drivers/pwm/Kconfig
> > > @@ -292,4 +292,13 @@ config PWM_VT8500
> > > To compile this driver as a module, choose M here: the module
> > > will be called pwm-vt8500.
> > >  
> > > +config PWM_ST
> > 
> > PWM_ST is awfully generic, perhaps PWM_STI would be a better choice?
> > Even that's very generic. Maybe PWM_STI_H4XX? There's nothing wrong with
> > supporting STiH{5,6,7,...}xx SoCs with such a driver. I'm just trying to
> > think ahead what will happen if at some point a new SoC family is
> > released that requires a different driver.
> 
> I'm inclined to agree with you, but as it stands, this driver supports
> all ST h/w, so it's correct for it to be generic.  If some new IP
> comes into fruition, at worst we'll have to change the name of the
> driver.  I'm happy to put myself on the line for that if the time
> comes.

Renaming a driver isn't a trivial matter. People may be using the name
in blacklists or scripts and renaming will likely annoy them. Like I
said, there's nothing wrong with the driver name being less generic, we
have other ways to identify what hardware it will run on.

> > > diff --git a/drivers/pwm/pwm-st.c b/drivers/pwm/pwm-st.c
[...]
> > > +#define MAX_PWM_CNT_DEFAULT  255
> > > +#define MAX_PRESCALE_DEFAULT 0xff
> > > +#define NUM_CHAN_DEFAULT 1
> > 
> > These are only used in one place and their meaning is fairly obvious, so
> > I'd just drop them.
> 
> I _always_ prefer defines over magic numbers, but as you wish - will fix.

In general I agree, but there are cases where in my opinion the defines
obfuscate rather than help. This is one of those. These aren't really
magic numbers, since they are used in a context where their meaning is
crystal clear.

> > > + PWM_EN,
> > > + PWM_INT_EN,
> > > + /* keep last */
> > > + MAX_REGFIELDS
> > > +};
> > > +
> > > +struct st_pwm_chip {
> > > + struct device *dev;
> > > + struct clk *clk;
> > > + unsigned long clk_rate;
> > > + struct regmap *regmap;
> > > + struct st_pwm_compat_data *cdata;
> > 
> > Doesn't this require a predeclaration of struct st_pwm_compat_data? Or
> > maybe just move struct st_pwm_compat_data before this.
> 
> You're right, will fix.
> 
> I think I would have expected at least a compiler warning about that?

Me too. Perhaps one of the includes has a forward declaration? I'd hope
not.
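
One possible explanation, offered as a standalone sketch and not as a statement about this driver's includes: in C, naming a struct tag for the first time in a pointer member declares it as an incomplete type in the enclosing scope, and a pointer to an incomplete type is itself a complete type. A later definition then completes it, so no forward declaration is needed and no diagnostic is required. The struct names below are made up.

```c
#include <assert.h>
#include <stddef.h>

struct chip {
	/* Tag not seen before: this declares 'struct compat' as incomplete. */
	struct compat *cdata;
};

/* This definition completes the type declared inside 'struct chip'. */
struct compat {
	unsigned int num_chan;
};

/* Dereferencing is fine here because the type is complete by now. */
static unsigned int chip_channels(const struct chip *c)
{
	assert(c != NULL && c->cdata != NULL);
	return c->cdata->num_chan;
}
```

Moving the definition of the pointed-to struct first is still the more readable order; the sketch only shows why the compiler stays quiet.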

> > > +};
> > > +
> > > +struct st_pwm_compat_data {
> > > + const struct reg_field *reg_fields;
> > > + int num_chan;
> > > + int max_pwm_cnt;
> > > + int max_prescale;
> > 
> > Can't these three be unsigned?
> 
> I see no reason why not.  They can also be signed. :)

I prefer if variables use the strictest type possible.

> > > +static void st_pwm_calc_periods(struct st_pwm_chip *pc)
> > > +{
> > > + struct st_pwm_compat_data *cdata = pc->cdata;
> > > + struct device *dev = pc->dev;
> > > + unsigned long val;
> > > + int i;
> > 
> > unsigned?
> 
> Why?
> 
> It's much more common this way:
> 
> $ git grep $'\t'"int i;" | wc -l
> 17018
> $ git grep $'\t'"unsigned int i;" | wc -l
> 2033

That just means that not everybody is as pedantic as I am. The reason
why it should be unsigned int is that it's used in a loop and compared
to a value which should also be unsigned (cdata->max_prescale). There
just isn't a reasonable scenario where they would need to be negative.
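
To illustrate why mixing the two is error-prone (a standalone sketch, not driver code): when an int is compared with an unsigned int, C converts the int to unsigned first, so a negative value compares as a very large one. A loop index that starts at 0 never hits this case, but matching the types removes the question entirely. The function names are made up.

```c
#include <assert.h>

/* 'i' is converted to unsigned int before the comparison is made. */
static int below_mixed(int i, unsigned int limit)
{
	return i < limit;
}

/* Both operands already share a type, so no conversion takes place. */
static int below_unsigned(unsigned int i, unsigned int limit)
{
	return i < limit;
}

/* Small self-check showing the surprising case. */
static void compare_demo(void)
{
	assert(below_mixed(3, 255) == 1);
	assert(below_mixed(-1, 255) == 0);   /* -1 converts to UINT_MAX */
	assert(below_unsigned(3, 255) == 1);
}
```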

> > > + * 16 possible period values are supported (for a particular clock rate).
> > > + * The requested period will be applied only if it matches one of these
> > > + * 16 values.
> > > + */
> > > +static int st_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
> > > +  int duty_ns, int period_ns)
> > > +{
> > > + struct st_pwm_chip *pc = to_st_pwmchip(chip);
> > > + struct device *dev = pc->dev;
> > > + struct st_pwm_compat_data *cdata = pc->cdata;
> > > + unsigned int prescale, pwmvalx;
> > > + unsigned long *found;
> > > + int ret;
> > > +
> > > + /*
> > > +  * Search for matching period value. The corresponding index is our
> > > +  * prescale value
> > > +  */
> > > > + found = bsearch(&period_ns, >pwm_periods[0],
> > 
> > Technically doesn't period_ns need to be converted to an unsigned long
> > here? Otherwise this won't be compatible with 64-bit 
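
A sketch of the concern, assuming the comparison callback reads both of its arguments as unsigned long: the key handed to bsearch() then has to be an unsigned long too, because on LP64 targets an int is narrower and the callback would read past it. The helper names and table values are illustrative, not taken from the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparison callback: both pointers are read as unsigned long. */
static int cmp_ulong(const void *key, const void *elem)
{
	unsigned long k = *(const unsigned long *)key;
	unsigned long e = *(const unsigned long *)elem;

	return (k > e) - (k < e);
}

/* Return the index of period_ns in the sorted table, or -1 if absent. */
static long find_period(int period_ns, const unsigned long *periods, size_t n)
{
	unsigned long key;
	const unsigned long *found;

	assert(period_ns >= 0);
	key = (unsigned long)period_ns;   /* widen before taking the address */
	found = bsearch(&key, periods, n, sizeof(*periods), cmp_ulong);
	return found ? (long)(found - periods) : -1;
}
```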
