Re: [PATCH v2] security: device_cgroup: fix RCU lockdep splat

2015-09-03 Thread Paul E. McKenney
On Thu, Sep 03, 2015 at 08:48:39AM -0500, Serge E. Hallyn wrote:
> On Wed, Sep 02, 2015 at 05:14:33PM -0700, Paul E. McKenney wrote:
> > On Wed, Sep 02, 2015 at 12:24:50PM -0400, Tejun Heo wrote:
> > > cc'ing Paul.
> > > 
> > > On Wed, Sep 02, 2015 at 08:12:28AM -0500, Felipe Balbi wrote:
> > > > while booting AM437x device, the following splat
> > > > triggered:
> > > > 
> > > > [   12.005238] ===
> > > > [   12.009749] [ INFO: suspicious RCU usage. ]
> > > > [   12.014116] 4.2.0-next-20150831 #1154 Not tainted
> > > > [   12.019050] ---
> > > > [   12.023408] security/device_cgroup.c:405 device_cgroup:verify_new_ex 
> > > > called without proper synchronization!
> > > ...
> > > > [   12.128326] [] (verify_new_ex) from [] 
> > > > (devcgroup_access_write+0x374/0x658)
> > > > [   12.137426] [] (devcgroup_access_write) from [] 
> > > > (cgroup_file_write+0x28/0x1bc)
> > > > [   12.146796] [] (cgroup_file_write) from [] 
> > > > (kernfs_fop_write+0xc0/0x1b8)
> > > > [   12.155620] [] (kernfs_fop_write) from [] 
> > > > (__vfs_write+0x1c/0xd8)
> > > > [   12.163783] [] (__vfs_write) from [] 
> > > > (vfs_write+0x90/0x16c)
> > > > [   12.171426] [] (vfs_write) from [] 
> > > > (SyS_write+0x44/0x9c)
> > > > [   12.178806] [] (SyS_write) from [] 
> > > > (ret_fast_syscall+0x0/0x1c)
> > > 
> > > This shouldn't be happening because devcgroup_access_write() always
> > > grabs devcgroup_mutex.  Looking at the log, the culprit seems to be
> > > f78f5b90c4ff ("rcu: Rename rcu_lockdep_assert() to
> > > RCU_LOCKDEP_WARN()").  It missed the bang for the second test while
> > > inverting it, so adding rcu_read_lock() isn't the right fix here.
> > > 
> > > Paul, can you please fix it?
> > 
> > Gah!  Please see below.
> > 
> > Thanx, Paul
> > 
> > 
> > 
> > security/device_cgroup: Fix RCU_LOCKDEP_WARN() condition
> > 
> > f78f5b90c4ff ("rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN()")
> > introduced a bug by incorrectly inverting the condition when moving from
> > rcu_lockdep_assert() to RCU_LOCKDEP_WARN().  This commit therefore fixes
> > the inversion.
> > 
> > Reported-by: Felipe Balbi <ba...@ti.com>
> > Reported-by: Tejun Heo <t...@kernel.org>
> > Signed-off-by: Paul E. McKenney <paul...@linux.vnet.ibm.com>
> > Cc: Serge Hallyn <se...@hallyn.com>
> 
> Oh, makes sense :)  (didn't see the original patch when it came by, sorry)

I should have CCed you, apologies for failing to do so.

> Acked-by: Serge Hallyn <serge.hal...@canonical.com>

Added, thank you!

Thanx, Paul

> > diff --git a/security/device_cgroup.c b/security/device_cgroup.c
> > index 73455089feef..03c1652c9a1f 100644
> > --- a/security/device_cgroup.c
> > +++ b/security/device_cgroup.c
> > @@ -401,7 +401,7 @@ static bool verify_new_ex(struct dev_cgroup *dev_cgroup,
> > bool match = false;
> >  
> > RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
> > -lockdep_is_held(_mutex),
> > +!lockdep_is_held(_mutex),
> >  "device_cgroup:verify_new_ex called without proper 
> > synchronization");
> >  
> > if (dev_cgroup->behavior == DEVCG_DEFAULT_ALLOW) {
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] security: device_cgroup: fix RCU lockdep splat

2015-09-03 Thread Paul E. McKenney
On Thu, Sep 03, 2015 at 09:56:05AM -0400, Josh Boyer wrote:
> On Wed, Sep 2, 2015 at 8:14 PM, Paul E. McKenney
> <paul...@linux.vnet.ibm.com> wrote:
> > On Wed, Sep 02, 2015 at 12:24:50PM -0400, Tejun Heo wrote:
> >> cc'ing Paul.
> >>
> >> On Wed, Sep 02, 2015 at 08:12:28AM -0500, Felipe Balbi wrote:
> >> > while booting AM437x device, the following splat
> >> > triggered:
> >> >
> >> > [   12.005238] ===
> >> > [   12.009749] [ INFO: suspicious RCU usage. ]
> >> > [   12.014116] 4.2.0-next-20150831 #1154 Not tainted
> >> > [   12.019050] ---
> >> > [   12.023408] security/device_cgroup.c:405 device_cgroup:verify_new_ex 
> >> > called without proper synchronization!
> >> ...
> >> > [   12.128326] [] (verify_new_ex) from [] 
> >> > (devcgroup_access_write+0x374/0x658)
> >> > [   12.137426] [] (devcgroup_access_write) from [] 
> >> > (cgroup_file_write+0x28/0x1bc)
> >> > [   12.146796] [] (cgroup_file_write) from [] 
> >> > (kernfs_fop_write+0xc0/0x1b8)
> >> > [   12.155620] [] (kernfs_fop_write) from [] 
> >> > (__vfs_write+0x1c/0xd8)
> >> > [   12.163783] [] (__vfs_write) from [] 
> >> > (vfs_write+0x90/0x16c)
> >> > [   12.171426] [] (vfs_write) from [] 
> >> > (SyS_write+0x44/0x9c)
> >> > [   12.178806] [] (SyS_write) from [] 
> >> > (ret_fast_syscall+0x0/0x1c)
> >>
> >> This shouldn't be happening because devcgroup_access_write() always
> >> grabs devcgroup_mutex.  Looking at the log, the culprit seems to be
> >> f78f5b90c4ff ("rcu: Rename rcu_lockdep_assert() to
> >> RCU_LOCKDEP_WARN()").  It missed the bang for the second test while
> >> inverting it, so adding rcu_read_lock() isn't the right fix here.
> >>
> >> Paul, can you please fix it?
> >
> > Gah!  Please see below.
> >
> > Thanx, Paul
> >
> > ----
> >
> > security/device_cgroup: Fix RCU_LOCKDEP_WARN() condition
> >
> > f78f5b90c4ff ("rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN()")
> > introduced a bug by incorrectly inverting the condition when moving from
> > rcu_lockdep_assert() to RCU_LOCKDEP_WARN().  This commit therefore fixes
> > the inversion.
> >
> > Reported-by: Felipe Balbi <ba...@ti.com>
> > Reported-by: Tejun Heo <t...@kernel.org>
> > Signed-off-by: Paul E. McKenney <paul...@linux.vnet.ibm.com>
> > Cc: Serge Hallyn <se...@hallyn.com>
> 
> Just tested this patch without Felipe's previous version on all my
> machines.  The splat is indeed gone.
> 
> Tested-by: Josh Boyer <jwbo...@fedoraproject.org>

Thank you, added!

Thanx, Paul

> josh
> 
> >
> > diff --git a/security/device_cgroup.c b/security/device_cgroup.c
> > index 73455089feef..03c1652c9a1f 100644
> > --- a/security/device_cgroup.c
> > +++ b/security/device_cgroup.c
> > @@ -401,7 +401,7 @@ static bool verify_new_ex(struct dev_cgroup *dev_cgroup,
> > bool match = false;
> >
> > RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
> > -lockdep_is_held(_mutex),
> > +!lockdep_is_held(_mutex),
> >  "device_cgroup:verify_new_ex called without proper 
> > synchronization");
> >
> > if (dev_cgroup->behavior == DEVCG_DEFAULT_ALLOW) {
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] security: device_cgroup: fix RCU lockdep splat

2015-09-02 Thread Paul E. McKenney
On Wed, Sep 02, 2015 at 12:24:50PM -0400, Tejun Heo wrote:
> cc'ing Paul.
> 
> On Wed, Sep 02, 2015 at 08:12:28AM -0500, Felipe Balbi wrote:
> > while booting AM437x device, the following splat
> > triggered:
> > 
> > [   12.005238] ===
> > [   12.009749] [ INFO: suspicious RCU usage. ]
> > [   12.014116] 4.2.0-next-20150831 #1154 Not tainted
> > [   12.019050] ---
> > [   12.023408] security/device_cgroup.c:405 device_cgroup:verify_new_ex 
> > called without proper synchronization!
> ...
> > [   12.128326] [] (verify_new_ex) from [] 
> > (devcgroup_access_write+0x374/0x658)
> > [   12.137426] [] (devcgroup_access_write) from [] 
> > (cgroup_file_write+0x28/0x1bc)
> > [   12.146796] [] (cgroup_file_write) from [] 
> > (kernfs_fop_write+0xc0/0x1b8)
> > [   12.155620] [] (kernfs_fop_write) from [] 
> > (__vfs_write+0x1c/0xd8)
> > [   12.163783] [] (__vfs_write) from [] 
> > (vfs_write+0x90/0x16c)
> > [   12.171426] [] (vfs_write) from [] 
> > (SyS_write+0x44/0x9c)
> > [   12.178806] [] (SyS_write) from [] 
> > (ret_fast_syscall+0x0/0x1c)
> 
> This shouldn't be happening because devcgroup_access_write() always
> grabs devcgroup_mutex.  Looking at the log, the culprit seems to be
> f78f5b90c4ff ("rcu: Rename rcu_lockdep_assert() to
> RCU_LOCKDEP_WARN()").  It missed the bang for the second test while
> inverting it, so adding rcu_read_lock() isn't the right fix here.
> 
> Paul, can you please fix it?

Gah!  Please see below.

Thanx, Paul



security/device_cgroup: Fix RCU_LOCKDEP_WARN() condition

f78f5b90c4ff ("rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN()")
introduced a bug by incorrectly inverting the condition when moving from
rcu_lockdep_assert() to RCU_LOCKDEP_WARN().  This commit therefore fixes
the inversion.

Reported-by: Felipe Balbi <ba...@ti.com>
Reported-by: Tejun Heo <t...@kernel.org>
Signed-off-by: Paul E. McKenney <paul...@linux.vnet.ibm.com>
Cc: Serge Hallyn <se...@hallyn.com>

diff --git a/security/device_cgroup.c b/security/device_cgroup.c
index 73455089feef..03c1652c9a1f 100644
--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -401,7 +401,7 @@ static bool verify_new_ex(struct dev_cgroup *dev_cgroup,
bool match = false;
 
RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
-lockdep_is_held(_mutex),
+!lockdep_is_held(_mutex),
 "device_cgroup:verify_new_ex called without proper 
synchronization");
 
if (dev_cgroup->behavior == DEVCG_DEFAULT_ALLOW) {

--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: omap: panda rcu stall on 3.10.1

2013-09-03 Thread Paul E. McKenney
On Fri, Aug 30, 2013 at 08:27:32AM +0200, Christian Hoffmann wrote:
 
  This also went to: linaro-ker...@lists.linaro.org
 
 Hi,
 
 current linaro kernel on 13.07 (3.10.1.0-1-linaro-omap) shows stack
 below. System is generally quite unstable. Light usage
 (postfix+dovecot...)
 
 Full dmesg here:
 https://dl.dropboxusercontent.com/u/21820416/rcu.stall.panda

Looks to me like the CPU stall is cascaded problem.  Prior to this in
dmesg, you have several terminated with status N messages, where N!=0.

I believe that fixing the problems leading to these messages would be
a good first step.

Thanx, Paul

 Rgds,
 Chris
 
 [ 4538.880432] INFO: rcu_sched detected stalls on CPUs/tasks: { 1}
 (detected by 0, t=2103 jiffies, g=1649, c=1648, q=18)
 [ 4538.891723] Backtrace for cpu 0 (current):
 [ 4538.891723] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
 3.10.1.0-1-linaro-omap #1ubuntu1~ci+130723003933-Ubuntu
 [ 4538.891754] [c001175d] (unwind_backtrace+0x1/0x9c) from
 [c000f625] (show_stack+0x11/0x14)
 [ 4538.891815] [c000f625] (show_stack+0x11/0x14) from [c00106d5]
 (smp_send_all_cpu_backtrace+0x49/0x98)
 [ 4538.891815] [c00106d5] (smp_send_all_cpu_backtrace+0x49/0x98) from
 [c007c755] (rcu_check_callbacks+0x439/0x4ac)
 [ 4538.891815] [c007c755] (rcu_check_callbacks+0x439/0x4ac) from
 [c0036599] (update_process_times+0x2d/0x4c)
 [ 4538.891876] [c0036599] (update_process_times+0x2d/0x4c) from
 [c005fcfd] (tick_sched_handle+0x35/0x40)
 [ 4538.891906] [c005fcfd] (tick_sched_handle+0x35/0x40) from
 [c005feed] (tick_sched_timer+0x31/0x54)
 [ 4538.891937] [c005feed] (tick_sched_timer+0x31/0x54) from
 [c004532b] (__run_hrtimer+0x5b/0x138)
 [ 4538.891937] [c004532b] (__run_hrtimer+0x5b/0x138) from [c0045bcb]
 (hrtimer_interrupt+0xdb/0x22c)
 [ 4538.891967] [c0045bcb] (hrtimer_interrupt+0xdb/0x22c) from
 [c0010e13] (twd_handler+0x1b/0x28)
 [ 4538.891998] [c0010e13] (twd_handler+0x1b/0x28) from [c00783b9]
 (handle_percpu_devid_irq+0x4d/0xac)
 [ 4538.891998] [c00783b9] (handle_percpu_devid_irq+0x4d/0xac) from
 [c0075ead] (generic_handle_irq+0x19/0x24)
 [ 4538.892028] [c0075ead] (generic_handle_irq+0x19/0x24) from
 [c000d16d] (handle_IRQ+0x29/0x68)
 [ 4538.892059] [c000d16d] (handle_IRQ+0x29/0x68) from [c00084db]
 (gic_handle_irq+0x27/0x50)
 [ 4538.892059] [c00084db] (gic_handle_irq+0x27/0x50) from [c045315b]
 (__irq_svc+0x3b/0x5c)
 [ 4538.892059] Exception stack(0xc06e3f68 to 0xc06e3fb0)
 [ 4538.892059] 3f60:     
  c06e2000 c075f130
 [ 4538.892120] 3f80: c06e2000 c075ea93 c075f130 c06ea594 c045c2e0
 c06e2000 c0fe5908 c06e3fb0
 [ 4538.892120] 3fa0: c007bab1 c0058b48 400f0133 
 [ 4538.892120] [c045315b] (__irq_svc+0x3b/0x5c) from [c0058b48]
 (cpu_startup_entry+0xbc/0x164)
 [ 4538.892120] [c0058b48] (cpu_startup_entry+0xbc/0x164) from
 [c06735f9] (start_kernel+0x251/0x2ac)
 [ 4538.892120]
 [ 4538.892120] sending IPI to all other CPUs:
 [ 4538.892211] IPI backtrace for cpu 1
 [ 4538.892211]
 [ 4538.892242] CPU: 1 PID: 2064 Comm: clamd Not tainted
 3.10.1.0-1-linaro-omap #1ubuntu1~ci+130723003933-Ubuntu
 [ 4538.892242] task: edb09500 ti: ec91e000 task.ti: ec91e000
 [ 4538.892272] PC is at 0xb6e2f0d4
 [ 4538.892272] LR is at 0xb6eeec01
 [ 4538.892272] pc : [b6e2f0d4]lr : [b6eeec01]psr: 000f0130
 [ 4538.892272] sp : be8b472c  ip : 62323337  fp : 684a
 [ 4538.892303] r10: 0084d750  r9 : 0200  r8 : e86d6a2f
 [ 4538.892303] r7 : 0a0a0a0a  r6 : 69393e32  r5 : 0080bf24  r4 : 0080bf28
 [ 4538.892333] r3 : 81010100  r2 : 6438  r1 : 000a  r0 : 0080bf24
 [ 4538.892333] Flags: nzcv  IRQs on  FIQs on  Mode USER_32  ISA Thumb
 Segment user
 [ 4538.892333] Control: 50c5387d  Table: acbd404a  DAC: 
 [ 4538.892364]
 [ 4538.892364] R8: 0xe86d69af:
 [ 4538.892364] 69ac  6a6f7254 412e6e61 746e6567 3939332d 00303934
 b1aa3270 2506 47a83ca7
 [ 4538.892395] 69cc  e4ebc453 91ad979e 23363f8b fd49d9d0 773b48ba
 ad58f7d3 04c351eb 96e4414f
 [ 4538.892395] 69ec  7e1a3ac7 f39072a5 e362cedb b119b3e0 b595375e
 b2137762 b19f99a8 b10dc638
 [ 4538.892395] 6a0c  1806 2e6e6957 6a6f7254 412e6e61 746e6567
 3939332d 00303534 b23ade28
 [ 4538.892395] 6a2c  1806 2e6e6957 6a6f7254 462e6e61 61656b61
 37342d76 00373131 ac58376e
 [ 4538.892395] 6a4c  72542e6e 6e616a6f 3434382e 35313336 3e55c600
 412e6eb2 72617764 61472e65
 [ 4538.892517] 6a6c  6176656d 2d65636e 34393132 55b20037 1604b173
 2e6e6957 6a6f7254 462e6e61
 [ 4538.892547] 6a8c  61656b61 37342d76 00343131 b652e2c0 72542e6e
 6e616a6f 7075532e 756a7265
 [ 4538.892547] 6aac  322d6e61 fb003731 15b23a2b 2e6e6957 6a6f7254
 412e6e61 746e6567 3939332d
 [ 4538.892547] CPU: 1 PID: 2064 Comm: clamd Not tainted
 3.10.1.0-1-linaro-omap #1ubuntu1~ci+130723003933-Ubuntu
 [ 4538.892547] [c001175d] (unwind_backtrace+0x1/0x9c) from
 [c000f625] (show_stack+0x11/0x14)
 [ 4538.892639] [c000f625] (show_stack+0x11/0x14) from [c0010819]
 (handle_IPI+0xd5/0x158)
 [ 4538.892669] [c0010819] 

Re: rcu self-detected stall messages on OMAP3, 4 boards

2012-10-01 Thread Paul E. McKenney
On Mon, Oct 01, 2012 at 10:55:11AM +0200, Linus Walleij wrote:
 On Sat, Sep 22, 2012 at 11:59 PM, Paul E. McKenney
 paul...@linux.vnet.ibm.com wrote:
 
  rcu: Fix day-one dyntick-idle stall-warning bug
 
 As mentioned in another thread this solves the same problem for ux500.
 Reported/Tested-by: Linus Walleij linus.wall...@linaro.org
 
 But now it appears that this commit didn't make it into v3.6 so
 it definately needs to be tagged with Cc: sta...@kernel.org
 before it gets merged since the stall warnings are kinda scary.

Ingo submitting this to Linus Torvalds earlier today, so we should be
able to send to stable shortly.

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rcu self-detected stall messages on OMAP3, 4 boards

2012-09-24 Thread Paul E. McKenney
On Mon, Sep 24, 2012 at 03:11:34PM +0530, Shilimkar, Santosh wrote:
 On Sun, Sep 23, 2012 at 3:29 AM, Paul E. McKenney
 paul...@linux.vnet.ibm.com wrote:
  On Sat, Sep 22, 2012 at 01:10:43PM -0700, Paul E. McKenney wrote:
  On Sat, Sep 22, 2012 at 06:42:08PM +, Paul Walmsley wrote:
   On Fri, 21 Sep 2012, Paul E. McKenney wrote:
 
 [...]
 
 
  And here is a patch.  I am still having trouble reproducing the problem,
  but figured that I should avoid serializing things.
 
  Thanx, Paul
 
  
 
   b/kernel/rcutree.c |4 +++-
   1 file changed, 3 insertions(+), 1 deletion(-)
 
  rcu: Fix day-one dyntick-idle stall-warning bug
 
  Each grace period is supposed to have at least one callback waiting
  for that grace period to complete.  However, if CONFIG_NO_HZ=n, an
  extra callback-free grace period is no big problem -- it will chew up
  a tiny bit of CPU time, but it will complete normally.  In contrast,
  CONFIG_NO_HZ=y kernels have the potential for all the CPUs to go to
  sleep indefinitely, in turn indefinitely delaying completion of the
  callback-free grace period.  Given that nothing is waiting on this grace
  period, this is also not a problem.
 
  Unless RCU CPU stall warnings are also enabled, as they are in recent
  kernels.  In this case, if a CPU wakes up after at least one minute
  of inactivity, an RCU CPU stall warning will result.  The reason that
  no one noticed until quite recently is that most systems have enough
  OS noise that they will never remain absolutely idle for a full minute.
  But there are some embedded systems with cut-down userspace configurations
  that get into this mode quite easily.
 
  All this begs the question of exactly how a callback-free grace period
  gets started in the first place.  This can happen due to the fact that
  CPUs do not necessarily agree on which grace period is in progress.
  If a CPU still believes that the grace period that just completed is
  still ongoing, it will believe that it has callbacks that need to wait
  for another grace period, never mind the fact that the grace period
  that they were waiting for just completed.  This CPU can therefore
  erroneously decide to start a new grace period.
 
  Once this CPU notices that the earlier grace period completed, it will
  invoke its callbacks.  It then won't have any callbacks left.  If no
  other CPU has any callbacks, we now have a callback-free grace period.
 
  This commit therefore makes CPUs check more carefully before starting a
  new grace period.  This new check relies on an array of tail pointers
  into each CPU's list of callbacks.  If the CPU is up to date on which
  grace periods have completed, it checks to see if any callbacks follow
  the RCU_DONE_TAIL segment, otherwise it checks to see if any callbacks
  follow the RCU_WAIT_TAIL segment.  The reason that this works is that
  the RCU_WAIT_TAIL segment will be promoted to the RCU_DONE_TAIL segment
  as soon as the CPU figures out that the old grace period has ended.
 
  This change is to cpu_needs_another_gp(), which is called in a number
  of places.  The only one that really matters is in rcu_start_gp(), where
  the root rcu_node structure's -lock is held, which prevents any
  other CPU from starting or completing a grace period, so that the
  comparison that determines whether the CPU is missing the completion
  of a grace period is stable.
 
  Signed-off-by: Paul E. McKenney paul.mcken...@linaro.org
  Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com
 
 As already confirmed by Paul W and others, I too no longer see the rcu dumps
 any more with above patch. Thanks a lot for the fix.

Glad it finally works!

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rcu self-detected stall messages on OMAP3, 4 boards

2012-09-24 Thread Paul E. McKenney
On Mon, Sep 24, 2012 at 09:54:00PM +, Paul Walmsley wrote:
 On Sat, 22 Sep 2012, Paul E. McKenney wrote:
 
  On Sat, Sep 22, 2012 at 10:20:19PM +, Paul Walmsley wrote:
   On Sat, 22 Sep 2012, Paul E. McKenney wrote:
  
This thing has been in the kernel since about 2004, not sure why you 
didn't hit it earlier.
   
   One other data point in that regard - noticed the warnings don't appear 
   when the board is booted with:
   
   commit 4fa3b6cb1bc8c14b81b4c8ffdfd3f2500a7e9367
   Author: Paul E. McKenney paul.mcken...@linaro.org
   Date:   Tue Jun 5 15:53:53 2012 -0700
   
   rcu: Fix qlen_lazy breakage
  
  You lost me on this one.  This is already in mainline, so if you were
  using (say) 3.6-rc6, you would already have this commit applied.
 
 If I check out a kernel at this commit 
 4fa3b6cb1bc8c14b81b4c8ffdfd3f2500a7e9367 + the 
 zero-callback-in-tickless-idle diagnostic patch, build and boot it, then 
 the stall warnings don't appear (in 25 minutes of testing).
 
 Messages from the diagnostic patch indicate that the kernel is entering 
 idle during a grace period with no RCU callbacks, though.
 
 This is not a big deal and does not need any further attention.  Just 
 wanted to place a time boundary on the point when these messages started 
 appearing.  (It is unlikely to be an optimal bound: i.e., there are 
 probably later commits where the warnings also don't appear.)

Ah, got it, thank you!

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rcu self-detected stall messages on OMAP3, 4 boards

2012-09-23 Thread Paul E. McKenney
On Sun, Sep 23, 2012 at 07:55:50AM +, Paul Walmsley wrote:
 On Sat, 22 Sep 2012, Paul E. McKenney wrote:
 
  On Sat, Sep 22, 2012 at 10:25:59PM +, Paul Walmsley wrote:
  
   The recent tests here have been on Pandaboard, which is dual-CPU, but my 
   recollection is that I also observed the warnings on a single-core 
   Beagleboard.  Will re-test.
  
  Anxiously awaiting the results.
 
 The same problem exists on BeagleBoard XM (OMAP3730, single-core 
 Cortex-A8):
 

 http://www.pwsan.com/omap/transcripts/20120922-beaglexm-rcu-stall-debug-pre-fix.txt
 
 and the same patch fixes it:
 

 http://www.pwsan.com/omap/transcripts/20120922-beaglexm-rcu-stall-debug-post-fix.txt
 
 Please feel free to update my Tested-by:, if you wish.
 
 Tested-by: Paul Walmsley p...@pwsan.com  # OMAP3730, OMAP4430

Very good, thank you very much!!!

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rcu self-detected stall messages on OMAP3, 4 boards

2012-09-22 Thread Paul E. McKenney
On Sat, Sep 22, 2012 at 05:45:12PM +0200, Frederic Weisbecker wrote:
 2012/9/22 Paul E. McKenney paul...@linux.vnet.ibm.com:
  On Fri, Sep 21, 2012 at 01:31:49PM -0700, Tony Lindgren wrote:
  * Paul E. McKenney paul...@linux.vnet.ibm.com [120921 12:58]:
  
   Just to make sure I understand the combinations:
  
   o   All stalls have happened when running a minimal userspace.
   o   CONFIG_NO_HZ=n suppresses the stalls.
   o   CONFIG_RCU_FAST_NO_HZ (which depends on CONFIG_NO_HZ=y) has
   no observable effect on the stalls.
 
  The reason why you may need minimal userspace is to cut down
  the number of timers waking up the system with NO_HZ.
  Booting with init=/bin/sh might also do the trick for that.
 
  Good point!  This does make for a very quiet system, but does not
  reproduce the problem under kvm, even after waiting for four minutes.
  I will leave it for more time, but it looks like I really might need to
  ask Linaro for remote access to a Panda.
 
 I have one. I'm currently installing Ubuntu on it and I'll try to
 manage to build
 a kernel and reproduce the issue.
 
 I'll give more news soon.

Thank you!

My bet is that you have to have a userspace that is so small that it
registers only a few (but at least one!) RCU callback at boot time,
then never registers any callbacks ever again.  I have coded up a
crude test case, using Tony Lindgren's suggestion of init=/bin/sh,
but I appear to have inadvertently fixed this bug in current -rcu
(git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git,
branch rcu/next).

But I have been wrong a few times already on this particular bug...

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rcu self-detected stall messages on OMAP3, 4 boards

2012-09-22 Thread Paul E. McKenney
On Sat, Sep 22, 2012 at 06:42:08PM +, Paul Walmsley wrote:
 On Fri, 21 Sep 2012, Paul E. McKenney wrote:
 
  Could you please point me to a recipe for creating a minimal userspace?
  Just in case it is the userspac erather than the architecture/hardware
  that makes the difference.
 
 Tony's suggestion is pretty good.  Note that there may also be differences 
 in kernel timers -- either between x86 and ARM architectures, or loaded 
 device drivers -- that may confound the problem.

For example, there must be at least one RCU callback outstanding after
the boot sequence quiets down.  Of course, the last time I tried Tony's
approach, I was doing it on top of my -rcu stack, so am retrying on
v3.6-rc6.

  Just to make sure I understand the combinations:
  
  o   All stalls have happened when running a minimal userspace.
  o   CONFIG_NO_HZ=n suppresses the stalls.
  o   CONFIG_RCU_FAST_NO_HZ (which depends on CONFIG_NO_HZ=y) has
  no observable effect on the stalls.
  
  Did I get that right, or am I missing a combination?
 
 That's correct.
 
  Indeed, rcu_idle_gp_timer_func() is a bit strange in that it is 
  cancelled upon exit from idle, and therefore should (almost) never 
  actually execute. Its sole purpose is to wake up the CPU.  ;-)
 
 Right.  Just curious, what would wake up the kernel from idle to handle a 
 grace period expiration when CONFIG_RCU_FAST_NO_HZ=n?  On a very idle 
 system, the time between timer ticks could potentially be several tens of 
 seconds.

If CONFIG_RCU_FAST_NO_HZ=n, then CPUs with RCU callbacks are not permitted
to shut off the scheduling-clock tick, so any CPU with RCU callbacks will
be awakened every jiffy.  The problem is that there appears to be a way
to get an RCU grace period started without any CPU having any callbacks,
which, as you surmise, would result in all the CPUs going to sleep and
the grace period never ending.  So if a CPU is awakened for any reason
after this everlasting grace period has extended for more than a minute,
the first thing that CPU will do is print an RCU CPU stall warning.

I believe that I see how to prevent callback-free grace periods from
ever starting.  (Famous last words...)

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rcu self-detected stall messages on OMAP3, 4 boards

2012-09-22 Thread Paul E. McKenney
On Sat, Sep 22, 2012 at 06:16:15PM +, Paul Walmsley wrote:
 Hi Paul
 
 On Fri, 21 Sep 2012, Paul E. McKenney wrote:
 
  I am wondering if your system somehow figured out how to start a grace
  period that had no RCU callbacks waiting for it.  If that happened,
  then a CONFIG_NO_HZ=y system could in theory get into a state where all
  CPUs are in dyntick-idle mode, so that none of them is doing anything
  to force the grace period to complete.
 
  That should be easy to diagnose, anyway.  Please see below, which
  includes the earlier diagnostic patch.
 
 Here you go.
 
 - Paul
 
 [  248.902618] INFO: rcu_sched self-detected stall on CPU
 [  248.905456]  0: (1 ticks this GP) idle=933/1/0 
 [  248.907897]   (t=26570 jiffies g=11 c=10 q=0)

Bingo!!!  (q=0, in case you were wondering.  And thank you for testing this!)

Strangely enough, I believe that I have inadvertently fixed this in
my -rcu tree:

git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/next

Nevertheless, if you get a chance to try it, I would be interested to
hear if my guess is correct.  The trick is that a kthread drives the
grace period in -rcu, regardless of whether or not there are callbacks.

However, the backport would not be something that -stable would be happy
with, so I will be putting together a fix for mainline.  This thing
has been in the kernel since about 2004, not sure why you didn't hit
it earlier.

Thanx, Paul

 [  248.910339] [c001bc90] (unwind_backtrace+0x0/0xf0) from [c00ad800] 
 (rcu_check_callbacks+0x220/0x714)
 [  248.915527] [c00ad800] (rcu_check_callbacks+0x220/0x714) from 
 [c00532a0] (update_process_times+0x38/0x68)
 [  248.920928] [c00532a0] (update_process_times+0x38/0x68) from 
 [c008c9e8] (tick_sched_timer+0x80/0xec)
 [  248.926116] [c008c9e8] (tick_sched_timer+0x80/0xec) from [c0068ed4] 
 (__run_hrtimer+0x7c/0x1e0)
 [  248.930999] [c0068ed4] (__run_hrtimer+0x7c/0x1e0) from [c0069cb8] 
 (hrtimer_interrupt+0x11c/0x2d0)
 [  248.936035] [c0069cb8] (hrtimer_interrupt+0x11c/0x2d0) from [c001a3cc] 
 (twd_handler+0x30/0x44)
 [  248.940948] [c001a3cc] (twd_handler+0x30/0x44) from [c00a7bd0] 
 (handle_percpu_devid_irq+0x90/0x13c)
 [  248.946075] [c00a7bd0] (handle_percpu_devid_irq+0x90/0x13c) from 
 [c00a4344] (generic_handle_irq+0x30/0x48)
 [  248.951538] [c00a4344] (generic_handle_irq+0x30/0x48) from [c0014e38] 
 (handle_IRQ+0x4c/0xac)
 [  248.956329] [c0014e38] (handle_IRQ+0x4c/0xac) from [c00084cc] 
 (gic_handle_irq+0x28/0x5c)
 [  248.960937] [c00084cc] (gic_handle_irq+0x28/0x5c) from [c04fb1a4] 
 (__irq_svc+0x44/0x5c)
 [  248.965484] Exception stack(0xc0729f58 to 0xc0729fa0)
 [  248.968231] 9f40:   
 0003b832 0001
 [  248.972686] 9f60:  c074a8e8 c0728000 c07c42c8 c05065a0 c074bdc8 
  411fc092
 [  248.977142] 9f80: c074bfe8  0001 c0729fa0 0003b833 c0015130 
 2113 
 [  248.981597] [c04fb1a4] (__irq_svc+0x44/0x5c) from [c0015130] 
 (default_idle+0x20/0x44)
 [  248.986083] [c0015130] (default_idle+0x20/0x44) from [c001535c] 
 (cpu_idle+0x9c/0x114)
 [  248.990539] [c001535c] (cpu_idle+0x9c/0x114) from [c06d77b0] 
 (start_kernel+0x2b4/0x304)
 

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rcu self-detected stall messages on OMAP3, 4 boards

2012-09-22 Thread Paul E. McKenney
On Sat, Sep 22, 2012 at 01:10:43PM -0700, Paul E. McKenney wrote:
 On Sat, Sep 22, 2012 at 06:42:08PM +, Paul Walmsley wrote:
  On Fri, 21 Sep 2012, Paul E. McKenney wrote:
  
   Could you please point me to a recipe for creating a minimal userspace?
   Just in case it is the userspac erather than the architecture/hardware
   that makes the difference.
  
  Tony's suggestion is pretty good.  Note that there may also be differences 
  in kernel timers -- either between x86 and ARM architectures, or loaded 
  device drivers -- that may confound the problem.
 
 For example, there must be at least one RCU callback outstanding after
 the boot sequence quiets down.  Of course, the last time I tried Tony's
 approach, I was doing it on top of my -rcu stack, so am retrying on
 v3.6-rc6.
 
   Just to make sure I understand the combinations:
   
   o All stalls have happened when running a minimal userspace.
   o CONFIG_NO_HZ=n suppresses the stalls.
   o CONFIG_RCU_FAST_NO_HZ (which depends on CONFIG_NO_HZ=y) has
 no observable effect on the stalls.
   
   Did I get that right, or am I missing a combination?
  
  That's correct.
  
   Indeed, rcu_idle_gp_timer_func() is a bit strange in that it is 
   cancelled upon exit from idle, and therefore should (almost) never 
   actually execute. Its sole purpose is to wake up the CPU.  ;-)
  
  Right.  Just curious, what would wake up the kernel from idle to handle a 
  grace period expiration when CONFIG_RCU_FAST_NO_HZ=n?  On a very idle 
  system, the time between timer ticks could potentially be several tens of 
  seconds.
 
 If CONFIG_RCU_FAST_NO_HZ=n, then CPUs with RCU callbacks are not permitted
 to shut off the scheduling-clock tick, so any CPU with RCU callbacks will
 be awakened every jiffy.  The problem is that there appears to be a way
 to get an RCU grace period started without any CPU having any callbacks,
 which, as you surmise, would result in all the CPUs going to sleep and
 the grace period never ending.  So if a CPU is awakened for any reason
 after this everlasting grace period has extended for more than a minute,
 the first thing that CPU will do is print an RCU CPU stall warning.
 
 I believe that I see how to prevent callback-free grace periods from
 ever starting.  (Famous last words...)

And here is a patch.  I am still having trouble reproducing the problem,
but figured that I should avoid serializing things.

Thanx, Paul



 b/kernel/rcutree.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

rcu: Fix day-one dyntick-idle stall-warning bug

Each grace period is supposed to have at least one callback waiting
for that grace period to complete.  However, if CONFIG_NO_HZ=n, an
extra callback-free grace period is no big problem -- it will chew up
a tiny bit of CPU time, but it will complete normally.  In contrast,
CONFIG_NO_HZ=y kernels have the potential for all the CPUs to go to
sleep indefinitely, in turn indefinitely delaying completion of the
callback-free grace period.  Given that nothing is waiting on this grace
period, this is also not a problem.

Unless RCU CPU stall warnings are also enabled, as they are in recent
kernels.  In this case, if a CPU wakes up after at least one minute
of inactivity, an RCU CPU stall warning will result.  The reason that
no one noticed until quite recently is that most systems have enough
OS noise that they will never remain absolutely idle for a full minute.
But there are some embedded systems with cut-down userspace configurations
that get into this mode quite easily.

All this begs the question of exactly how a callback-free grace period
gets started in the first place.  This can happen due to the fact that
CPUs do not necessarily agree on which grace period is in progress.
If a CPU still believes that the grace period that just completed is
still ongoing, it will believe that it has callbacks that need to wait
for another grace period, never mind the fact that the grace period
that they were waiting for just completed.  This CPU can therefore
erroneously decide to start a new grace period.

Once this CPU notices that the earlier grace period completed, it will
invoke its callbacks.  It then won't have any callbacks left.  If no
other CPU has any callbacks, we now have a callback-free grace period.

This commit therefore makes CPUs check more carefully before starting a
new grace period.  This new check relies on an array of tail pointers
into each CPU's list of callbacks.  If the CPU is up to date on which
grace periods have completed, it checks to see if any callbacks follow
the RCU_DONE_TAIL segment, otherwise it checks to see if any callbacks
follow the RCU_WAIT_TAIL segment.  The reason that this works is that
the RCU_WAIT_TAIL segment will be promoted to the RCU_DONE_TAIL segment
as soon as the CPU figures out that the old grace period has ended.

This change

Re: rcu self-detected stall messages on OMAP3, 4 boards

2012-09-22 Thread Paul E. McKenney
On Sat, Sep 22, 2012 at 10:25:59PM +, Paul Walmsley wrote:
 On Sat, 22 Sep 2012, Paul E. McKenney wrote:
 
  And here is a patch.  I am still having trouble reproducing the problem,
  but figured that I should avoid serializing things.
 
 Thanks, testing this now on v3.6-rc6.

Very cool, thank you!

One question though about the patch 
 description:
 
  All this begs the question of exactly how a callback-free grace period
  gets started in the first place.  This can happen due to the fact that
  CPUs do not necessarily agree on which grace period is in progress.
  If a CPU still believes that the grace period that just completed is
  still ongoing, it will believe that it has callbacks that need to wait
  for another grace period, never mind the fact that the grace period
  that they were waiting for just completed.  This CPU can therefore
  erroneously decide to start a new grace period.
 
 Doesn't this imply that this bug would only affect multi-CPU systems?  

Surprisingly not, at least when running TREE_RCU or TREE_PREEMPT_RCU.
In order to keep lock contention down to a dull roar on larger systems,
TREE_RCU keeps three sets of books: (1) the global state in the rcu_state
structure, (2) the combining-tree per-node state in the rcu_node
structure, and the per-CPU state in the rcu_data structure.  A CPU is
not officially aware of the end of a grace period until it is reflected
in its rcu_data structure.  This has the perhaps-surprising consequence
that the CPU that detected the end of the old grace period might start
a new one before becoming officially aware that the old one ended.

Why not have the CPU inform itself immediately upon noticing that the
old grace period ended?  Deadlock.  The rcu_node locks must be acquired
from leaf towards root, and the CPU is holding the root rcu_node lock
when it notices that the grace period has ended.

I have made this a bit less problematic in the bigrt branch, working
towards a goal of getting RCU into a state where automatic formal
validation might one day be possible.  And yes, I am starting to get some
formal-validation people interested in this lofty goal, see for example:
http://sites.google.com/site/popl13grace/paper.pdf.

 The recent tests here have been on Pandaboard, which is dual-CPU, but my 
 recollection is that I also observed the warnings on a single-core 
 Beagleboard.  Will re-test.

Anxiously awaiting the results.  This has been a strange one, even by
RCU's standards.

Plus I need to add a few Reported-by lines.  Next version...

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rcu self-detected stall messages on OMAP3, 4 boards

2012-09-22 Thread Paul E. McKenney
On Sat, Sep 22, 2012 at 10:20:19PM +, Paul Walmsley wrote:
 Hi Paul
 
 On Sat, 22 Sep 2012, Paul E. McKenney wrote:
 
  Strangely enough, I believe that I have inadvertently fixed this in
  my -rcu tree:
  
  git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/next
  
  Nevertheless, if you get a chance to try it, I would be interested to
  hear if my guess is correct.
 
 Yes, good news: the stall warnings go away with that branch.

Very good!

  The trick is that a kthread drives the grace period in -rcu, regardless 
  of whether or not there are callbacks.
 
 This is rcu: Move quiescent-state forcing into kthread ?

Yep, plus the preceding commits moving grace-period initialization and
cleanup into that same kthread.  This was motivated by a bug report
last February complaining about 200-microsecond latency spikes from
RCU grace-period initialization.  On systems with 4096 CPUs.

Real-time response.  It is far bigger than I thought.  ;-)

 Added some debugging into rcu_gp_kthread() after that commit and can 
 confirm that the quiescent-state forcing loop does start a few times when 
 there are zero callbacks pending (modulo any races in my measurement 
 code).

Cool, thank you!  Assuming it works, that indicates that there is long-term
value to the fix for this problem.  On larger systems, extra grace periods
are not what you want, as their expense increases with the number of CPUs.

  However, the backport would not be something that -stable would be happy
  with, so I will be putting together a fix for mainline.  This thing
  has been in the kernel since about 2004, not sure why you didn't hit
  it earlier.
 
 One other data point in that regard - noticed the warnings don't appear 
 when the board is booted with:
 
 commit 4fa3b6cb1bc8c14b81b4c8ffdfd3f2500a7e9367
 Author: Paul E. McKenney paul.mcken...@linaro.org
 Date:   Tue Jun 5 15:53:53 2012 -0700
 
 rcu: Fix qlen_lazy breakage

You lost me on this one.  This is already in mainline, so if you were
using (say) 3.6-rc6, you would already have this commit applied.

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rcu self-detected stall messages on OMAP3, 4 boards

2012-09-22 Thread Paul E. McKenney
On Sun, Sep 23, 2012 at 01:42:10AM +, Paul Walmsley wrote:
 Hi Paul
 
 On Sat, 22 Sep 2012, Paul Walmsley wrote:
 
  On Sat, 22 Sep 2012, Paul E. McKenney wrote:
  
   And here is a patch.  I am still having trouble reproducing the problem,
   but figured that I should avoid serializing things.
  
  Thanks, testing this now on v3.6-rc6.
 
 Looks like you solved it!
 
 Tested v3.6-rc6 + your stall diagnostic patch:
 
 http://marc.info/?l=linux-arm-kernelm=134827237215882w=2
 
 on OMAP4430ES2 Pandaboard using omap2plus_defconfig and 
 CONFIG_RCU_CPU_STALL_INFO=y; got the stall warnings.
 
 Then added rcu: Fix day-one dyntick-idle stall-warning bug from:
 
 http://marc.info/?l=linux-arm-kernelm=134835120600590w=2
 
 Booted that, and the stall warnings did not appear within 30 minutes.

Very cool, thank you for your testing efforts!!!

May I apply your Tested-by to this patch?

And good show on the debugging patch -- it is quite good to have such
solid evidence that the bug that the fix was intended for was actually
occurring.

Thanx, Paul

 To confirm that the problem being solved matched your hypothesis, the
 debugging patch below[1] was added to the RCU idle entry/exit code.
 
 Without the bugfix patch, a boot log transcript was obtained
 indicating that the idle loop was entered with tick_nohz_enabled=1
 during a grace period with no callbacks present:
 
 http://www.pwsan.com/omap/transcripts/20120922-rcu-stall-debug-pre-fix.txt
 
 The debugging events started to appear at 1.867370 seconds into the
 boot.  ENTER was pressed about 464 seconds in; this triggered the
 rcu_sched stall traceback.
 
 With the bugfix patch, a boot log transcript was obtained that
 indicated that the condition under test never occurred after waiting
 about 20 minutes:
 
 
 http://www.pwsan.com/omap/transcripts/20120922-rcu-stall-debug-post-fix.txt
 
 Thanks for being so willing to root-cause the issue, Paul; it's 
 appreciated, and it's been quite instructive as well.  Will address some 
 remaining loose ends in follow-up E-mails.
 
 
 - Paul
 
 
 [1] Debugging patch to printk() if the previous idle loop entry occurred 
 with tick_nohz_enabled=1 during a grace period with no RCU callbacks 
 present:
 
 
 ---
  kernel/rcutree.c |   17 +
  1 file changed, 17 insertions(+)
 
 diff --git a/kernel/rcutree.c b/kernel/rcutree.c
 index f1eb7ad..f42941b 100644
 --- a/kernel/rcutree.c
 +++ b/kernel/rcutree.c
 @@ -60,6 +60,9 @@
 
  /* Data structures. */
 
 +extern int tick_nohz_enabled;
 +static int no_cbs_idle_entry_count;
 +
  static struct lock_class_key rcu_node_class[RCU_NUM_LVLS];
 
  #define RCU_STATE_INITIALIZER(sname, cr) { \
 @@ -400,8 +403,12 @@ void rcu_idle_enter(void)
   unsigned long flags;
   long long oldval;
   struct rcu_dynticks *rdtp;
 + int cpu;
 + long totqlen = 0;
 + struct rcu_data *rdp;
 
   local_irq_save(flags);
 + rdp = __get_cpu_var(rcu_sched_data);
   rdtp = __get_cpu_var(rcu_dynticks);
   oldval = rdtp-dynticks_nesting;
   WARN_ON_ONCE((oldval  DYNTICK_TASK_NEST_MASK) == 0);
 @@ -410,6 +417,12 @@ void rcu_idle_enter(void)
   else
   rdtp-dynticks_nesting -= DYNTICK_TASK_NEST_VALUE;
   rcu_idle_enter_common(rdtp, oldval);
 + if (tick_nohz_enabled  rcu_gp_in_progress(rdp-rsp)) {
 + for_each_possible_cpu(cpu)
 + totqlen += per_cpu_ptr(rdp-rsp-rda, cpu)-qlen;
 + if (totqlen == 0)
 + no_cbs_idle_entry_count = 1;
 + }
   local_irq_restore(flags);
  }
  EXPORT_SYMBOL_GPL(rcu_idle_enter);
 @@ -503,6 +516,10 @@ void rcu_idle_exit(void)
   rdtp-dynticks_nesting = DYNTICK_TASK_EXIT_IDLE;
   rcu_idle_exit_common(rdtp, oldval);
   local_irq_restore(flags);
 + if (no_cbs_idle_entry_count) {
 + no_cbs_idle_entry_count = 0;
 + pr_err(* Tickless idle was entered with zero RCU callbacks\n);
 + }
  }
  EXPORT_SYMBOL_GPL(rcu_idle_exit);
 
 -- 
 1.7.10.4
 

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rcu self-detected stall messages on OMAP3, 4 boards

2012-09-21 Thread Paul E. McKenney
On Fri, Sep 21, 2012 at 06:08:59PM +, Paul Walmsley wrote:
 cc Frederic Weisbecker - context is here: 
 
http://marc.info/?l=linux-kernelm=134749030206016w=2
 
 On Thu, 20 Sep 2012, Paul E. McKenney wrote:
 
  Fair point.  I am wondering whether there is some path into the idle
  loop that somehow avoids telling RCU that the CPU has in face entered
  idle.  There needs to be an rcu_idle_enter() call on the way to idle,
  otherwise RCU CPU stall warnings are expected behavior.
 
 As far as I know, our only idle entry point is in 
 arch/arm/common/process.c:cpu_idle().

In mainline, this is arch/arm/kernel/process.c, correct?

 Looking at the x86 idle entry, they call rcu_idle_{enter,exit}() inside 
 {stop,start}_critical_timings().  Making that change here didn't help.

The reason x86 does this is that they have idle notifiers deeper in the
idle loop that use RCU read-side critical sections.  So this was an
expected result.

 Also tried commenting out the code from the stop_critical_timings() call 
 to the WARN_ON(irqs_disabled()), and adding a local_irq_enable().  That 
 also didn't help, which suggests that the problem is not caused by the 
 OMAP-specific PM idle code.

I must admit that you make a convincing case here.  Though it does leave
me wondering what is different about Panda (and MX28, IIRC).

I may take your advice of remote access to a Panda board, though that
is likely to take a bit of time due to timezones.  Regardless of the
underlying issue here, I clearly need to make the stall-warning messages
do a better job of printing out needed information.

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rcu self-detected stall messages on OMAP3, 4 boards

2012-09-21 Thread Paul E. McKenney
On Fri, Sep 21, 2012 at 07:11:14PM +, Paul Walmsley wrote:
 On Fri, 21 Sep 2012, Paul E. McKenney wrote:
 
  On Fri, Sep 21, 2012 at 06:08:59PM +, Paul Walmsley wrote:
  
   As far as I know, our only idle entry point is in 
   arch/arm/common/process.c:cpu_idle().
  
  In mainline, this is arch/arm/kernel/process.c, correct?
 
 Indeed; sorry about that, mistyped.

No problem!

   Looking at the x86 idle entry, they call rcu_idle_{enter,exit}() inside 
   {stop,start}_critical_timings().  Making that change here didn't help.
  
  The reason x86 does this is that they have idle notifiers deeper in the
  idle loop that use RCU read-side critical sections.  So this was an
  expected result.
 
 OK
 
   Also tried commenting out the code from the stop_critical_timings() call 
   to the WARN_ON(irqs_disabled()), and adding a local_irq_enable().  That 
   also didn't help, which suggests that the problem is not caused by the 
   OMAP-specific PM idle code.
  
  I must admit that you make a convincing case here.  Though it does leave
  me wondering what is different about Panda (and MX28, IIRC).
 
 Given the dependency on CONFIG_NO_HZ, the stalls are probably dependent on 
 the userspace in use.  The userspaces here are quite minimal and so allow 
 the system to stay idle for relatively long periods of time.

Could you please point me to a recipe for creating a minimal userspace?
Just in case it is the userspac erather than the architecture/hardware
that makes the difference.

  I may take your advice of remote access to a Panda board, though that
  is likely to take a bit of time due to timezones.  Regardless of the
  underlying issue here, I clearly need to make the stall-warning messages
  do a better job of printing out needed information.
 
 If you've got a patch in mind for that, I'll boot it here.

Hammering it out, will send it along when it is a bit less destructive.  ;-)

 One other observation.  omap2plus_defconfig sets CONFIG_NO_HZ=y but 
 doesn't set CONFIG_RCU_FAST_NO_HZ.  The stall warning messages still 
 appear when CONFIG_RCU_FAST_NO_HZ=y.  One of them is attached below (with 
 CONFIG_RCU_CPU_STALL_INFO set as well, obviously).

Just to make sure I understand the combinations:

o   All stalls have happened when running a minimal userspace.
o   CONFIG_NO_HZ=n suppresses the stalls.
o   CONFIG_RCU_FAST_NO_HZ (which depends on CONFIG_NO_HZ=y) has
no observable effect on the stalls.

Did I get that right, or am I missing a combination?

 As an aside, in the CONFIG_RCU_FAST_NO_HZ=y build, I dropped a printk() 
 into rcu_idle_gp_timer_func() and it doesn't look like it ever executed.

Indeed, rcu_idle_gp_timer_func() is a bit strange in that it is cancelled
upon exit from idle, and therefore should (almost) never actually execute.
Its sole purpose is to wake up the CPU.  ;-)

Thanx, Paul

 - Paul
 
 [  305.832000] INFO: rcu_sched self-detected stall on CPU
 [  305.834838]  1: (2 GPs behind) idle=5b1/1/0 drain=0 . timer=4294967295
 [  305.838378]   (t=17463 jiffies)
 [  305.840118] [c001be10] (unwind_backtrace+0x0/0xf0) from [c00ad65c] 
 (rcu_pending+0xd0/0x540)
 [  305.844848] [c00ad65c] (rcu_pending+0xd0/0x540) from [c00ae5cc] 
 (rcu_check_callbacks+0x110/0x198)
 [  305.849884] [c00ae5cc] (rcu_check_callbacks+0x110/0x198) from 
 [c0053800] (update_process_times+0x38/0x68)
 [  305.855285] [c0053800] (update_process_times+0x38/0x68) from 
 [c008cf40] (tick_sched_timer+0x80/0xec)
 [  305.860473] [c008cf40] (tick_sched_timer+0x80/0xec) from [c006942c] 
 (__run_hrtimer+0x7c/0x1e0)
 [  305.865356] [c006942c] (__run_hrtimer+0x7c/0x1e0) from [c006a210] 
 (hrtimer_interrupt+0x11c/0x2d0)
 [  305.870361] [c006a210] (hrtimer_interrupt+0x11c/0x2d0) from [c001a54c] 
 (twd_handler+0x30/0x44)
 [  305.875274] [c001a54c] (twd_handler+0x30/0x44) from [c00a8128] 
 (handle_percpu_devid_irq+0x90/0x13c)
 [  305.880371] [c00a8128] (handle_percpu_devid_irq+0x90/0x13c) from 
 [c00a489c] (generic_handle_irq+0x30/0x48)
 [  305.885833] [c00a489c] (generic_handle_irq+0x30/0x48) from [c0014fb8] 
 (handle_IRQ+0x4c/0xac)
 [  305.890624] [c0014fb8] (handle_IRQ+0x4c/0xac) from [c000864c] 
 (gic_handle_irq+0x28/0x5c)
 [  305.895233] [c000864c] (gic_handle_irq+0x28/0x5c) from [c04fbc64] 
 (__irq_svc+0x44/0x5c)
 [  305.899780] Exception stack(0xde86ff88 to 0xde86ffd0)
 [  305.902526] ff80:   0004c062 0001  de8660c0 
 de86e000 c07c42c8
 [  305.906982] ffa0: c05075a0 c074bdd0  411fc092 c074bff0  
 0001 de86ffd0
 [  305.911437] ffc0: 0004c063 c00152b0 2113 
 [  305.914184] [c04fbc64] (__irq_svc+0x44/0x5c) from [c00152b0] 
 (default_idle+0x20/0x44)
 [  305.918640] [c00152b0] (default_idle+0x20/0x44) from [c00154dc] 
 (cpu_idle+0x9c/0x114)
 [  305.923126] [c00154dc] (cpu_idle+0x9c/0x114) from [804f4a34] 
 (0x804f4a34)
 

--
To unsubscribe from this list: send the line unsubscribe linux-omap

Re: rcu self-detected stall messages on OMAP3, 4 boards

2012-09-21 Thread Paul E. McKenney
On Fri, Sep 21, 2012 at 05:47:31PM +, Paul Walmsley wrote:
 Hi Paul
 
 On Thu, 20 Sep 2012, Paul Walmsley wrote:
 
  On Thu, 20 Sep 2012, Paul E. McKenney wrote:
 
   Paul Walmsley, please let me know if the config below doesn't clear things
   up for you or if there is some reason why this config is infeasible.
  
  Will certainly test it here once things settle down.
 
 I built an OMAP kernel from Linus' commit 
 4651afbbae968772efd6dc4ba461cba9b49bb9d8 (Merge branch 'for-3.6-fixes' of 
 git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq).  The config used 
 was 'omap2plus_defconfig', and enabled CONFIG_CPU_IDLE by hand.  Booted it 
 on a Pandaboard (OMAP4430ES2) into a very minimal Debian rootfs.

Did you have the patch at https://lkml.org/lkml/2012/8/30/290 applied?
If not, could you please try it?  (This patch cleared up a similar
problem for Becky, also on OMAP.)

Thanx, Paul

 Long story short, the rcu_sched stall messages continue to appear.  
 Transcript below.
 
 The above config should be easy to reproduce from mainline.  Paul, not 
 sure if you're still working with Linaro, but you should be able to get 
 access to a Pandaboard through them to try it yourself, should you wish.
 
 
 - Paul
 
 ## Booting kernel from Legacy Image at 8200 ...   
 
Image Name:   Linux-3.6.0-rc6-00025-g4651afb   
 
Image Type:   ARM Linux Kernel Image (uncompressed)
 
Data Size:4061576 Bytes = 3.9 MiB  
 
Load Address: 80008000 
 
Entry Point:  80008000 
 
Verifying Checksum ... OK  
 
Loading Kernel Image ... OK
 
 OK
 
   
 
 Starting kernel ...   
 
   
 
 Uncompressing Linux... done, booting the kernel.  
 
 [0.00] Booting Linux on physical CPU 0
 
 [0.00] Linux version 3.6.0-rc6-00025-g4651afb (paul@dusk) (gcc 
 version 4.5.1 (Sourcery G++ Lite 2010.09-50) ) #342 SMP Fri Sep 21 11:28:00 
 MDT 2012  

 [0.00] CPU: ARMv7 Processor [411fc092] revision 2 (ARMv7), 
 cr=10c53c7d
 [0.00] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing 
 instruction cache   
 [0.00] Machine: OMAP4 Panda board 
 
 [0.00] debug: ignoring loglevel setting.  
 
 [0.00] Reserving 16777216 bytes SDRAM for VRAM
 
 [0.00] Memory policy: ECC disabled, Data cache writealloc
 [0.00] On node 0 totalpages: 126720   
 
 [0.00] free_area_init_node: node 0, pgdat c07c6140, node_mem_map 
 c0d23000
 [0.00]   Normal zone: 1024 pages used for memmap
 [0.00]   Normal zone: 0 pages reserved
 [0.00]   Normal zone: 125696 pages, LIFO batch:31
 [0.00] OMAP4430 ES2.0
 [0.00] PERCPU: Embedded 9 pages/cpu @c112b000 s14016 r8192 d14656 
 u36864
 [0.00] pcpu-alloc: s14016 r8192 d14656 u36864 alloc=9*4096
 [0.00] pcpu-alloc: [0] 0 [0] 1 
 [0.00] Built 1 zonelists in Zone order, mobility grouping on.  Total 
 pages: 125696
 [0.00] Kernel command line: console=ttyO2,230400n8 vram=16M 
 root=/dev/mmcblk0p2 rw rootfstype=ext3 rootwait earlyprintk debug 
 ignore_loglevel
 [0.00] PID hash table entries: 2048 (order: 1, 8192 bytes)
 [0.00] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
 [0.00] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
 [0.00] Memory: 495MB = 495MB total
 [0.00] Memory: 488828k/488828k available, 35460k reserved, 0K highmem
 [0.00] Virtual kernel memory layout:
 [0.00] vector  : 0x - 0x1000   (   4 kB)
 [0.00] fixmap  : 0xfff0 - 0xfffe   ( 896 kB)
 [0.00] vmalloc : 0xe080

Re: rcu self-detected stall messages on OMAP3, 4 boards

2012-09-21 Thread Paul E. McKenney
On Fri, Sep 21, 2012 at 01:31:49PM -0700, Tony Lindgren wrote:
 * Paul E. McKenney paul...@linux.vnet.ibm.com [120921 12:58]:
  
  Just to make sure I understand the combinations:
  
  o   All stalls have happened when running a minimal userspace.
  o   CONFIG_NO_HZ=n suppresses the stalls.
  o   CONFIG_RCU_FAST_NO_HZ (which depends on CONFIG_NO_HZ=y) has
  no observable effect on the stalls.
 
 The reason why you may need minimal userspace is to cut down
 the number of timers waking up the system with NO_HZ.
 Booting with init=/bin/sh might also do the trick for that.

Good point!  This does make for a very quiet system, but does not
reproduce the problem under kvm, even after waiting for four minutes.
I will leave it for more time, but it looks like I really might need to
ask Linaro for remote access to a Panda.

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Documentation: RCU: update the stall warning message timer=-1 to match reality

2012-09-21 Thread Paul E. McKenney
On Fri, Sep 21, 2012 at 04:13:29PM +, Paul Walmsley wrote:
 
 The CONFIG_RCU_FAST_NO_HZ stall warning messages can never emit
 timer=-1.  This is because the printf() format specifier to generate
 that number is '%lu'.  So, update the documentation to use the
 unsigned long equivalent instead, timer=4294967295.  This is what
 actually shows up in traces.
 
 Signed-off-by: Paul Walmsley p...@pwsan.com
 Cc: Paul E. McKenney paul...@linux.vnet.ibm.com
 Cc: Dipankar Sarma dipan...@in.ibm.com

Good catch!  Even worse, it gives timer=18446744073709551615 on
64-bit systems, which is no easier on the eyes.  I therefore changed
the code to print a nicer message in this case, patch below.

The meaning of the timer=4294967295 was that the corresponding CPU
was either non-idle or idle with no RCU callbacks, FWIW.

Thanx, Paul

-

rcu: Fix CONFIG_RCU_FAST_NO_HZ stall warning message

The print_cpu_stall_fast_no_hz() function attempts to print -1 when
the -idle_gp_timer is not pending, but unsigned arithmetic causes it
to instead print ULONG_MAX, which is 4294967295 on 32-bit systems and
18446744073709551615 on 64-bit systems.  Neither of these are the most
reader-friendly values, so this commit instead causes timer not pending
to be printed when -idle_gp_timer is not pending.

Reported-by: Paul Walmsley p...@pwsan.com
Signed-off-by: Paul E. McKenney paul.mcken...@linaro.org
Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com

diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index 523364e..1927151 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -99,7 +99,7 @@ In kernels with CONFIG_RCU_FAST_NO_HZ, even more information 
is
 printed:
 
INFO: rcu_preempt detected stall on CPU
-   0: (64628 ticks this GP) idle=dd5/3fff/0 drain=0 . timer=-1
+   0: (64628 ticks this GP) idle=dd5/3fff/0 drain=0 . timer 
not pending
   (t=65000 jiffies)
 
 The (64628 ticks this GP) indicates that this CPU has taken more
@@ -116,13 +116,13 @@ number between the two /s is the value of the nesting, 
which will
 be a small positive number if in the idle loop and a very large positive
 number (as shown above) otherwise.
 
-For CONFIG_RCU_FAST_NO_HZ kernels, the drain=0 indicates that the
-CPU is not in the process of trying to force itself into dyntick-idle
-state, the . indicates that the CPU has not given up forcing RCU
-into dyntick-idle mode (it would be H otherwise), and the timer=-1
-indicates that the CPU has not recented forced RCU into dyntick-idle
-mode (it would otherwise indicate the number of microseconds remaining
-in this forced state).
+For CONFIG_RCU_FAST_NO_HZ kernels, the drain=0 indicates that the CPU is
+not in the process of trying to force itself into dyntick-idle state, the
+. indicates that the CPU has not given up forcing RCU into dyntick-idle
+mode (it would be H otherwise), and the timer not pending indicates
+that the CPU has not recently forced RCU into dyntick-idle mode (it
+would otherwise indicate the number of microseconds remaining in this
+forced state).
 
 
 Multiple Warnings From One Stall
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 3b1a11e..be822f0 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -2245,11 +2245,15 @@ static void print_cpu_stall_fast_no_hz(char *cp, int 
cpu)
 {
struct rcu_dynticks *rdtp = per_cpu(rcu_dynticks, cpu);
struct timer_list *tltp = rdtp-idle_gp_timer;
+   char c;
 
-   sprintf(cp, drain=%d %c timer=%lu,
-   rdtp-dyntick_drain,
-   rdtp-dyntick_holdoff == jiffies ? 'H' : '.',
-   timer_pending(tltp) ? tltp-expires - jiffies : -1);
+   c = rdtp-dyntick_holdoff == jiffies ? 'H' : '.';
+   if (timer_pending(tltp))
+   sprintf(cp, drain=%d %c timer=%lu,
+   rdtp-dyntick_drain, c, tltp-expires - jiffies);
+   else
+   sprintf(cp, drain=%d %c timer not pending,
+   rdtp-dyntick_drain, c);
 }
 
 #else /* #ifdef CONFIG_RCU_FAST_NO_HZ */

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rcu self-detected stall messages on OMAP3, 4 boards

2012-09-21 Thread Paul E. McKenney
On Fri, Sep 21, 2012 at 12:57:17PM -0700, Paul E. McKenney wrote:
 On Fri, Sep 21, 2012 at 07:11:14PM +, Paul Walmsley wrote:
  On Fri, 21 Sep 2012, Paul E. McKenney wrote:

[ . . . ]

   I may take your advice of remote access to a Panda board, though that
   is likely to take a bit of time due to timezones.  Regardless of the
   underlying issue here, I clearly need to make the stall-warning messages
   do a better job of printing out needed information.
  
  If you've got a patch in mind for that, I'll boot it here.
 
 Hammering it out, will send it along when it is a bit less destructive.  ;-)

And I eventually decided to keep things simple, please see below.
This applies on top of https://lkml.org/lkml/2012/8/30/290.

Thanx, Paul



rcu: Add grace-period information to RCU CPU stall warnings

This commit causes the last grace period started and completed to be
printed on RCU CPU stall warning messages in order to aid diagnosis.

Signed-off-by: Paul E. McKenney paul.mcken...@linaro.org
Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index cdaa7aa..fdb6854 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -925,8 +925,9 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
raw_spin_unlock_irqrestore(rnp-lock, flags);
 
print_cpu_stall_info_end();
-   printk(KERN_CONT (detected by %d, t=%ld jiffies)\n,
-  smp_processor_id(), (long)(jiffies - rsp-gp_start));
+   pr_cont((detected by %d, t=%ld jiffies, g=%lu, c=%lu)\n,
+  smp_processor_id(), (long)(jiffies - rsp-gp_start),
+  rsp-gpnum, rsp-completed);
if (ndetected == 0)
printk(KERN_ERR INFO: Stall ended before state dump start\n);
else if (!trigger_all_cpu_backtrace())
@@ -953,7 +954,8 @@ static void print_cpu_stall(struct rcu_state *rsp)
print_cpu_stall_info_begin();
print_cpu_stall_info(rsp, smp_processor_id());
print_cpu_stall_info_end();
-   printk(KERN_CONT  (t=%lu jiffies)\n, jiffies - rsp-gp_start);
+   pr_cont( (t=%lu jiffies g=%lu c=%lu)\n,
+   jiffies - rsp-gp_start, rsp-gpnum, rsp-completed);
if (!trigger_all_cpu_backtrace())
dump_stack();
 

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rcu self-detected stall messages on OMAP3, 4 boards

2012-09-21 Thread Paul E. McKenney
On Fri, Sep 21, 2012 at 10:41:14PM +, Paul Walmsley wrote:
 On Fri, 21 Sep 2012, Paul E. McKenney wrote:
 
  On Fri, Sep 21, 2012 at 05:47:31PM +, Paul Walmsley wrote:
 
   I built an OMAP kernel from Linus' commit 
   4651afbbae968772efd6dc4ba461cba9b49bb9d8 (Merge branch 'for-3.6-fixes' 
   of 
   git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq).  The config used 
   was 'omap2plus_defconfig', and enabled CONFIG_CPU_IDLE by hand.  Booted 
   it 
   on a Pandaboard (OMAP4430ES2) into a very minimal Debian rootfs.
  
  Did you have the patch at https://lkml.org/lkml/2012/8/30/290 applied?
 
 No, it's just as described above.
 
  If not, could you please try it?  (This patch cleared up a similar
  problem for Becky, also on OMAP.)
 
 Did not seem to help, either with or without CONFIG_CPU_IDLE.

I was hoping!  ;-)

And my init=/bin/sh kernel ran idle for more than an hour without
any RCU CPU stall warnings...

I am wondering if your system somehow figured out how to start a grace
period that had no RCU callbacks waiting for it.  If that happened,
then a CONFIG_NO_HZ=y system could in theory get into a state where all
CPUs are in dyntick-idle mode, so that none of them is doing anything
to force the grace period to complete.

That should be easy to diagnose, anyway.  Please see below, which
includes the earlier diagnostic patch.

Thanx, Paul



diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 307caf1..696f189 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -879,6 +879,7 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
unsigned long flags;
int ndetected = 0;
struct rcu_node *rnp = rcu_get_root(rsp);
+   long totqlen = 0;
 
/* Only let one CPU complain about others per time interval. */
 
@@ -923,8 +924,11 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
raw_spin_unlock_irqrestore(rnp-lock, flags);
 
print_cpu_stall_info_end();
-   printk(KERN_CONT (detected by %d, t=%ld jiffies)\n,
-  smp_processor_id(), (long)(jiffies - rsp-gp_start));
+   for_each_possible_cpu(cpu)
+   totqlen += per_cpu_ptr(rsp-rda, cpu)-qlen;
+   pr_cont((detected by %d, t=%ld jiffies, g=%lu, c=%lu, q=%lu)\n,
+  smp_processor_id(), (long)(jiffies - rsp-gp_start),
+  rsp-gpnum, rsp-completed, totqlen);
if (ndetected == 0)
printk(KERN_ERR INFO: Stall ended before state dump start\n);
else if (!trigger_all_cpu_backtrace())
@@ -939,8 +943,10 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
 
 static void print_cpu_stall(struct rcu_state *rsp)
 {
+   int cpu;
unsigned long flags;
struct rcu_node *rnp = rcu_get_root(rsp);
+   long totqlen = 0;
 
/*
 * OK, time to rat on ourselves...
@@ -951,7 +957,10 @@ static void print_cpu_stall(struct rcu_state *rsp)
print_cpu_stall_info_begin();
print_cpu_stall_info(rsp, smp_processor_id());
print_cpu_stall_info_end();
-   printk(KERN_CONT  (t=%lu jiffies)\n, jiffies - rsp-gp_start);
+   for_each_possible_cpu(cpu)
+   totqlen += per_cpu_ptr(rsp-rda, cpu)-qlen;
+   pr_cont( (t=%lu jiffies g=%lu c=%lu q=%lu)\n,
+   jiffies - rsp-gp_start, rsp-gpnum, rsp-completed, totqlen);
if (!trigger_all_cpu_backtrace())
dump_stack();
 

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rcu self-detected stall messages on OMAP3, 4 boards

2012-09-20 Thread Paul E. McKenney
On Thu, Sep 20, 2012 at 09:49:13PM +, Bruce, Becky wrote:
 
 On Sep 20, 2012, at 2:56 AM, Paul Walmsley wrote:
 
  Hi,
  
  On Wed, 19 Sep 2012, Paul E. McKenney wrote:
  
  On Thu, Sep 13, 2012 at 06:52:10PM +, Paul Walmsley wrote:
  
  On Wed, 12 Sep 2012, Paul E. McKenney wrote:
  
  Subodh Nijsure (also CCed) reported something that might be similar on
  ARM, and also reported that setting the following got rid of the stalls:
  
   CONFIG_CPU_IDLE=y
   CONFIG_CPU_IDLE_GOV_LADDER=y
   CONFIG_CPU_IDLE_GOV_MENU=y
  
  At which point he was happy, which was good, but which also left the
  underlying problem unsolved.  Do these affect your system?  If so,
  do they cause a different ARM idle loop to be executed?
  
  Will give this a try.  What board was Subodh using?
  
  Any news on trying the above settings?
  
  Sorry, haven't had the chance to try it yet due to the impending merge 
  window opening.  Once things settle down I'll give it a try -- or maybe 
  someone else can test it in the meantime.
  
 
 OK, people, you can stop heckling me about sent from my iPhone - I'm in the 
 wilds of rural Louisiana with really bad internet service and was trying to 
 work on my phone (but, alas, did not notice the CC list included the entire 
 universe).   Shame on me.
 
 With the above set, I don't seem to see any stalls with the RCU timeout set 
 to 60s (the default).  I left the board running for 25 minutes; I will fire 
 it up again later and let it run for a bit longer, but usually I end up 
 seeing the problem pretty quickly so I don't expect that to result in 
 anything.  I also didn't see any stalls on Paul's RCU tree as of a week ago 
 at 60s, so as far as I can tell the CPU_IDLE stuff didn't have any impact (it 
 wasn't on when I tested Paul's tree).
 
 If I drop the timeout to 5s as Paul M. suggested for debug a while back, I do 
 see stalls (both with CPU_IDLE stuff and without). 
 
 I'm using the default omap2plus config, with RCU stall info enabled and the 
 cpu idle stuff turned on (console dump below).  This is a Panda ES 1.1 
 (OMAP4460)

Thank you for the testing, Becky!

Paul Walmsley, please let me know if the config below doesn't clear things
up for you or if there is some reason why this config is infeasible.

Thanx, Paul

 root@omap4430-panda:~# zcat /proc/config/gz. .. .. ..gz | grep RCU
 # RCU Subsystem
 CONFIG_TREE_RCU=y
 # CONFIG_PREEMPT_RCU is not set
 CONFIG_RCU_FANOUT=32
 CONFIG_RCU_FANOUT_LEAF=16
 # CONFIG_RCU_FANOUT_EXACT is not set
 # CONFIG_RCU_FAST_NO_HZ is not set
 # CONFIG_TREE_RCU_TRACE is not set
 # CONFIG_PROVE_RCU is not set
 # CONFIG_SPARSE_RCU_POINTER is not set
 # CONFIG_RCU_TORTURE_TEST is not set
 CONFIG_RCU_CPU_STALL_TIMEOUT=5
 CONFIG_RCU_CPU_STALL_INFO=y
 # CONFIG_RCU_TRACE is not set
 root@omap4430-panda:~# zcat /proc/config.gz | grep IDLE
 CONFIG_GENERIC_SMP_IDLE_THREAD=y
 CONFIG_CPU_IDLE=y
 CONFIG_CPU_IDLE_GOV_LADDER=y
 CONFIG_CPU_IDLE_GOV_MENU=y
 CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED=y
 
 Paul, let me know if you want me to try anything else.  My internet 
 connection is spotty today but (obviously :) I will see emails on my phone 
 and will test when I can.
 
 Cheers,
 B
 
 Console output:
 
 root@omap4430-panda:~# [  377.495361] INFO: rcu_sched self-detected stall on 
 CPU
 [  377.500762] .1: (1 ticks this GP) idle=dcd/1/0 
 [  377.505523] . (t=761 jiffies)
 [  377.508666] [c0019da0] (unwind_backtrace+0x0/0xf8) from [c009b138] 
 (rcu_check_callbacks+0x204/0x790)
 [  377.518615] [c009b138] (rcu_check_callbacks+0x204/0x790) from 
 [c0045890] (update_process_times+0x38/0x68)
 [  377.529022] [c0045890] (update_process_times+0x38/0x68) from 
 [c007d47c] (tick_sched_timer+0x80/0xec)
 [  377.538970] [c007d47c] (tick_sched_timer+0x80/0xec) from [c005a2fc] 
 (__run_hrtimer+0x7c/0x218)
 [  377.548339] [c005a2fc] (__run_hrtimer+0x7c/0x218) from [c005b040] 
 (hrtimer_interrupt+0x130/0x2d8)
 [  377.558013] [c005b040] (hrtimer_interrupt+0x130/0x2d8) from [c0018998] 
 (twd_handler+0x30/0x44)
 [  377.567413] [c0018998] (twd_handler+0x30/0x44) from [c00960cc] 
 (handle_percpu_devid_irq+0x90/0x158)
 [  377.577270] [c00960cc] (handle_percpu_devid_irq+0x90/0x158) from 
 [c00929ac] (generic_handle_irq+0x30/0x44)
 [  377.587768] [c00929ac] (generic_handle_irq+0x30/0x44) from [c0013bd8] 
 (handle_IRQ+0x4c/0xac)
 [  377.596984] [c0013bd8] (handle_IRQ+0x4c/0xac) from [c0008470] 
 (gic_handle_irq+0x24/0x58)
 [  377.605834] [c0008470] (gic_handle_irq+0x24/0x58) from [c0487604] 
 (__irq_svc+0x44/0x58)
 [  377.614593] Exception stack(0xee06ff08 to 0xee06ff50)
 [  377.619873] ff00:   0001 0001  3b9aca00 
 c608bc44 0057
 [  377.628448] ff20: c146a4f0 0002 c54e3b8f 0056 c048fb3c c0c47654 
 0001 ee06ff50
 [  377.637023] ff40: c0084774 c0390fac 2113 
 [  377.642333] [c0487604] (__irq_svc+0x44/0x58) from [c0390fac] 
 (cpuidle_wrap_enter+0x4c/0xa4)
 [  377.651458] [c0390fac

Re: rcu self-detected stall messages on OMAP3, 4 boards

2012-09-20 Thread Paul E. McKenney
On Thu, Sep 20, 2012 at 10:47:25PM +, Paul Walmsley wrote:
 On Thu, 20 Sep 2012, Paul E. McKenney wrote:
 
  On Thu, Sep 20, 2012 at 09:49:13PM +, Bruce, Becky wrote:
 
   OK, people, you can stop heckling me about sent from my iPhone - I'm in 
   the wilds of rural Louisiana with really bad internet service and was 
   trying to work on my phone (but, alas, did not notice the CC list 
   included the entire universe).   Shame on me.
   
   With the above set, I don't seem to see any stalls with the RCU timeout 
   set to 60s (the default).  I left the board running for 25 minutes; I 
   will fire it up again later and let it run for a bit longer, but usually 
   I end up seeing the problem pretty quickly so I don't expect that to 
   result in anything.  I also didn't see any stalls on Paul's RCU tree as 
   of a week ago at 60s, so as far as I can tell the CPU_IDLE stuff didn't 
   have any impact (it wasn't on when I tested Paul's tree).
   
   If I drop the timeout to 5s as Paul M. suggested for debug a while back, 
   I do see stalls (both with CPU_IDLE stuff and without). 
   
   I'm using the default omap2plus config, with RCU stall info enabled and 
   the cpu idle stuff turned on (console dump below).  This is a Panda ES 
   1.1 (OMAP4460)
  
  Thank you for the testing, Becky!
  
  Paul Walmsley, please let me know if the config below doesn't clear things
  up for you or if there is some reason why this config is infeasible.
 
 Will certainly test it here once things settle down.  But I wouldn't 
 consider the use of CONFIG_CPU_IDLE to be a 'fix' until the problem can be 
 understood.  We use !CONFIG_CPU_IDLE kernels for testing here.

Fair point.  I am wondering whether there is some path into the idle
loop that somehow avoids telling RCU that the CPU has in face entered
idle.  There needs to be an rcu_idle_enter() call on the way to idle,
otherwise RCU CPU stall warnings are expected behavior.

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rcu self-detected stall messages on OMAP3, 4 boards

2012-09-19 Thread Paul E. McKenney
On Thu, Sep 13, 2012 at 06:52:10PM +, Paul Walmsley wrote:
 Hi Paul,
 
 thanks for the reply,
 
 On Wed, 12 Sep 2012, Paul E. McKenney wrote:
 
  Interesting.  I am assuming that the interrupt in the stack below came
  from idle, if not, please let me know what.
 
 According to the exception stack section in the original traceback, it
 appears that the serial interrupt took the SoC out of idle.
 
  Could you please reproduce with CONFIG_RCU_CPU_STALL_INFO=y?  That would
  give me a bit more information about why RCU thought that there was
  a stall.  (CCing Becky Bruce, who saw something similar recently.)
 
 At the bottom of this mail is a series of tracebacks with
 CONFIG_RCU_CPU_STALL_INFO=y.  Unlike the traceback that was sent in
 the last message, these were not triggered by serial activity.  These
 appeared every 300 seconds.
 
  Subodh Nijsure (also CCed) reported something that might be similar on
  ARM, and also reported that setting the following got rid of the stalls:
  
  CONFIG_CPU_IDLE=y
  CONFIG_CPU_IDLE_GOV_LADDER=y
  CONFIG_CPU_IDLE_GOV_MENU=y
  
  At which point he was happy, which was good, but which also left the
  underlying problem unsolved.  Do these affect your system?  If so,
  do they cause a different ARM idle loop to be executed?
 
 Will give this a try.  What board was Subodh using?

Hello, Paul,

Any news on trying the above settings?

Thanx, Paul

 - Paul
 
 
 Debian GNU/Linux wheezy/sid armel ttyO2
 
 armel login: [  305.942108] INFO: rcu_sched self-detected stall on CPU
 [  305.944946]  1: (7 GPs behind) idle=57b/1/0 
 [  305.947265]   (t=22811 jiffies)
 [  305.949066] [c001b7cc] (unwind_backtrace+0x0/0xf0) from [c00acc28] 
 (rcu_check_callbacks+0x1b0/0x678)
 [  305.954223] [c00acc28] (rcu_check_callbacks+0x1b0/0x678) from 
 [c00529e0] (update_process_times+0x38/0x68)
 [  305.959625] [c00529e0] (update_process_times+0x38/0x68) from 
 [c008bf14] (tick_sched_timer+0x80/0xec)
 [  305.964813] [c008bf14] (tick_sched_timer+0x80/0xec) from [c006840c] 
 (__run_hrtimer+0x7c/0x1e0)
 [  305.969696] [c006840c] (__run_hrtimer+0x7c/0x1e0) from [c00691f0] 
 (hrtimer_interrupt+0x11c/0x2d0)
 [  305.974731] [c00691f0] (hrtimer_interrupt+0x11c/0x2d0) from [c001a04c] 
 (twd_handler+0x30/0x44)
 [  305.979644] [c001a04c] (twd_handler+0x30/0x44) from [c00a7068] 
 (handle_percpu_devid_irq+0x90/0x13c)
 [  305.984741] [c00a7068] (handle_percpu_devid_irq+0x90/0x13c) from 
 [c00a37dc] (generic_handle_irq+0x30/0x48)
 [  305.990234] [c00a37dc] (generic_handle_irq+0x30/0x48) from [c0014c58] 
 (handle_IRQ+0x4c/0xac)
 [  305.995025] [c0014c58] (handle_IRQ+0x4c/0xac) from [c0008478] 
 (gic_handle_irq+0x28/0x5c)
 [  305.999633] [c0008478] (gic_handle_irq+0x28/0x5c) from [c04f8ca4] 
 (__irq_svc+0x44/0x5c)
 [  306.004180] Exception stack(0xde86ff88 to 0xde86ffd0)
 [  306.006927] ff80:   0003c6d0 0001  de8660c0 
 de86e000 c07c23c8
 [  306.011383] ffa0: c0504590 c0749e20  411fc092 c074a040  
 0001 de86ffd0
 [  306.015838] ffc0: 0003c6d1 c0014f50 2113 
 [  306.018585] [c04f8ca4] (__irq_svc+0x44/0x5c) from [c0014f50] 
 (default_idle+0x20/0x44)
 [  306.023040] [c0014f50] (default_idle+0x20/0x44) from [c001517c] 
 (cpu_idle+0x9c/0x114)
 [  306.027526] [c001517c] (cpu_idle+0x9c/0x114) from [804f1af4] 
 (0x804f1af4)
 [  602.004486] INFO: rcu_sched detected stalls on CPUs/tasks:
 [  602.007476]  (detected by 0, t=60707 jiffies)
 [  602.009857] INFO: Stall ended before state dump start
 [  906.027893] INFO: rcu_sched self-detected stall on CPU
 [  906.030700]  1: (6 GPs behind) idle=647/1/0 
 [  906.033020]   (t=38379 jiffies)
 [  906.034790] [c001b7cc] (unwind_backtrace+0x0/0xf0) from [c00acc28] 
 (rcu_check_callbacks+0x1b0/0x678)
 [  906.039947] [c00acc28] (rcu_check_callbacks+0x1b0/0x678) from 
 [c00529e0] (update_process_times+0x38/0x68)
 [  906.045349] [c00529e0] (update_process_times+0x38/0x68) from 
 [c008bf14] (tick_sched_timer+0x80/0xec)
 [  906.050537] [c008bf14] (tick_sched_timer+0x80/0xec) from [c006840c] 
 (__run_hrtimer+0x7c/0x1e0)
 [  906.055419] [c006840c] (__run_hrtimer+0x7c/0x1e0) from [c00691f0] 
 (hrtimer_interrupt+0x11c/0x2d0)
 [  906.060424] [c00691f0] (hrtimer_interrupt+0x11c/0x2d0) from [c001a04c] 
 (twd_handler+0x30/0x44)
 [  906.065307] [c001a04c] (twd_handler+0x30/0x44) from [c00a7068] 
 (handle_percpu_devid_irq+0x90/0x13c)
 [  906.070434] [c00a7068] (handle_percpu_devid_irq+0x90/0x13c) from 
 [c00a37dc] (generic_handle_irq+0x30/0x48)
 [  906.075897] [c00a37dc] (generic_handle_irq+0x30/0x48) from [c0014c58] 
 (handle_IRQ+0x4c/0xac)
 [  906.080688] [c0014c58] (handle_IRQ+0x4c/0xac) from [c0008478] 
 (gic_handle_irq+0x28/0x5c)
 [  906.085296] [c0008478] (gic_handle_irq+0x28/0x5c) from [c04f8ca4] 
 (__irq_svc+0x44/0x5c)
 [  906.089843] Exception stack(0xde86ff88 to 0xde86ffd0)
 [  906.092590] ff80:   0003cb06 0001  de8660c0 
 de86e000

Re: rcu self-detected stall messages on OMAP3, 4 boards

2012-09-12 Thread Paul E. McKenney
On Wed, Sep 12, 2012 at 10:51:30PM +, Paul Walmsley wrote:
 
 Hi Paul
 
 Recently several of us have been seeing INFO: rcu_sched self-detected
 stall on CPU { 0} (t=20611 jiffies) stack tracebacks on various OMAP3
 and 4 board.
 
 I only noticed it during v3.6-rc3, but I suspect it's been happening 
 for users at least since May:
 
 http://www.mail-archive.com/linux-omap@vger.kernel.org/msg68942.html
 
 
 The only quasi-reproducible test case that I've found so far 
 is to boot the board with serial console enabled to a login prompt, then 
 wait for a few minutes, then send a keypress to the board via serial.
 The tracebacks I get look like this:

Interesting.  I am assuming that the interrupt in the stack below came
from idle, if not, please let me know what.

Could you please reproduce with CONFIG_RCU_CPU_STALL_INFO=y?  That would
give me a bit more information about why RCU thought that there was
a stall.  (CCing Becky Bruce, who saw something similar recently.)

Subodh Nijsure (also CCed) reported something that might be similar on
ARM, and also reported that setting the following got rid of the stalls:

CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_CPU_IDLE_GOV_MENU=y

At which point he was happy, which was good, but which also left the
underlying problem unsolved.  Do these affect your system?  If so,
do they cause a different ARM idle loop to be executed?

Thanx, Paul

 [  467.480712] INFO: rcu_sched self-detected stall on CPU { 0}  (t=20611 
 jiffies)
 [  467.484741] [c001b7cc] (unwind_backtrace+0x0/0xf0) from [c00acc94] 
 (rcu_check_callbacks+0x180/0x630)
 [  467.489929] [c00acc94] (rcu_check_callbacks+0x180/0x630) from 
 [c0052b18] (update_process_times+0x38/0x68)
 [  467.495361] [c0052b18] (update_process_times+0x38/0x68) from 
 [c008c04c] (tick_sched_timer+0x80/0xec)
 [  467.500518] [c008c04c] (tick_sched_timer+0x80/0xec) from [c0068544] 
 (__run_hrtimer+0x7c/0x1e0)
 [  467.505401] [c0068544] (__run_hrtimer+0x7c/0x1e0) from [c0069328] 
 (hrtimer_interrupt+0x11c/0x2d0)
 [  467.510437] [c0069328] (hrtimer_interrupt+0x11c/0x2d0) from [c001a04c] 
 (twd_handler+0x30/0x44)
 [  467.515350] [c001a04c] (twd_handler+0x30/0x44) from [c00a71a0] 
 (handle_percpu_devid_irq+0x90/0x13c)
 [  467.520477] [c00a71a0] (handle_percpu_devid_irq+0x90/0x13c) from 
 [c00a3914] (generic_handle_irq+0x30/0x48)
 [  467.525939] [c00a3914] (generic_handle_irq+0x30/0x48) from [c0014c58] 
 (handle_IRQ+0x4c/0xac)
 [  467.530731] [c0014c58] (handle_IRQ+0x4c/0xac) from [c0008478] 
 (gic_handle_irq+0x28/0x5c)
 [  467.535339] [c0008478] (gic_handle_irq+0x28/0x5c) from [c04f8ce4] 
 (__irq_svc+0x44/0x5c)
 [  467.539886] Exception stack(0xc0729f58 to 0xc0729fa0)
 [  467.542663] 9f40:   
 00047f2a 0001
 [  467.547119] 9f60:  c074a940 c0728000 c07c4b08 c05045a0 c074be20 
  411fc092
 [  467.551574] 9f80: c074c040  0001 c0729fa0 00047f2b c0014f50 
 2113 
 [  467.556030] [c04f8ce4] (__irq_svc+0x44/0x5c) from [c0014f50] 
 (default_idle+0x20/0x44)
 [  467.560485] [c0014f50] (default_idle+0x20/0x44) from [c001517c] 
 (cpu_idle+0x9c/0x114)
 [  467.564971] [c001517c] (cpu_idle+0x9c/0x114) from [c06d77b0] 
 (start_kernel+0x2b4/0x304)
 
 Looks like this message was added as of commit
 a858af2875fb291d0f4b0a4419fefbf03c2379c0 (rcu: Print scheduling-clock
 information on RCU CPU stall-warning messages).
 
 Do you have any suggestions for how we can determine what is causing
 this?
 
 Here's an example of a kernel config that we use:
 
 http://www.pwsan.com/omap/testlogs/am33xx_hwmod_clock_devel_3.7/20120912092510/build/omap2plus_defconfig/Kconfig
 
 A few observations that may or may not be relevant: we use NO_HZ, and 
 we also have a clockevents timer that is relatively slow to program.
 
 
 regards,
 
 - Paul
 

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] omap changes for v2.6.39 merge window

2011-03-30 Thread Paul E. McKenney
On Wed, Mar 30, 2011 at 02:54:35PM -0700, Tony Lindgren wrote:
 * Thomas Gleixner t...@linutronix.de [110330 14:07]:
  
  So one person will be not enough, that needs to be a whole team of
  experienced people in the very near future to deal with the massive
  tsunami of crap which is targeted at mainline. If we fail to set that
  up, then we run into a very ugly maintainability issue in no time.
 
 One thing that will help here and distribute the load is to move
 more things under drivers/ as then we have more maintainers looking
 at the code.

In many cases, the ARM SoC vendors will want their people producing the
code, so although moving things to drivers might be a good thing to do,
it won't really increase the number of people involved.  Plus the move
to the drivers subtree would be a problem for devices with tight ties
to the board or SoC.

There is work on pushing towards common code, but there is a lot of code
and this will take time and a lot of work.

Thanx, Paul
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] omap changes for v2.6.39 merge window

2011-03-30 Thread Paul E. McKenney
On Wed, Mar 30, 2011 at 03:47:52PM -0700, Tony Lindgren wrote:
 * Paul E. McKenney paul...@linux.vnet.ibm.com [110330 15:35]:
  On Wed, Mar 30, 2011 at 02:54:35PM -0700, Tony Lindgren wrote:
   * Thomas Gleixner t...@linutronix.de [110330 14:07]:

So one person will be not enough, that needs to be a whole team of
experienced people in the very near future to deal with the massive
tsunami of crap which is targeted at mainline. If we fail to set that
up, then we run into a very ugly maintainability issue in no time.
   
   One thing that will help here and distribute the load is to move
   more things under drivers/ as then we have more maintainers looking
   at the code.
  
  In many cases, the ARM SoC vendors will want their people producing the
  code, so although moving things to drivers might be a good thing to do,
  it won't really increase the number of people involved.  Plus the move
  to the drivers subtree would be a problem for devices with tight ties
  to the board or SoC.
  
  There is work on pushing towards common code, but there is a lot of code
  and this will take time and a lot of work.
 
 I agree on the common code part, then even drivers with tight
 ties to board or SoC become just generic drivers that are easy
 to review.

Yep!  The trick is getting to that point.  Some drivers will be easier
than others.

Thanx, Paul
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] power: introduce library for device-specific OPPs

2010-09-24 Thread Paul E. McKenney
On Fri, Sep 24, 2010 at 07:50:40AM -0500, Nishanth Menon wrote:
 SoCs have a standard set of tuples consisting of frequency and
 voltage pairs that the device will support per voltage domain. These
 are called Operating Performance Points or OPPs. The actual
 definitions of OPP varies over silicon versions. For a specific domain,
 we can have a set of {frequency, voltage} pairs. As the kernel boots
 and more information is available, a default set of these are activated
 based on the precise nature of device. Further on operation, based on
 conditions prevailing in the system (such as temperature), some OPP
 availability may be temporarily controlled by the SoC frameworks.
 
 To implement an OPP, some sort of power management support is necessary
 hence this library depends on CONFIG_PM.
 
 Contributions include:
 Sanjeev Premi for the initial concept:
   http://patchwork.kernel.org/patch/50998/
 Kevin Hilman for converting original design to device-based
 Kevin Hilman and Paul Walmsey for cleaning up many of the function
 abstractions, improvements and data structure handling
 Romit Dasgupta for using enums instead of opp pointers
 Thara Gopinath, Eduardo Valentin and Vishwanath BS for fixes and
 cleanups.
 Linus Walleij for recommending this layer be made generic for usage
 in other architectures beyond OMAP and ARM.
 Mark Brown, Andrew Morton, Rafael J Wysocki for valuable improvements.
 
 Discussions and comments from:
 http://marc.info/?l=linux-omapm=126033945313269w=2
 http://marc.info/?l=linux-omapm=125482970102327w=2
 http://marc.info/?t=12580924752r=1w=2
 http://marc.info/?l=linux-omapm=126025973426007w=2
 http://marc.info/?t=128152609200064r=1w=2
 http://marc.info/?t=12846872302r=1w=2
 http://marc.info/?t=12851921862r=1w=2
 incorporated.

Looks like a good start!!!  Some questions and suggestions about RCU
usage interspersed below.

Thanx, Paul

 Cc: Benoit Cousson b-cous...@ti.com
 Cc: Madhusudhan Chikkature Rajashekar madhu...@ti.com
 Cc: Phil Carmody ext-phil.2.carm...@nokia.com
 Cc: Roberto Granados Dorado x0095...@ti.com
 Cc: Santosh Shilimkar santosh.shilim...@ti.com
 Cc: Sergio Alberto Aguirre Rodriguez saagui...@ti.com
 Cc: Tero Kristo tero.kri...@nokia.com
 Cc: Eduardo Valentin eduardo.valen...@nokia.com
 Cc: Paul Walmsley p...@pwsan.com
 Cc: Sanjeev Premi pr...@ti.com
 Cc: Thara Gopinath th...@ti.com
 Cc: Vishwanath BS vishwanath...@ti.com
 Cc: Linus Walleij linus.wall...@stericsson.com
 Cc: Mark Brown broo...@opensource.wolfsonmicro.com
 Cc: Andrew Morton a...@linux-foundation.org
 Cc: Rafael J. Wysocki r...@sisk.pl
 
 Signed-off-by: Nishanth Menon n...@ti.com
 Signed-off-by: Kevin Hilman khil...@deeprootsystems.com
 ---
 V4:
   Cc list trimmed to just the MLs.
   RCU implementation - thanks for the pointers Rafael
   re-tested on OMAP3630 with lock debugging enabled including cpufreq,
   idle paths and exception cases - enable/disable.
   available_opp_count removed as it complicated the updates altogether
+ the usage latencies are not too high given small lists.
   In removed list_reverse usage as rcu has no list reverse equivalents
+ searching either direction has no real difference when the list
is ordered.
   change in protos for get_{voltage,frequency} - const removed from
   parameter as rcu_dereference causes build warnings
   introduced opp_set_availability helper to wrap common code between
opp_{enable,disable} - leaving these as is as they are more readable
from usage perspective + these are not meant to be used in hot paths
anyways.
   Header guard __ASM_OPP_H replaced with __LINUX_OPP_H__ for opp.h
 
 V3:
  https://patchwork.kernel.org/patch/200382/
   Bunch of documentation update based on feedback
   removed opp_def - opp_add enables the opp by default
   lock usage optimization
 
 Sample usage:
  http://pastebin.mozilla.org/794786 - cleanups for OMAP power
  http://pastebin.mozilla.org/794787 - Sample OPP initialization for OMAP
 
 V2:
  https://patchwork.kernel.org/patch/189532/
   Incorporated review comments from v1. major changes being:
   $subject change to reflect this is for power.
   Documentation revamp and including it in the patch :)
   OPP_DEF removed - lets introduce this if needed or leave it to
   SOC frameworks to organize code as needed.
   Rename of enable to available to better reflect the intent
   few fixes and typos
   Introduced mutex based locking for controlling access to list
   modification (note: query functions are still unsafe- rationale
in the patch below)
   A new home for opp.c in drivers/base/power (moved from lib/)
   A few optimization in function flow and additional error checks
   added to exposed functions
   offline aligned with Kevin for cleaning up the copyrights
 
 V1:
  http://marc.info/?t=12846872302r=1w=2
 
  Documentation/power/00-INDEX |2 +
  Documentation/power/opp.txt  |  326 
  

Re: [PATCH v4] power: introduce library for device-specific OPPs

2010-09-24 Thread Paul E. McKenney
On Fri, Sep 24, 2010 at 04:26:21PM -0500, Nishanth Menon wrote:
 Paul E. McKenney had written, on 09/24/2010 02:37 PM, the following:
 [...]
 
 Looks like a good start!!!  Some questions and suggestions about RCU
 Thanks for the review.. few comments below..

Back at you!  ;-)

 usage interspersed below.
 
 Thanx, Paul
 [...]
 +
 +/**
 + * find_device_opp() - find device_opp struct using device pointer
 + * @dev: device pointer used to lookup device OPPs
 + *
 + * Search list of device OPPs for one containing matching device. Does a 
 RCU
 + * reader operation to grab the pointer needed.
 + *
 + * Returns pointer to 'struct device_opp' if found, otherwise -ENODEV or
 + * -EINVAL based on type of error.
 + */
 +static struct device_opp *find_device_opp(struct device *dev)
 +{
 + struct device_opp *tmp_dev_opp, *dev_opp = ERR_PTR(-ENODEV);
 +
 + if (unlikely(!dev || IS_ERR(dev))) {
 + pr_err(%s: Invalid parameters being passed\n, __func__);
 + return ERR_PTR(-EINVAL);
 + }
 +
 + rcu_read_lock();
 + list_for_each_entry_rcu(tmp_dev_opp, dev_opp_list, node) {
 + if (tmp_dev_opp-dev == dev) {
 + dev_opp = tmp_dev_opp;
 + break;
 + }
 + }
 + rcu_read_unlock();
 
 What prevents the structure pointed to by dev_opp from being freed
 at this point?  We are no longer in an RCU read-side critical section,
 so RCU grace periods starting during the above RCU read-side critical
 section can now end.
 
 dev_opp is never freed in the implementation - it represents
 domains, only adds with list_add_rcu() is done - wont the usage be
 safe then? or I being blind?
 
 the reason why we dont free is coz of the following: dev_opp
 represents voltage domains in opp library. SoC frameworks are
 required to register only those voltage domain opp that are
 required. by allowing a free logic, I knew it'd have complicated the
 implementation way beyond what we needed it to be.

Perhaps I was confusing two different data structures, if so, apologies.

So you are freeing the opp level, but never the dev_opp level, then?

But yes, if you are only adding and never deleting, then it is safe to
pass the pointers out of an RCU read-side critical section.  But please
add a comment saying why you are doing this.  Otherwise, Coccinelle will
cause me to continue complaining about this to you.  ;-)

And the later uses still look buggy to me, please see below.

 Here is an example sequence of events that I am worried about:
 
 o   CPU 1 enters find_device_opp(), and pick up a pointer to
 a given device opp.
 
 o   CPU 2 executes opp_set_availability(), replacing that same
 device opp with a new one.  It then calls synchronize_rcu()
 which blocks waiting for CPU 1 to exit its RCU read-side
 critical section.
 
 o   CPU 1 exits its RCU read-side critical section, arriving at
 this point in the code.
 
 o   CPU 2's synchronize_rcu() is now permitted to return, executing
 the kfree(), which frees up the memory that CPU 1's dev_opp
 pointer references.
 
 o   This newly freed memory is allocated for some other structure
 by CPU 3.  CPU 1 and CPU 3 are now trying to use the same
 memory for two different structures, and nothing good can
 possibly come of this.  The kernel dies a brutal and nasty
 death.
 
 One way to fix this is to have the caller do rcu_read_lock() before
 calling find_device_opp(), and to do rcu_read_unlock() only after the
 caller has finished using the pointer that find_device_opp() returns.
 This works well unless the caller needs to do some blocking operation
 before it gets done using the pointer.
 
 Another approach is for find_device_opp() to use a reference count on
 the structure, and for opp_set_availability() to avoid freeing the
 structure unless/until the reference counter drops to zero.
 
 There are other approaches as well, please feel free to take a look
 at Documentation/RCU/rcuref.txt for more info on using reference
 counting and RCU.
 thx. I probably should read yet again if I got my understanding of
 usage right..
 
 
 [...]
 +
 +/**
 + * opp_find_freq_exact() - search for an exact frequency
 + * @dev: device for which we do this operation
 + * @freq:frequency to search for
 + * @is_available:true/false - match for available opp
 + *
 + * Searches for exact match in the opp list and returns pointer to the 
 matching
 + * opp if found, else returns ERR_PTR in case of error and should be 
 handled
 + * using IS_ERR.
 + *
 + * Note: available is a modifier for the search. if available=true, then 
 the
 + * match is for exact matching frequency and is available in the stored OPP
 + * table. if false, the match is for exact frequency which is not 
 available.
 + *
 + * This provides a mechanism to enable an opp

Re: [RFC][PATCH 0/1] kmemleak: Fix false positive with alias

2010-08-30 Thread Paul E. McKenney
On Fri, Aug 27, 2010 at 09:12:24AM +0300, Hiroshi DOYU wrote:
 Hi Catalin,
 
 From: Hiroshi DOYU hiroshi.d...@nokia.com
 Subject: Re: [RFC][PATCH 0/1] kmemleak: Fix false positive with alias
 Date: Tue, 10 Aug 2010 18:49:03 +0300 (EEST)
 
 snip
  The performance impact is indeed pretty high, though some parts of the
  code look over-engineered to me (the __scan_block function with a loop
  going through an array of two function pointers - I think the compiler
  cannot figure out what to inline). You could just extend the
  find_and_get_object() to search both trees under a single spinlock
  region (as locking also takes time).
  
  Ok, a good point.
  
  Now there's not much difference with the attached patch, a new version
  of alias.
  
  / # modprobe kmemleak-special-test use_alias=0
  / # time echo scan  /sys/kernel/debug/kmemleak
  real0m 2.30s
  user0m 0.00s
  sys 0m 2.30s
  
  / # modprobe kmemleak-special-test use_alias=1
  / # time echo scan  /sys/kernel/debug/kmemleak
  real0m 3.91s
  user0m 0.00s
  sys 0m 3.91s
 
 It would be nice if you could have some time to take a look at this
 patch and give some comments.
 
  From a5670d69b2cafe85f6f26f6951097210d3b9917f Mon Sep 17 00:00:00 2001
  From: Hiroshi DOYU hiroshi.d...@nokia.com
  Date: Thu, 17 Jun 2010 13:36:45 +0300
  Subject: [PATCH 1/1] kmemleak: Fix false positive with address alias
  
  There is a false positive case that a pointer is calculated by other
  methods than the usual container_of macro. kmemleak_ignore can cover
  such a false positive, but it would loose the advantage of memory leak
  detection. This patch allows kmemleak to work with such false
  positives by aliasing of address. A client module can register an
  alias address to an original pointer.
  
  A typical use case could be the IOMMU pagetable allocation which
  stores pointers to the second level of page tables with some
  conversion, for example, a physical address with attribute bits. Right
  now I don't have other use cases but I hope that there could be some
  that this special scan works with.

A few questions below...

Thanx, Paul

  Signed-off-by: Hiroshi DOYU hiroshi.d...@nokia.com
  Cc: Phil Carmody ext-phil.2.carm...@nokia.com
  ---
   include/linux/kmemleak.h |8 ++
   mm/kmemleak.c|  208 
  +++---
   2 files changed, 204 insertions(+), 12 deletions(-)
  
  diff --git a/include/linux/kmemleak.h b/include/linux/kmemleak.h
  index 99d9a67..9e2af3a 100644
  --- a/include/linux/kmemleak.h
  +++ b/include/linux/kmemleak.h
  @@ -34,6 +34,8 @@ extern void kmemleak_not_leak(const void *ptr) __ref;
   extern void kmemleak_ignore(const void *ptr) __ref;
   extern void kmemleak_scan_area(const void *ptr, size_t size, gfp_t gfp) 
  __ref;
   extern void kmemleak_no_scan(const void *ptr) __ref;
  +extern void kmemleak_add_alias(const void *ptr, const void *new) __ref;
  +extern void kmemleak_unalias(const void *alias) __ref;
   
   static inline void kmemleak_alloc_recursive(const void *ptr, size_t size,
  int min_count, unsigned long flags,
  @@ -92,6 +94,12 @@ static inline void kmemleak_erase(void **ptr)
   static inline void kmemleak_no_scan(const void *ptr)
   {
   }
  +static inline void kmemleak_add_alias(const void *ptr, const void *new)
  +{
  +}
  +static inline void kmemleak_unalias(const void *alias)
  +{
  +}
   
   #endif /* CONFIG_DEBUG_KMEMLEAK */
   
  diff --git a/mm/kmemleak.c b/mm/kmemleak.c
  index 2c0d032..3875cb7 100644
  --- a/mm/kmemleak.c
  +++ b/mm/kmemleak.c
  @@ -157,6 +157,13 @@ struct kmemleak_object {
  unsigned long jiffies;  /* creation timestamp */
  pid_t pid;  /* pid of the current task */
  char comm[TASK_COMM_LEN];   /* executable name */
  +   struct kmemleak_alias *alias;   /* if a pointer is modified */
  +};
  +
  +struct kmemleak_alias {
  +   struct list_head alias_list;
  +   struct prio_tree_node tree_node;
  +   struct kmemleak_object *object;
   };
   
   /* flag representing the memory block allocation status */
  @@ -179,13 +186,18 @@ struct kmemleak_object {
   static LIST_HEAD(object_list);
   /* the list of gray-colored objects (see color_gray comment below) */
   static LIST_HEAD(gray_list);
  +/* the list of objects with alias (see alias comment below) */
  +static LIST_HEAD(alias_list);
   /* prio search tree for object boundaries */
   static struct prio_tree_root object_tree_root;
  +/* prio search tree for alias object boundaries */
  +static struct prio_tree_root alias_tree_root;
   /* rw_lock protecting the access to object_list and prio_tree_root */
   static DEFINE_RWLOCK(kmemleak_lock);
   
   /* allocation caches for kmemleak internal data */
   static struct kmem_cache *object_cache;
  +static struct kmem_cache *alias_cache;
   static struct kmem_cache *scan_area_cache;
   
   /* set