Re: [PATCH v2] security: device_cgroup: fix RCU lockdep splat
On Thu, Sep 03, 2015 at 08:48:39AM -0500, Serge E. Hallyn wrote: > On Wed, Sep 02, 2015 at 05:14:33PM -0700, Paul E. McKenney wrote: > > On Wed, Sep 02, 2015 at 12:24:50PM -0400, Tejun Heo wrote: > > > cc'ing Paul. > > > > > > On Wed, Sep 02, 2015 at 08:12:28AM -0500, Felipe Balbi wrote: > > > > while booting AM437x device, the following splat > > > > triggered: > > > > > > > > [ 12.005238] === > > > > [ 12.009749] [ INFO: suspicious RCU usage. ] > > > > [ 12.014116] 4.2.0-next-20150831 #1154 Not tainted > > > > [ 12.019050] --- > > > > [ 12.023408] security/device_cgroup.c:405 device_cgroup:verify_new_ex > > > > called without proper synchronization! > > > ... > > > > [ 12.128326] [] (verify_new_ex) from [] > > > > (devcgroup_access_write+0x374/0x658) > > > > [ 12.137426] [] (devcgroup_access_write) from [] > > > > (cgroup_file_write+0x28/0x1bc) > > > > [ 12.146796] [] (cgroup_file_write) from [] > > > > (kernfs_fop_write+0xc0/0x1b8) > > > > [ 12.155620] [] (kernfs_fop_write) from [] > > > > (__vfs_write+0x1c/0xd8) > > > > [ 12.163783] [] (__vfs_write) from [] > > > > (vfs_write+0x90/0x16c) > > > > [ 12.171426] [] (vfs_write) from [] > > > > (SyS_write+0x44/0x9c) > > > > [ 12.178806] [] (SyS_write) from [] > > > > (ret_fast_syscall+0x0/0x1c) > > > > > > This shouldn't be happening because devcgroup_access_write() always > > > grabs devcgroup_mutex. Looking at the log, the culprit seems to be > > > f78f5b90c4ff ("rcu: Rename rcu_lockdep_assert() to > > > RCU_LOCKDEP_WARN()"). It missed the bang for the second test while > > > inverting it, so adding rcu_read_lock() isn't the right fix here. > > > > > > Paul, can you please fix it? > > > > Gah! Please see below. > > > > Thanx, Paul > > > > > > > > security/device_cgroup: Fix RCU_LOCKDEP_WARN() condition > > > > f78f5b90c4ff ("rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN()") > > introduced a bug by incorrectly inverting the condition when moving from > > rcu_lockdep_assert() to RCU_LOCKDEP_WARN(). This commit therefore fixes > > the inversion. > > > > Reported-by: Felipe Balbi <ba...@ti.com> > > Reported-by: Tejun Heo <t...@kernel.org> > > Signed-off-by: Paul E. McKenney <paul...@linux.vnet.ibm.com> > > Cc: Serge Hallyn <se...@hallyn.com> > > Oh, makes sense :) (didn't see the original patch when it came by, sorry) I should have CCed you, apologies for failing to do so. > Acked-by: Serge Hallyn <serge.hal...@canonical.com> Added, thank you! Thanx, Paul > > diff --git a/security/device_cgroup.c b/security/device_cgroup.c > > index 73455089feef..03c1652c9a1f 100644 > > --- a/security/device_cgroup.c > > +++ b/security/device_cgroup.c > > @@ -401,7 +401,7 @@ static bool verify_new_ex(struct dev_cgroup *dev_cgroup, > > bool match = false; > > > > RCU_LOCKDEP_WARN(!rcu_read_lock_held() && > > -lockdep_is_held(_mutex), > > +!lockdep_is_held(_mutex), > > "device_cgroup:verify_new_ex called without proper > > synchronization"); > > > > if (dev_cgroup->behavior == DEVCG_DEFAULT_ALLOW) { > -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] security: device_cgroup: fix RCU lockdep splat
On Thu, Sep 03, 2015 at 09:56:05AM -0400, Josh Boyer wrote: > On Wed, Sep 2, 2015 at 8:14 PM, Paul E. McKenney > <paul...@linux.vnet.ibm.com> wrote: > > On Wed, Sep 02, 2015 at 12:24:50PM -0400, Tejun Heo wrote: > >> cc'ing Paul. > >> > >> On Wed, Sep 02, 2015 at 08:12:28AM -0500, Felipe Balbi wrote: > >> > while booting AM437x device, the following splat > >> > triggered: > >> > > >> > [ 12.005238] === > >> > [ 12.009749] [ INFO: suspicious RCU usage. ] > >> > [ 12.014116] 4.2.0-next-20150831 #1154 Not tainted > >> > [ 12.019050] --- > >> > [ 12.023408] security/device_cgroup.c:405 device_cgroup:verify_new_ex > >> > called without proper synchronization! > >> ... > >> > [ 12.128326] [] (verify_new_ex) from [] > >> > (devcgroup_access_write+0x374/0x658) > >> > [ 12.137426] [] (devcgroup_access_write) from [] > >> > (cgroup_file_write+0x28/0x1bc) > >> > [ 12.146796] [] (cgroup_file_write) from [] > >> > (kernfs_fop_write+0xc0/0x1b8) > >> > [ 12.155620] [] (kernfs_fop_write) from [] > >> > (__vfs_write+0x1c/0xd8) > >> > [ 12.163783] [] (__vfs_write) from [] > >> > (vfs_write+0x90/0x16c) > >> > [ 12.171426] [] (vfs_write) from [] > >> > (SyS_write+0x44/0x9c) > >> > [ 12.178806] [] (SyS_write) from [] > >> > (ret_fast_syscall+0x0/0x1c) > >> > >> This shouldn't be happening because devcgroup_access_write() always > >> grabs devcgroup_mutex. Looking at the log, the culprit seems to be > >> f78f5b90c4ff ("rcu: Rename rcu_lockdep_assert() to > >> RCU_LOCKDEP_WARN()"). It missed the bang for the second test while > >> inverting it, so adding rcu_read_lock() isn't the right fix here. > >> > >> Paul, can you please fix it? > > > > Gah! Please see below. > > > > Thanx, Paul > > > > ---- > > > > security/device_cgroup: Fix RCU_LOCKDEP_WARN() condition > > > > f78f5b90c4ff ("rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN()") > > introduced a bug by incorrectly inverting the condition when moving from > > rcu_lockdep_assert() to RCU_LOCKDEP_WARN(). This commit therefore fixes > > the inversion. > > > > Reported-by: Felipe Balbi <ba...@ti.com> > > Reported-by: Tejun Heo <t...@kernel.org> > > Signed-off-by: Paul E. McKenney <paul...@linux.vnet.ibm.com> > > Cc: Serge Hallyn <se...@hallyn.com> > > Just tested this patch without Felipe's previous version on all my > machines. The splat is indeed gone. > > Tested-by: Josh Boyer <jwbo...@fedoraproject.org> Thank you, added! Thanx, Paul > josh > > > > > diff --git a/security/device_cgroup.c b/security/device_cgroup.c > > index 73455089feef..03c1652c9a1f 100644 > > --- a/security/device_cgroup.c > > +++ b/security/device_cgroup.c > > @@ -401,7 +401,7 @@ static bool verify_new_ex(struct dev_cgroup *dev_cgroup, > > bool match = false; > > > > RCU_LOCKDEP_WARN(!rcu_read_lock_held() && > > -lockdep_is_held(_mutex), > > +!lockdep_is_held(_mutex), > > "device_cgroup:verify_new_ex called without proper > > synchronization"); > > > > if (dev_cgroup->behavior == DEVCG_DEFAULT_ALLOW) { > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > > the body of a message to majord...@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > Please read the FAQ at http://www.tux.org/lkml/ > -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] security: device_cgroup: fix RCU lockdep splat
On Wed, Sep 02, 2015 at 12:24:50PM -0400, Tejun Heo wrote: > cc'ing Paul. > > On Wed, Sep 02, 2015 at 08:12:28AM -0500, Felipe Balbi wrote: > > while booting AM437x device, the following splat > > triggered: > > > > [ 12.005238] === > > [ 12.009749] [ INFO: suspicious RCU usage. ] > > [ 12.014116] 4.2.0-next-20150831 #1154 Not tainted > > [ 12.019050] --- > > [ 12.023408] security/device_cgroup.c:405 device_cgroup:verify_new_ex > > called without proper synchronization! > ... > > [ 12.128326] [] (verify_new_ex) from [] > > (devcgroup_access_write+0x374/0x658) > > [ 12.137426] [] (devcgroup_access_write) from [] > > (cgroup_file_write+0x28/0x1bc) > > [ 12.146796] [] (cgroup_file_write) from [] > > (kernfs_fop_write+0xc0/0x1b8) > > [ 12.155620] [] (kernfs_fop_write) from [] > > (__vfs_write+0x1c/0xd8) > > [ 12.163783] [] (__vfs_write) from [] > > (vfs_write+0x90/0x16c) > > [ 12.171426] [] (vfs_write) from [] > > (SyS_write+0x44/0x9c) > > [ 12.178806] [] (SyS_write) from [] > > (ret_fast_syscall+0x0/0x1c) > > This shouldn't be happening because devcgroup_access_write() always > grabs devcgroup_mutex. Looking at the log, the culprit seems to be > f78f5b90c4ff ("rcu: Rename rcu_lockdep_assert() to > RCU_LOCKDEP_WARN()"). It missed the bang for the second test while > inverting it, so adding rcu_read_lock() isn't the right fix here. > > Paul, can you please fix it? Gah! Please see below. Thanx, Paul security/device_cgroup: Fix RCU_LOCKDEP_WARN() condition f78f5b90c4ff ("rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN()") introduced a bug by incorrectly inverting the condition when moving from rcu_lockdep_assert() to RCU_LOCKDEP_WARN(). This commit therefore fixes the inversion. Reported-by: Felipe Balbi <ba...@ti.com> Reported-by: Tejun Heo <t...@kernel.org> Signed-off-by: Paul E. McKenney <paul...@linux.vnet.ibm.com> Cc: Serge Hallyn <se...@hallyn.com> diff --git a/security/device_cgroup.c b/security/device_cgroup.c index 73455089feef..03c1652c9a1f 100644 --- a/security/device_cgroup.c +++ b/security/device_cgroup.c @@ -401,7 +401,7 @@ static bool verify_new_ex(struct dev_cgroup *dev_cgroup, bool match = false; RCU_LOCKDEP_WARN(!rcu_read_lock_held() && -lockdep_is_held(_mutex), +!lockdep_is_held(_mutex), "device_cgroup:verify_new_ex called without proper synchronization"); if (dev_cgroup->behavior == DEVCG_DEFAULT_ALLOW) { -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: omap: panda rcu stall on 3.10.1
On Fri, Aug 30, 2013 at 08:27:32AM +0200, Christian Hoffmann wrote: This also went to: linaro-ker...@lists.linaro.org Hi, current linaro kernel on 13.07 (3.10.1.0-1-linaro-omap) shows stack below. System is generally quite unstable. Light usage (postfix+dovecot...) Full dmesg here: https://dl.dropboxusercontent.com/u/21820416/rcu.stall.panda Looks to me like the CPU stall is cascaded problem. Prior to this in dmesg, you have several terminated with status N messages, where N!=0. I believe that fixing the problems leading to these messages would be a good first step. Thanx, Paul Rgds, Chris [ 4538.880432] INFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 0, t=2103 jiffies, g=1649, c=1648, q=18) [ 4538.891723] Backtrace for cpu 0 (current): [ 4538.891723] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.1.0-1-linaro-omap #1ubuntu1~ci+130723003933-Ubuntu [ 4538.891754] [c001175d] (unwind_backtrace+0x1/0x9c) from [c000f625] (show_stack+0x11/0x14) [ 4538.891815] [c000f625] (show_stack+0x11/0x14) from [c00106d5] (smp_send_all_cpu_backtrace+0x49/0x98) [ 4538.891815] [c00106d5] (smp_send_all_cpu_backtrace+0x49/0x98) from [c007c755] (rcu_check_callbacks+0x439/0x4ac) [ 4538.891815] [c007c755] (rcu_check_callbacks+0x439/0x4ac) from [c0036599] (update_process_times+0x2d/0x4c) [ 4538.891876] [c0036599] (update_process_times+0x2d/0x4c) from [c005fcfd] (tick_sched_handle+0x35/0x40) [ 4538.891906] [c005fcfd] (tick_sched_handle+0x35/0x40) from [c005feed] (tick_sched_timer+0x31/0x54) [ 4538.891937] [c005feed] (tick_sched_timer+0x31/0x54) from [c004532b] (__run_hrtimer+0x5b/0x138) [ 4538.891937] [c004532b] (__run_hrtimer+0x5b/0x138) from [c0045bcb] (hrtimer_interrupt+0xdb/0x22c) [ 4538.891967] [c0045bcb] (hrtimer_interrupt+0xdb/0x22c) from [c0010e13] (twd_handler+0x1b/0x28) [ 4538.891998] [c0010e13] (twd_handler+0x1b/0x28) from [c00783b9] (handle_percpu_devid_irq+0x4d/0xac) [ 4538.891998] [c00783b9] (handle_percpu_devid_irq+0x4d/0xac) from [c0075ead] (generic_handle_irq+0x19/0x24) [ 4538.892028] [c0075ead] (generic_handle_irq+0x19/0x24) from [c000d16d] (handle_IRQ+0x29/0x68) [ 4538.892059] [c000d16d] (handle_IRQ+0x29/0x68) from [c00084db] (gic_handle_irq+0x27/0x50) [ 4538.892059] [c00084db] (gic_handle_irq+0x27/0x50) from [c045315b] (__irq_svc+0x3b/0x5c) [ 4538.892059] Exception stack(0xc06e3f68 to 0xc06e3fb0) [ 4538.892059] 3f60: c06e2000 c075f130 [ 4538.892120] 3f80: c06e2000 c075ea93 c075f130 c06ea594 c045c2e0 c06e2000 c0fe5908 c06e3fb0 [ 4538.892120] 3fa0: c007bab1 c0058b48 400f0133 [ 4538.892120] [c045315b] (__irq_svc+0x3b/0x5c) from [c0058b48] (cpu_startup_entry+0xbc/0x164) [ 4538.892120] [c0058b48] (cpu_startup_entry+0xbc/0x164) from [c06735f9] (start_kernel+0x251/0x2ac) [ 4538.892120] [ 4538.892120] sending IPI to all other CPUs: [ 4538.892211] IPI backtrace for cpu 1 [ 4538.892211] [ 4538.892242] CPU: 1 PID: 2064 Comm: clamd Not tainted 3.10.1.0-1-linaro-omap #1ubuntu1~ci+130723003933-Ubuntu [ 4538.892242] task: edb09500 ti: ec91e000 task.ti: ec91e000 [ 4538.892272] PC is at 0xb6e2f0d4 [ 4538.892272] LR is at 0xb6eeec01 [ 4538.892272] pc : [b6e2f0d4]lr : [b6eeec01]psr: 000f0130 [ 4538.892272] sp : be8b472c ip : 62323337 fp : 684a [ 4538.892303] r10: 0084d750 r9 : 0200 r8 : e86d6a2f [ 4538.892303] r7 : 0a0a0a0a r6 : 69393e32 r5 : 0080bf24 r4 : 0080bf28 [ 4538.892333] r3 : 81010100 r2 : 6438 r1 : 000a r0 : 0080bf24 [ 4538.892333] Flags: nzcv IRQs on FIQs on Mode USER_32 ISA Thumb Segment user [ 4538.892333] Control: 50c5387d Table: acbd404a DAC: [ 4538.892364] [ 4538.892364] R8: 0xe86d69af: [ 4538.892364] 69ac 6a6f7254 412e6e61 746e6567 3939332d 00303934 b1aa3270 2506 47a83ca7 [ 4538.892395] 69cc e4ebc453 91ad979e 23363f8b fd49d9d0 773b48ba ad58f7d3 04c351eb 96e4414f [ 4538.892395] 69ec 7e1a3ac7 f39072a5 e362cedb b119b3e0 b595375e b2137762 b19f99a8 b10dc638 [ 4538.892395] 6a0c 1806 2e6e6957 6a6f7254 412e6e61 746e6567 3939332d 00303534 b23ade28 [ 4538.892395] 6a2c 1806 2e6e6957 6a6f7254 462e6e61 61656b61 37342d76 00373131 ac58376e [ 4538.892395] 6a4c 72542e6e 6e616a6f 3434382e 35313336 3e55c600 412e6eb2 72617764 61472e65 [ 4538.892517] 6a6c 6176656d 2d65636e 34393132 55b20037 1604b173 2e6e6957 6a6f7254 462e6e61 [ 4538.892547] 6a8c 61656b61 37342d76 00343131 b652e2c0 72542e6e 6e616a6f 7075532e 756a7265 [ 4538.892547] 6aac 322d6e61 fb003731 15b23a2b 2e6e6957 6a6f7254 412e6e61 746e6567 3939332d [ 4538.892547] CPU: 1 PID: 2064 Comm: clamd Not tainted 3.10.1.0-1-linaro-omap #1ubuntu1~ci+130723003933-Ubuntu [ 4538.892547] [c001175d] (unwind_backtrace+0x1/0x9c) from [c000f625] (show_stack+0x11/0x14) [ 4538.892639] [c000f625] (show_stack+0x11/0x14) from [c0010819] (handle_IPI+0xd5/0x158) [ 4538.892669] [c0010819]
Re: rcu self-detected stall messages on OMAP3, 4 boards
On Mon, Oct 01, 2012 at 10:55:11AM +0200, Linus Walleij wrote: On Sat, Sep 22, 2012 at 11:59 PM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: rcu: Fix day-one dyntick-idle stall-warning bug As mentioned in another thread this solves the same problem for ux500. Reported/Tested-by: Linus Walleij linus.wall...@linaro.org But now it appears that this commit didn't make it into v3.6 so it definately needs to be tagged with Cc: sta...@kernel.org before it gets merged since the stall warnings are kinda scary. Ingo submitting this to Linus Torvalds earlier today, so we should be able to send to stable shortly. Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: rcu self-detected stall messages on OMAP3, 4 boards
On Mon, Sep 24, 2012 at 03:11:34PM +0530, Shilimkar, Santosh wrote: On Sun, Sep 23, 2012 at 3:29 AM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: On Sat, Sep 22, 2012 at 01:10:43PM -0700, Paul E. McKenney wrote: On Sat, Sep 22, 2012 at 06:42:08PM +, Paul Walmsley wrote: On Fri, 21 Sep 2012, Paul E. McKenney wrote: [...] And here is a patch. I am still having trouble reproducing the problem, but figured that I should avoid serializing things. Thanx, Paul b/kernel/rcutree.c |4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) rcu: Fix day-one dyntick-idle stall-warning bug Each grace period is supposed to have at least one callback waiting for that grace period to complete. However, if CONFIG_NO_HZ=n, an extra callback-free grace period is no big problem -- it will chew up a tiny bit of CPU time, but it will complete normally. In contrast, CONFIG_NO_HZ=y kernels have the potential for all the CPUs to go to sleep indefinitely, in turn indefinitely delaying completion of the callback-free grace period. Given that nothing is waiting on this grace period, this is also not a problem. Unless RCU CPU stall warnings are also enabled, as they are in recent kernels. In this case, if a CPU wakes up after at least one minute of inactivity, an RCU CPU stall warning will result. The reason that no one noticed until quite recently is that most systems have enough OS noise that they will never remain absolutely idle for a full minute. But there are some embedded systems with cut-down userspace configurations that get into this mode quite easily. All this begs the question of exactly how a callback-free grace period gets started in the first place. This can happen due to the fact that CPUs do not necessarily agree on which grace period is in progress. If a CPU still believes that the grace period that just completed is still ongoing, it will believe that it has callbacks that need to wait for another grace period, never mind the fact that the grace period that they were waiting for just completed. This CPU can therefore erroneously decide to start a new grace period. Once this CPU notices that the earlier grace period completed, it will invoke its callbacks. It then won't have any callbacks left. If no other CPU has any callbacks, we now have a callback-free grace period. This commit therefore makes CPUs check more carefully before starting a new grace period. This new check relies on an array of tail pointers into each CPU's list of callbacks. If the CPU is up to date on which grace periods have completed, it checks to see if any callbacks follow the RCU_DONE_TAIL segment, otherwise it checks to see if any callbacks follow the RCU_WAIT_TAIL segment. The reason that this works is that the RCU_WAIT_TAIL segment will be promoted to the RCU_DONE_TAIL segment as soon as the CPU figures out that the old grace period has ended. This change is to cpu_needs_another_gp(), which is called in a number of places. The only one that really matters is in rcu_start_gp(), where the root rcu_node structure's -lock is held, which prevents any other CPU from starting or completing a grace period, so that the comparison that determines whether the CPU is missing the completion of a grace period is stable. Signed-off-by: Paul E. McKenney paul.mcken...@linaro.org Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com As already confirmed by Paul W and others, I too no longer see the rcu dumps any more with above patch. Thanks a lot for the fix. Glad it finally works! Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: rcu self-detected stall messages on OMAP3, 4 boards
On Mon, Sep 24, 2012 at 09:54:00PM +, Paul Walmsley wrote: On Sat, 22 Sep 2012, Paul E. McKenney wrote: On Sat, Sep 22, 2012 at 10:20:19PM +, Paul Walmsley wrote: On Sat, 22 Sep 2012, Paul E. McKenney wrote: This thing has been in the kernel since about 2004, not sure why you didn't hit it earlier. One other data point in that regard - noticed the warnings don't appear when the board is booted with: commit 4fa3b6cb1bc8c14b81b4c8ffdfd3f2500a7e9367 Author: Paul E. McKenney paul.mcken...@linaro.org Date: Tue Jun 5 15:53:53 2012 -0700 rcu: Fix qlen_lazy breakage You lost me on this one. This is already in mainline, so if you were using (say) 3.6-rc6, you would already have this commit applied. If I check out a kernel at this commit 4fa3b6cb1bc8c14b81b4c8ffdfd3f2500a7e9367 + the zero-callback-in-tickless-idle diagnostic patch, build and boot it, then the stall warnings don't appear (in 25 minutes of testing). Messages from the diagnostic patch indicate that the kernel is entering idle during a grace period with no RCU callbacks, though. This is not a big deal and does not need any further attention. Just wanted to place a time boundary on the point when these messages started appearing. (It is unlikely to be an optimal bound: i.e., there are probably later commits where the warnings also don't appear.) Ah, got it, thank you! Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: rcu self-detected stall messages on OMAP3, 4 boards
On Sun, Sep 23, 2012 at 07:55:50AM +, Paul Walmsley wrote: On Sat, 22 Sep 2012, Paul E. McKenney wrote: On Sat, Sep 22, 2012 at 10:25:59PM +, Paul Walmsley wrote: The recent tests here have been on Pandaboard, which is dual-CPU, but my recollection is that I also observed the warnings on a single-core Beagleboard. Will re-test. Anxiously awaiting the results. The same problem exists on BeagleBoard XM (OMAP3730, single-core Cortex-A8): http://www.pwsan.com/omap/transcripts/20120922-beaglexm-rcu-stall-debug-pre-fix.txt and the same patch fixes it: http://www.pwsan.com/omap/transcripts/20120922-beaglexm-rcu-stall-debug-post-fix.txt Please feel free to update my Tested-by:, if you wish. Tested-by: Paul Walmsley p...@pwsan.com # OMAP3730, OMAP4430 Very good, thank you very much!!! Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: rcu self-detected stall messages on OMAP3, 4 boards
On Sat, Sep 22, 2012 at 05:45:12PM +0200, Frederic Weisbecker wrote: 2012/9/22 Paul E. McKenney paul...@linux.vnet.ibm.com: On Fri, Sep 21, 2012 at 01:31:49PM -0700, Tony Lindgren wrote: * Paul E. McKenney paul...@linux.vnet.ibm.com [120921 12:58]: Just to make sure I understand the combinations: o All stalls have happened when running a minimal userspace. o CONFIG_NO_HZ=n suppresses the stalls. o CONFIG_RCU_FAST_NO_HZ (which depends on CONFIG_NO_HZ=y) has no observable effect on the stalls. The reason why you may need minimal userspace is to cut down the number of timers waking up the system with NO_HZ. Booting with init=/bin/sh might also do the trick for that. Good point! This does make for a very quiet system, but does not reproduce the problem under kvm, even after waiting for four minutes. I will leave it for more time, but it looks like I really might need to ask Linaro for remote access to a Panda. I have one. I'm currently installing Ubuntu on it and I'll try to manage to build a kernel and reproduce the issue. I'll give more news soon. Thank you! My bet is that you have to have a userspace that is so small that it registers only a few (but at least one!) RCU callback at boot time, then never registers any callbacks ever again. I have coded up a crude test case, using Tony Lindgren's suggestion of init=/bin/sh, but I appear to have inadvertently fixed this bug in current -rcu (git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git, branch rcu/next). But I have been wrong a few times already on this particular bug... Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: rcu self-detected stall messages on OMAP3, 4 boards
On Sat, Sep 22, 2012 at 06:42:08PM +, Paul Walmsley wrote: On Fri, 21 Sep 2012, Paul E. McKenney wrote: Could you please point me to a recipe for creating a minimal userspace? Just in case it is the userspac erather than the architecture/hardware that makes the difference. Tony's suggestion is pretty good. Note that there may also be differences in kernel timers -- either between x86 and ARM architectures, or loaded device drivers -- that may confound the problem. For example, there must be at least one RCU callback outstanding after the boot sequence quiets down. Of course, the last time I tried Tony's approach, I was doing it on top of my -rcu stack, so am retrying on v3.6-rc6. Just to make sure I understand the combinations: o All stalls have happened when running a minimal userspace. o CONFIG_NO_HZ=n suppresses the stalls. o CONFIG_RCU_FAST_NO_HZ (which depends on CONFIG_NO_HZ=y) has no observable effect on the stalls. Did I get that right, or am I missing a combination? That's correct. Indeed, rcu_idle_gp_timer_func() is a bit strange in that it is cancelled upon exit from idle, and therefore should (almost) never actually execute. Its sole purpose is to wake up the CPU. ;-) Right. Just curious, what would wake up the kernel from idle to handle a grace period expiration when CONFIG_RCU_FAST_NO_HZ=n? On a very idle system, the time between timer ticks could potentially be several tens of seconds. If CONFIG_RCU_FAST_NO_HZ=n, then CPUs with RCU callbacks are not permitted to shut off the scheduling-clock tick, so any CPU with RCU callbacks will be awakened every jiffy. The problem is that there appears to be a way to get an RCU grace period started without any CPU having any callbacks, which, as you surmise, would result in all the CPUs going to sleep and the grace period never ending. So if a CPU is awakened for any reason after this everlasting grace period has extended for more than a minute, the first thing that CPU will do is print an RCU CPU stall warning. I believe that I see how to prevent callback-free grace periods from ever starting. (Famous last words...) Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: rcu self-detected stall messages on OMAP3, 4 boards
On Sat, Sep 22, 2012 at 06:16:15PM +, Paul Walmsley wrote: Hi Paul On Fri, 21 Sep 2012, Paul E. McKenney wrote: I am wondering if your system somehow figured out how to start a grace period that had no RCU callbacks waiting for it. If that happened, then a CONFIG_NO_HZ=y system could in theory get into a state where all CPUs are in dyntick-idle mode, so that none of them is doing anything to force the grace period to complete. That should be easy to diagnose, anyway. Please see below, which includes the earlier diagnostic patch. Here you go. - Paul [ 248.902618] INFO: rcu_sched self-detected stall on CPU [ 248.905456] 0: (1 ticks this GP) idle=933/1/0 [ 248.907897] (t=26570 jiffies g=11 c=10 q=0) Bingo!!! (q=0, in case you were wondering. And thank you for testing this!) Strangely enough, I believe that I have inadvertently fixed this in my -rcu tree: git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/next Nevertheless, if you get a chance to try it, I would be interested to hear if my guess is correct. The trick is that a kthread drives the grace period in -rcu, regardless of whether or not there are callbacks. However, the backport would not be something that -stable would be happy with, so I will be putting together a fix for mainline. This thing has been in the kernel since about 2004, not sure why you didn't hit it earlier. Thanx, Paul [ 248.910339] [c001bc90] (unwind_backtrace+0x0/0xf0) from [c00ad800] (rcu_check_callbacks+0x220/0x714) [ 248.915527] [c00ad800] (rcu_check_callbacks+0x220/0x714) from [c00532a0] (update_process_times+0x38/0x68) [ 248.920928] [c00532a0] (update_process_times+0x38/0x68) from [c008c9e8] (tick_sched_timer+0x80/0xec) [ 248.926116] [c008c9e8] (tick_sched_timer+0x80/0xec) from [c0068ed4] (__run_hrtimer+0x7c/0x1e0) [ 248.930999] [c0068ed4] (__run_hrtimer+0x7c/0x1e0) from [c0069cb8] (hrtimer_interrupt+0x11c/0x2d0) [ 248.936035] [c0069cb8] (hrtimer_interrupt+0x11c/0x2d0) from [c001a3cc] (twd_handler+0x30/0x44) [ 248.940948] [c001a3cc] (twd_handler+0x30/0x44) from [c00a7bd0] (handle_percpu_devid_irq+0x90/0x13c) [ 248.946075] [c00a7bd0] (handle_percpu_devid_irq+0x90/0x13c) from [c00a4344] (generic_handle_irq+0x30/0x48) [ 248.951538] [c00a4344] (generic_handle_irq+0x30/0x48) from [c0014e38] (handle_IRQ+0x4c/0xac) [ 248.956329] [c0014e38] (handle_IRQ+0x4c/0xac) from [c00084cc] (gic_handle_irq+0x28/0x5c) [ 248.960937] [c00084cc] (gic_handle_irq+0x28/0x5c) from [c04fb1a4] (__irq_svc+0x44/0x5c) [ 248.965484] Exception stack(0xc0729f58 to 0xc0729fa0) [ 248.968231] 9f40: 0003b832 0001 [ 248.972686] 9f60: c074a8e8 c0728000 c07c42c8 c05065a0 c074bdc8 411fc092 [ 248.977142] 9f80: c074bfe8 0001 c0729fa0 0003b833 c0015130 2113 [ 248.981597] [c04fb1a4] (__irq_svc+0x44/0x5c) from [c0015130] (default_idle+0x20/0x44) [ 248.986083] [c0015130] (default_idle+0x20/0x44) from [c001535c] (cpu_idle+0x9c/0x114) [ 248.990539] [c001535c] (cpu_idle+0x9c/0x114) from [c06d77b0] (start_kernel+0x2b4/0x304) -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: rcu self-detected stall messages on OMAP3, 4 boards
On Sat, Sep 22, 2012 at 01:10:43PM -0700, Paul E. McKenney wrote: On Sat, Sep 22, 2012 at 06:42:08PM +, Paul Walmsley wrote: On Fri, 21 Sep 2012, Paul E. McKenney wrote: Could you please point me to a recipe for creating a minimal userspace? Just in case it is the userspac erather than the architecture/hardware that makes the difference. Tony's suggestion is pretty good. Note that there may also be differences in kernel timers -- either between x86 and ARM architectures, or loaded device drivers -- that may confound the problem. For example, there must be at least one RCU callback outstanding after the boot sequence quiets down. Of course, the last time I tried Tony's approach, I was doing it on top of my -rcu stack, so am retrying on v3.6-rc6. Just to make sure I understand the combinations: o All stalls have happened when running a minimal userspace. o CONFIG_NO_HZ=n suppresses the stalls. o CONFIG_RCU_FAST_NO_HZ (which depends on CONFIG_NO_HZ=y) has no observable effect on the stalls. Did I get that right, or am I missing a combination? That's correct. Indeed, rcu_idle_gp_timer_func() is a bit strange in that it is cancelled upon exit from idle, and therefore should (almost) never actually execute. Its sole purpose is to wake up the CPU. ;-) Right. Just curious, what would wake up the kernel from idle to handle a grace period expiration when CONFIG_RCU_FAST_NO_HZ=n? On a very idle system, the time between timer ticks could potentially be several tens of seconds. If CONFIG_RCU_FAST_NO_HZ=n, then CPUs with RCU callbacks are not permitted to shut off the scheduling-clock tick, so any CPU with RCU callbacks will be awakened every jiffy. The problem is that there appears to be a way to get an RCU grace period started without any CPU having any callbacks, which, as you surmise, would result in all the CPUs going to sleep and the grace period never ending. So if a CPU is awakened for any reason after this everlasting grace period has extended for more than a minute, the first thing that CPU will do is print an RCU CPU stall warning. I believe that I see how to prevent callback-free grace periods from ever starting. (Famous last words...) And here is a patch. I am still having trouble reproducing the problem, but figured that I should avoid serializing things. Thanx, Paul b/kernel/rcutree.c |4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) rcu: Fix day-one dyntick-idle stall-warning bug Each grace period is supposed to have at least one callback waiting for that grace period to complete. However, if CONFIG_NO_HZ=n, an extra callback-free grace period is no big problem -- it will chew up a tiny bit of CPU time, but it will complete normally. In contrast, CONFIG_NO_HZ=y kernels have the potential for all the CPUs to go to sleep indefinitely, in turn indefinitely delaying completion of the callback-free grace period. Given that nothing is waiting on this grace period, this is also not a problem. Unless RCU CPU stall warnings are also enabled, as they are in recent kernels. In this case, if a CPU wakes up after at least one minute of inactivity, an RCU CPU stall warning will result. The reason that no one noticed until quite recently is that most systems have enough OS noise that they will never remain absolutely idle for a full minute. But there are some embedded systems with cut-down userspace configurations that get into this mode quite easily. All this begs the question of exactly how a callback-free grace period gets started in the first place. This can happen due to the fact that CPUs do not necessarily agree on which grace period is in progress. If a CPU still believes that the grace period that just completed is still ongoing, it will believe that it has callbacks that need to wait for another grace period, never mind the fact that the grace period that they were waiting for just completed. This CPU can therefore erroneously decide to start a new grace period. Once this CPU notices that the earlier grace period completed, it will invoke its callbacks. It then won't have any callbacks left. If no other CPU has any callbacks, we now have a callback-free grace period. This commit therefore makes CPUs check more carefully before starting a new grace period. This new check relies on an array of tail pointers into each CPU's list of callbacks. If the CPU is up to date on which grace periods have completed, it checks to see if any callbacks follow the RCU_DONE_TAIL segment, otherwise it checks to see if any callbacks follow the RCU_WAIT_TAIL segment. The reason that this works is that the RCU_WAIT_TAIL segment will be promoted to the RCU_DONE_TAIL segment as soon as the CPU figures out that the old grace period has ended. This change
Re: rcu self-detected stall messages on OMAP3, 4 boards
On Sat, Sep 22, 2012 at 10:25:59PM +, Paul Walmsley wrote: On Sat, 22 Sep 2012, Paul E. McKenney wrote: And here is a patch. I am still having trouble reproducing the problem, but figured that I should avoid serializing things. Thanks, testing this now on v3.6-rc6. Very cool, thank you! One question though about the patch description: All this begs the question of exactly how a callback-free grace period gets started in the first place. This can happen due to the fact that CPUs do not necessarily agree on which grace period is in progress. If a CPU still believes that the grace period that just completed is still ongoing, it will believe that it has callbacks that need to wait for another grace period, never mind the fact that the grace period that they were waiting for just completed. This CPU can therefore erroneously decide to start a new grace period. Doesn't this imply that this bug would only affect multi-CPU systems? Surprisingly not, at least when running TREE_RCU or TREE_PREEMPT_RCU. In order to keep lock contention down to a dull roar on larger systems, TREE_RCU keeps three sets of books: (1) the global state in the rcu_state structure, (2) the combining-tree per-node state in the rcu_node structure, and the per-CPU state in the rcu_data structure. A CPU is not officially aware of the end of a grace period until it is reflected in its rcu_data structure. This has the perhaps-surprising consequence that the CPU that detected the end of the old grace period might start a new one before becoming officially aware that the old one ended. Why not have the CPU inform itself immediately upon noticing that the old grace period ended? Deadlock. The rcu_node locks must be acquired from leaf towards root, and the CPU is holding the root rcu_node lock when it notices that the grace period has ended. I have made this a bit less problematic in the bigrt branch, working towards a goal of getting RCU into a state where automatic formal validation might one day be possible. And yes, I am starting to get some formal-validation people interested in this lofty goal, see for example: http://sites.google.com/site/popl13grace/paper.pdf. The recent tests here have been on Pandaboard, which is dual-CPU, but my recollection is that I also observed the warnings on a single-core Beagleboard. Will re-test. Anxiously awaiting the results. This has been a strange one, even by RCU's standards. Plus I need to add a few Reported-by lines. Next version... Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: rcu self-detected stall messages on OMAP3, 4 boards
On Sat, Sep 22, 2012 at 10:20:19PM +, Paul Walmsley wrote: Hi Paul On Sat, 22 Sep 2012, Paul E. McKenney wrote: Strangely enough, I believe that I have inadvertently fixed this in my -rcu tree: git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/next Nevertheless, if you get a chance to try it, I would be interested to hear if my guess is correct. Yes, good news: the stall warnings go away with that branch. Very good! The trick is that a kthread drives the grace period in -rcu, regardless of whether or not there are callbacks. This is rcu: Move quiescent-state forcing into kthread ? Yep, plus the preceding commits moving grace-period initialization and cleanup into that same kthread. This was motivated by a bug report last February complaining about 200-microsecond latency spikes from RCU grace-period initialization. On systems with 4096 CPUs. Real-time response. It is far bigger than I thought. ;-) Added some debugging into rcu_gp_kthread() after that commit and can confirm that the quiescent-state forcing loop does start a few times when there are zero callbacks pending (modulo any races in my measurement code). Cool, thank you! Assuming it works, that indicates that there is long-term value to the fix for this problem. On larger systems, extra grace periods are not what you want, as their expense increases with the number of CPUs. However, the backport would not be something that -stable would be happy with, so I will be putting together a fix for mainline. This thing has been in the kernel since about 2004, not sure why you didn't hit it earlier. One other data point in that regard - noticed the warnings don't appear when the board is booted with: commit 4fa3b6cb1bc8c14b81b4c8ffdfd3f2500a7e9367 Author: Paul E. McKenney paul.mcken...@linaro.org Date: Tue Jun 5 15:53:53 2012 -0700 rcu: Fix qlen_lazy breakage You lost me on this one. This is already in mainline, so if you were using (say) 3.6-rc6, you would already have this commit applied. Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: rcu self-detected stall messages on OMAP3, 4 boards
On Sun, Sep 23, 2012 at 01:42:10AM +, Paul Walmsley wrote: Hi Paul On Sat, 22 Sep 2012, Paul Walmsley wrote: On Sat, 22 Sep 2012, Paul E. McKenney wrote: And here is a patch. I am still having trouble reproducing the problem, but figured that I should avoid serializing things. Thanks, testing this now on v3.6-rc6. Looks like you solved it! Tested v3.6-rc6 + your stall diagnostic patch: http://marc.info/?l=linux-arm-kernelm=134827237215882w=2 on OMAP4430ES2 Pandaboard using omap2plus_defconfig and CONFIG_RCU_CPU_STALL_INFO=y; got the stall warnings. Then added rcu: Fix day-one dyntick-idle stall-warning bug from: http://marc.info/?l=linux-arm-kernelm=134835120600590w=2 Booted that, and the stall warnings did not appear within 30 minutes. Very cool, thank you for your testing efforts!!! May I apply your Tested-by to this patch? And good show on the debugging patch -- it is quite good to have such solid evidence that the bug that the fix was intended for was actually occurring. Thanx, Paul To confirm that the problem being solved matched your hypothesis, the debugging patch below[1] was added to the RCU idle entry/exit code. Without the bugfix patch, a boot log transcript was obtained indicating that the idle loop was entered with tick_nohz_enabled=1 during a grace period with no callbacks present: http://www.pwsan.com/omap/transcripts/20120922-rcu-stall-debug-pre-fix.txt The debugging events started to appear at 1.867370 seconds into the boot. ENTER was pressed about 464 seconds in; this triggered the rcu_sched stall traceback. With the bugfix patch, a boot log transcript was obtained that indicated that the condition under test never occurred after waiting about 20 minutes: http://www.pwsan.com/omap/transcripts/20120922-rcu-stall-debug-post-fix.txt Thanks for being so willing to root-cause the issue, Paul; it's appreciated, and it's been quite instructive as well. Will address some remaining loose ends in follow-up E-mails. - Paul [1] Debugging patch to printk() if the previous idle loop entry occurred with tick_nohz_enabled=1 during a grace period with no RCU callbacks present: --- kernel/rcutree.c | 17 + 1 file changed, 17 insertions(+) diff --git a/kernel/rcutree.c b/kernel/rcutree.c index f1eb7ad..f42941b 100644 --- a/kernel/rcutree.c +++ b/kernel/rcutree.c @@ -60,6 +60,9 @@ /* Data structures. */ +extern int tick_nohz_enabled; +static int no_cbs_idle_entry_count; + static struct lock_class_key rcu_node_class[RCU_NUM_LVLS]; #define RCU_STATE_INITIALIZER(sname, cr) { \ @@ -400,8 +403,12 @@ void rcu_idle_enter(void) unsigned long flags; long long oldval; struct rcu_dynticks *rdtp; + int cpu; + long totqlen = 0; + struct rcu_data *rdp; local_irq_save(flags); + rdp = __get_cpu_var(rcu_sched_data); rdtp = __get_cpu_var(rcu_dynticks); oldval = rdtp-dynticks_nesting; WARN_ON_ONCE((oldval DYNTICK_TASK_NEST_MASK) == 0); @@ -410,6 +417,12 @@ void rcu_idle_enter(void) else rdtp-dynticks_nesting -= DYNTICK_TASK_NEST_VALUE; rcu_idle_enter_common(rdtp, oldval); + if (tick_nohz_enabled rcu_gp_in_progress(rdp-rsp)) { + for_each_possible_cpu(cpu) + totqlen += per_cpu_ptr(rdp-rsp-rda, cpu)-qlen; + if (totqlen == 0) + no_cbs_idle_entry_count = 1; + } local_irq_restore(flags); } EXPORT_SYMBOL_GPL(rcu_idle_enter); @@ -503,6 +516,10 @@ void rcu_idle_exit(void) rdtp-dynticks_nesting = DYNTICK_TASK_EXIT_IDLE; rcu_idle_exit_common(rdtp, oldval); local_irq_restore(flags); + if (no_cbs_idle_entry_count) { + no_cbs_idle_entry_count = 0; + pr_err(* Tickless idle was entered with zero RCU callbacks\n); + } } EXPORT_SYMBOL_GPL(rcu_idle_exit); -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: rcu self-detected stall messages on OMAP3, 4 boards
On Fri, Sep 21, 2012 at 06:08:59PM +, Paul Walmsley wrote: cc Frederic Weisbecker - context is here: http://marc.info/?l=linux-kernelm=134749030206016w=2 On Thu, 20 Sep 2012, Paul E. McKenney wrote: Fair point. I am wondering whether there is some path into the idle loop that somehow avoids telling RCU that the CPU has in face entered idle. There needs to be an rcu_idle_enter() call on the way to idle, otherwise RCU CPU stall warnings are expected behavior. As far as I know, our only idle entry point is in arch/arm/common/process.c:cpu_idle(). In mainline, this is arch/arm/kernel/process.c, correct? Looking at the x86 idle entry, they call rcu_idle_{enter,exit}() inside {stop,start}_critical_timings(). Making that change here didn't help. The reason x86 does this is that they have idle notifiers deeper in the idle loop that use RCU read-side critical sections. So this was an expected result. Also tried commenting out the code from the stop_critical_timings() call to the WARN_ON(irqs_disabled()), and adding a local_irq_enable(). That also didn't help, which suggests that the problem is not caused by the OMAP-specific PM idle code. I must admit that you make a convincing case here. Though it does leave me wondering what is different about Panda (and MX28, IIRC). I may take your advice of remote access to a Panda board, though that is likely to take a bit of time due to timezones. Regardless of the underlying issue here, I clearly need to make the stall-warning messages do a better job of printing out needed information. Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: rcu self-detected stall messages on OMAP3, 4 boards
On Fri, Sep 21, 2012 at 07:11:14PM +, Paul Walmsley wrote: On Fri, 21 Sep 2012, Paul E. McKenney wrote: On Fri, Sep 21, 2012 at 06:08:59PM +, Paul Walmsley wrote: As far as I know, our only idle entry point is in arch/arm/common/process.c:cpu_idle(). In mainline, this is arch/arm/kernel/process.c, correct? Indeed; sorry about that, mistyped. No problem! Looking at the x86 idle entry, they call rcu_idle_{enter,exit}() inside {stop,start}_critical_timings(). Making that change here didn't help. The reason x86 does this is that they have idle notifiers deeper in the idle loop that use RCU read-side critical sections. So this was an expected result. OK Also tried commenting out the code from the stop_critical_timings() call to the WARN_ON(irqs_disabled()), and adding a local_irq_enable(). That also didn't help, which suggests that the problem is not caused by the OMAP-specific PM idle code. I must admit that you make a convincing case here. Though it does leave me wondering what is different about Panda (and MX28, IIRC). Given the dependency on CONFIG_NO_HZ, the stalls are probably dependent on the userspace in use. The userspaces here are quite minimal and so allow the system to stay idle for relatively long periods of time. Could you please point me to a recipe for creating a minimal userspace? Just in case it is the userspac erather than the architecture/hardware that makes the difference. I may take your advice of remote access to a Panda board, though that is likely to take a bit of time due to timezones. Regardless of the underlying issue here, I clearly need to make the stall-warning messages do a better job of printing out needed information. If you've got a patch in mind for that, I'll boot it here. Hammering it out, will send it along when it is a bit less destructive. ;-) One other observation. omap2plus_defconfig sets CONFIG_NO_HZ=y but doesn't set CONFIG_RCU_FAST_NO_HZ. The stall warning messages still appear when CONFIG_RCU_FAST_NO_HZ=y. One of them is attached below (with CONFIG_RCU_CPU_STALL_INFO set as well, obviously). Just to make sure I understand the combinations: o All stalls have happened when running a minimal userspace. o CONFIG_NO_HZ=n suppresses the stalls. o CONFIG_RCU_FAST_NO_HZ (which depends on CONFIG_NO_HZ=y) has no observable effect on the stalls. Did I get that right, or am I missing a combination? As an aside, in the CONFIG_RCU_FAST_NO_HZ=y build, I dropped a printk() into rcu_idle_gp_timer_func() and it doesn't look like it ever executed. Indeed, rcu_idle_gp_timer_func() is a bit strange in that it is cancelled upon exit from idle, and therefore should (almost) never actually execute. Its sole purpose is to wake up the CPU. ;-) Thanx, Paul - Paul [ 305.832000] INFO: rcu_sched self-detected stall on CPU [ 305.834838] 1: (2 GPs behind) idle=5b1/1/0 drain=0 . timer=4294967295 [ 305.838378] (t=17463 jiffies) [ 305.840118] [c001be10] (unwind_backtrace+0x0/0xf0) from [c00ad65c] (rcu_pending+0xd0/0x540) [ 305.844848] [c00ad65c] (rcu_pending+0xd0/0x540) from [c00ae5cc] (rcu_check_callbacks+0x110/0x198) [ 305.849884] [c00ae5cc] (rcu_check_callbacks+0x110/0x198) from [c0053800] (update_process_times+0x38/0x68) [ 305.855285] [c0053800] (update_process_times+0x38/0x68) from [c008cf40] (tick_sched_timer+0x80/0xec) [ 305.860473] [c008cf40] (tick_sched_timer+0x80/0xec) from [c006942c] (__run_hrtimer+0x7c/0x1e0) [ 305.865356] [c006942c] (__run_hrtimer+0x7c/0x1e0) from [c006a210] (hrtimer_interrupt+0x11c/0x2d0) [ 305.870361] [c006a210] (hrtimer_interrupt+0x11c/0x2d0) from [c001a54c] (twd_handler+0x30/0x44) [ 305.875274] [c001a54c] (twd_handler+0x30/0x44) from [c00a8128] (handle_percpu_devid_irq+0x90/0x13c) [ 305.880371] [c00a8128] (handle_percpu_devid_irq+0x90/0x13c) from [c00a489c] (generic_handle_irq+0x30/0x48) [ 305.885833] [c00a489c] (generic_handle_irq+0x30/0x48) from [c0014fb8] (handle_IRQ+0x4c/0xac) [ 305.890624] [c0014fb8] (handle_IRQ+0x4c/0xac) from [c000864c] (gic_handle_irq+0x28/0x5c) [ 305.895233] [c000864c] (gic_handle_irq+0x28/0x5c) from [c04fbc64] (__irq_svc+0x44/0x5c) [ 305.899780] Exception stack(0xde86ff88 to 0xde86ffd0) [ 305.902526] ff80: 0004c062 0001 de8660c0 de86e000 c07c42c8 [ 305.906982] ffa0: c05075a0 c074bdd0 411fc092 c074bff0 0001 de86ffd0 [ 305.911437] ffc0: 0004c063 c00152b0 2113 [ 305.914184] [c04fbc64] (__irq_svc+0x44/0x5c) from [c00152b0] (default_idle+0x20/0x44) [ 305.918640] [c00152b0] (default_idle+0x20/0x44) from [c00154dc] (cpu_idle+0x9c/0x114) [ 305.923126] [c00154dc] (cpu_idle+0x9c/0x114) from [804f4a34] (0x804f4a34) -- To unsubscribe from this list: send the line unsubscribe linux-omap
Re: rcu self-detected stall messages on OMAP3, 4 boards
On Fri, Sep 21, 2012 at 05:47:31PM +, Paul Walmsley wrote: Hi Paul On Thu, 20 Sep 2012, Paul Walmsley wrote: On Thu, 20 Sep 2012, Paul E. McKenney wrote: Paul Walmsley, please let me know if the config below doesn't clear things up for you or if there is some reason why this config is infeasible. Will certainly test it here once things settle down. I built an OMAP kernel from Linus' commit 4651afbbae968772efd6dc4ba461cba9b49bb9d8 (Merge branch 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq). The config used was 'omap2plus_defconfig', and enabled CONFIG_CPU_IDLE by hand. Booted it on a Pandaboard (OMAP4430ES2) into a very minimal Debian rootfs. Did you have the patch at https://lkml.org/lkml/2012/8/30/290 applied? If not, could you please try it? (This patch cleared up a similar problem for Becky, also on OMAP.) Thanx, Paul Long story short, the rcu_sched stall messages continue to appear. Transcript below. The above config should be easy to reproduce from mainline. Paul, not sure if you're still working with Linaro, but you should be able to get access to a Pandaboard through them to try it yourself, should you wish. - Paul ## Booting kernel from Legacy Image at 8200 ... Image Name: Linux-3.6.0-rc6-00025-g4651afb Image Type: ARM Linux Kernel Image (uncompressed) Data Size:4061576 Bytes = 3.9 MiB Load Address: 80008000 Entry Point: 80008000 Verifying Checksum ... OK Loading Kernel Image ... OK OK Starting kernel ... Uncompressing Linux... done, booting the kernel. [0.00] Booting Linux on physical CPU 0 [0.00] Linux version 3.6.0-rc6-00025-g4651afb (paul@dusk) (gcc version 4.5.1 (Sourcery G++ Lite 2010.09-50) ) #342 SMP Fri Sep 21 11:28:00 MDT 2012 [0.00] CPU: ARMv7 Processor [411fc092] revision 2 (ARMv7), cr=10c53c7d [0.00] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache [0.00] Machine: OMAP4 Panda board [0.00] debug: ignoring loglevel setting. [0.00] Reserving 16777216 bytes SDRAM for VRAM [0.00] Memory policy: ECC disabled, Data cache writealloc [0.00] On node 0 totalpages: 126720 [0.00] free_area_init_node: node 0, pgdat c07c6140, node_mem_map c0d23000 [0.00] Normal zone: 1024 pages used for memmap [0.00] Normal zone: 0 pages reserved [0.00] Normal zone: 125696 pages, LIFO batch:31 [0.00] OMAP4430 ES2.0 [0.00] PERCPU: Embedded 9 pages/cpu @c112b000 s14016 r8192 d14656 u36864 [0.00] pcpu-alloc: s14016 r8192 d14656 u36864 alloc=9*4096 [0.00] pcpu-alloc: [0] 0 [0] 1 [0.00] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 125696 [0.00] Kernel command line: console=ttyO2,230400n8 vram=16M root=/dev/mmcblk0p2 rw rootfstype=ext3 rootwait earlyprintk debug ignore_loglevel [0.00] PID hash table entries: 2048 (order: 1, 8192 bytes) [0.00] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes) [0.00] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes) [0.00] Memory: 495MB = 495MB total [0.00] Memory: 488828k/488828k available, 35460k reserved, 0K highmem [0.00] Virtual kernel memory layout: [0.00] vector : 0x - 0x1000 ( 4 kB) [0.00] fixmap : 0xfff0 - 0xfffe ( 896 kB) [0.00] vmalloc : 0xe080
Re: rcu self-detected stall messages on OMAP3, 4 boards
On Fri, Sep 21, 2012 at 01:31:49PM -0700, Tony Lindgren wrote: * Paul E. McKenney paul...@linux.vnet.ibm.com [120921 12:58]: Just to make sure I understand the combinations: o All stalls have happened when running a minimal userspace. o CONFIG_NO_HZ=n suppresses the stalls. o CONFIG_RCU_FAST_NO_HZ (which depends on CONFIG_NO_HZ=y) has no observable effect on the stalls. The reason why you may need minimal userspace is to cut down the number of timers waking up the system with NO_HZ. Booting with init=/bin/sh might also do the trick for that. Good point! This does make for a very quiet system, but does not reproduce the problem under kvm, even after waiting for four minutes. I will leave it for more time, but it looks like I really might need to ask Linaro for remote access to a Panda. Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Documentation: RCU: update the stall warning message timer=-1 to match reality
On Fri, Sep 21, 2012 at 04:13:29PM +, Paul Walmsley wrote: The CONFIG_RCU_FAST_NO_HZ stall warning messages can never emit timer=-1. This is because the printf() format specifier to generate that number is '%lu'. So, update the documentation to use the unsigned long equivalent instead, timer=4294967295. This is what actually shows up in traces. Signed-off-by: Paul Walmsley p...@pwsan.com Cc: Paul E. McKenney paul...@linux.vnet.ibm.com Cc: Dipankar Sarma dipan...@in.ibm.com Good catch! Even worse, it gives timer=18446744073709551615 on 64-bit systems, which is no easier on the eyes. I therefore changed the code to print a nicer message in this case, patch below. The meaning of the timer=4294967295 was that the corresponding CPU was either non-idle or idle with no RCU callbacks, FWIW. Thanx, Paul - rcu: Fix CONFIG_RCU_FAST_NO_HZ stall warning message The print_cpu_stall_fast_no_hz() function attempts to print -1 when the -idle_gp_timer is not pending, but unsigned arithmetic causes it to instead print ULONG_MAX, which is 4294967295 on 32-bit systems and 18446744073709551615 on 64-bit systems. Neither of these are the most reader-friendly values, so this commit instead causes timer not pending to be printed when -idle_gp_timer is not pending. Reported-by: Paul Walmsley p...@pwsan.com Signed-off-by: Paul E. McKenney paul.mcken...@linaro.org Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt index 523364e..1927151 100644 --- a/Documentation/RCU/stallwarn.txt +++ b/Documentation/RCU/stallwarn.txt @@ -99,7 +99,7 @@ In kernels with CONFIG_RCU_FAST_NO_HZ, even more information is printed: INFO: rcu_preempt detected stall on CPU - 0: (64628 ticks this GP) idle=dd5/3fff/0 drain=0 . timer=-1 + 0: (64628 ticks this GP) idle=dd5/3fff/0 drain=0 . timer not pending (t=65000 jiffies) The (64628 ticks this GP) indicates that this CPU has taken more @@ -116,13 +116,13 @@ number between the two /s is the value of the nesting, which will be a small positive number if in the idle loop and a very large positive number (as shown above) otherwise. -For CONFIG_RCU_FAST_NO_HZ kernels, the drain=0 indicates that the -CPU is not in the process of trying to force itself into dyntick-idle -state, the . indicates that the CPU has not given up forcing RCU -into dyntick-idle mode (it would be H otherwise), and the timer=-1 -indicates that the CPU has not recented forced RCU into dyntick-idle -mode (it would otherwise indicate the number of microseconds remaining -in this forced state). +For CONFIG_RCU_FAST_NO_HZ kernels, the drain=0 indicates that the CPU is +not in the process of trying to force itself into dyntick-idle state, the +. indicates that the CPU has not given up forcing RCU into dyntick-idle +mode (it would be H otherwise), and the timer not pending indicates +that the CPU has not recently forced RCU into dyntick-idle mode (it +would otherwise indicate the number of microseconds remaining in this +forced state). Multiple Warnings From One Stall diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h index 3b1a11e..be822f0 100644 --- a/kernel/rcutree_plugin.h +++ b/kernel/rcutree_plugin.h @@ -2245,11 +2245,15 @@ static void print_cpu_stall_fast_no_hz(char *cp, int cpu) { struct rcu_dynticks *rdtp = per_cpu(rcu_dynticks, cpu); struct timer_list *tltp = rdtp-idle_gp_timer; + char c; - sprintf(cp, drain=%d %c timer=%lu, - rdtp-dyntick_drain, - rdtp-dyntick_holdoff == jiffies ? 'H' : '.', - timer_pending(tltp) ? tltp-expires - jiffies : -1); + c = rdtp-dyntick_holdoff == jiffies ? 'H' : '.'; + if (timer_pending(tltp)) + sprintf(cp, drain=%d %c timer=%lu, + rdtp-dyntick_drain, c, tltp-expires - jiffies); + else + sprintf(cp, drain=%d %c timer not pending, + rdtp-dyntick_drain, c); } #else /* #ifdef CONFIG_RCU_FAST_NO_HZ */ -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: rcu self-detected stall messages on OMAP3, 4 boards
On Fri, Sep 21, 2012 at 12:57:17PM -0700, Paul E. McKenney wrote: On Fri, Sep 21, 2012 at 07:11:14PM +, Paul Walmsley wrote: On Fri, 21 Sep 2012, Paul E. McKenney wrote: [ . . . ] I may take your advice of remote access to a Panda board, though that is likely to take a bit of time due to timezones. Regardless of the underlying issue here, I clearly need to make the stall-warning messages do a better job of printing out needed information. If you've got a patch in mind for that, I'll boot it here. Hammering it out, will send it along when it is a bit less destructive. ;-) And I eventually decided to keep things simple, please see below. This applies on top of https://lkml.org/lkml/2012/8/30/290. Thanx, Paul rcu: Add grace-period information to RCU CPU stall warnings This commit causes the last grace period started and completed to be printed on RCU CPU stall warning messages in order to aid diagnosis. Signed-off-by: Paul E. McKenney paul.mcken...@linaro.org Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com diff --git a/kernel/rcutree.c b/kernel/rcutree.c index cdaa7aa..fdb6854 100644 --- a/kernel/rcutree.c +++ b/kernel/rcutree.c @@ -925,8 +925,9 @@ static void print_other_cpu_stall(struct rcu_state *rsp) raw_spin_unlock_irqrestore(rnp-lock, flags); print_cpu_stall_info_end(); - printk(KERN_CONT (detected by %d, t=%ld jiffies)\n, - smp_processor_id(), (long)(jiffies - rsp-gp_start)); + pr_cont((detected by %d, t=%ld jiffies, g=%lu, c=%lu)\n, + smp_processor_id(), (long)(jiffies - rsp-gp_start), + rsp-gpnum, rsp-completed); if (ndetected == 0) printk(KERN_ERR INFO: Stall ended before state dump start\n); else if (!trigger_all_cpu_backtrace()) @@ -953,7 +954,8 @@ static void print_cpu_stall(struct rcu_state *rsp) print_cpu_stall_info_begin(); print_cpu_stall_info(rsp, smp_processor_id()); print_cpu_stall_info_end(); - printk(KERN_CONT (t=%lu jiffies)\n, jiffies - rsp-gp_start); + pr_cont( (t=%lu jiffies g=%lu c=%lu)\n, + jiffies - rsp-gp_start, rsp-gpnum, rsp-completed); if (!trigger_all_cpu_backtrace()) dump_stack(); -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: rcu self-detected stall messages on OMAP3, 4 boards
On Fri, Sep 21, 2012 at 10:41:14PM +, Paul Walmsley wrote: On Fri, 21 Sep 2012, Paul E. McKenney wrote: On Fri, Sep 21, 2012 at 05:47:31PM +, Paul Walmsley wrote: I built an OMAP kernel from Linus' commit 4651afbbae968772efd6dc4ba461cba9b49bb9d8 (Merge branch 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq). The config used was 'omap2plus_defconfig', and enabled CONFIG_CPU_IDLE by hand. Booted it on a Pandaboard (OMAP4430ES2) into a very minimal Debian rootfs. Did you have the patch at https://lkml.org/lkml/2012/8/30/290 applied? No, it's just as described above. If not, could you please try it? (This patch cleared up a similar problem for Becky, also on OMAP.) Did not seem to help, either with or without CONFIG_CPU_IDLE. I was hoping! ;-) And my init=/bin/sh kernel ran idle for more than an hour without any RCU CPU stall warnings... I am wondering if your system somehow figured out how to start a grace period that had no RCU callbacks waiting for it. If that happened, then a CONFIG_NO_HZ=y system could in theory get into a state where all CPUs are in dyntick-idle mode, so that none of them is doing anything to force the grace period to complete. That should be easy to diagnose, anyway. Please see below, which includes the earlier diagnostic patch. Thanx, Paul diff --git a/kernel/rcutree.c b/kernel/rcutree.c index 307caf1..696f189 100644 --- a/kernel/rcutree.c +++ b/kernel/rcutree.c @@ -879,6 +879,7 @@ static void print_other_cpu_stall(struct rcu_state *rsp) unsigned long flags; int ndetected = 0; struct rcu_node *rnp = rcu_get_root(rsp); + long totqlen = 0; /* Only let one CPU complain about others per time interval. */ @@ -923,8 +924,11 @@ static void print_other_cpu_stall(struct rcu_state *rsp) raw_spin_unlock_irqrestore(rnp-lock, flags); print_cpu_stall_info_end(); - printk(KERN_CONT (detected by %d, t=%ld jiffies)\n, - smp_processor_id(), (long)(jiffies - rsp-gp_start)); + for_each_possible_cpu(cpu) + totqlen += per_cpu_ptr(rsp-rda, cpu)-qlen; + pr_cont((detected by %d, t=%ld jiffies, g=%lu, c=%lu, q=%lu)\n, + smp_processor_id(), (long)(jiffies - rsp-gp_start), + rsp-gpnum, rsp-completed, totqlen); if (ndetected == 0) printk(KERN_ERR INFO: Stall ended before state dump start\n); else if (!trigger_all_cpu_backtrace()) @@ -939,8 +943,10 @@ static void print_other_cpu_stall(struct rcu_state *rsp) static void print_cpu_stall(struct rcu_state *rsp) { + int cpu; unsigned long flags; struct rcu_node *rnp = rcu_get_root(rsp); + long totqlen = 0; /* * OK, time to rat on ourselves... @@ -951,7 +957,10 @@ static void print_cpu_stall(struct rcu_state *rsp) print_cpu_stall_info_begin(); print_cpu_stall_info(rsp, smp_processor_id()); print_cpu_stall_info_end(); - printk(KERN_CONT (t=%lu jiffies)\n, jiffies - rsp-gp_start); + for_each_possible_cpu(cpu) + totqlen += per_cpu_ptr(rsp-rda, cpu)-qlen; + pr_cont( (t=%lu jiffies g=%lu c=%lu q=%lu)\n, + jiffies - rsp-gp_start, rsp-gpnum, rsp-completed, totqlen); if (!trigger_all_cpu_backtrace()) dump_stack(); -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: rcu self-detected stall messages on OMAP3, 4 boards
On Thu, Sep 20, 2012 at 09:49:13PM +, Bruce, Becky wrote: On Sep 20, 2012, at 2:56 AM, Paul Walmsley wrote: Hi, On Wed, 19 Sep 2012, Paul E. McKenney wrote: On Thu, Sep 13, 2012 at 06:52:10PM +, Paul Walmsley wrote: On Wed, 12 Sep 2012, Paul E. McKenney wrote: Subodh Nijsure (also CCed) reported something that might be similar on ARM, and also reported that setting the following got rid of the stalls: CONFIG_CPU_IDLE=y CONFIG_CPU_IDLE_GOV_LADDER=y CONFIG_CPU_IDLE_GOV_MENU=y At which point he was happy, which was good, but which also left the underlying problem unsolved. Do these affect your system? If so, do they cause a different ARM idle loop to be executed? Will give this a try. What board was Subodh using? Any news on trying the above settings? Sorry, haven't had the chance to try it yet due to the impending merge window opening. Once things settle down I'll give it a try -- or maybe someone else can test it in the meantime. OK, people, you can stop heckling me about sent from my iPhone - I'm in the wilds of rural Louisiana with really bad internet service and was trying to work on my phone (but, alas, did not notice the CC list included the entire universe). Shame on me. With the above set, I don't seem to see any stalls with the RCU timeout set to 60s (the default). I left the board running for 25 minutes; I will fire it up again later and let it run for a bit longer, but usually I end up seeing the problem pretty quickly so I don't expect that to result in anything. I also didn't see any stalls on Paul's RCU tree as of a week ago at 60s, so as far as I can tell the CPU_IDLE stuff didn't have any impact (it wasn't on when I tested Paul's tree). If I drop the timeout to 5s as Paul M. suggested for debug a while back, I do see stalls (both with CPU_IDLE stuff and without). I'm using the default omap2plus config, with RCU stall info enabled and the cpu idle stuff turned on (console dump below). This is a Panda ES 1.1 (OMAP4460) Thank you for the testing, Becky! Paul Walmsley, please let me know if the config below doesn't clear things up for you or if there is some reason why this config is infeasible. Thanx, Paul root@omap4430-panda:~# zcat /proc/config/gz. .. .. ..gz | grep RCU # RCU Subsystem CONFIG_TREE_RCU=y # CONFIG_PREEMPT_RCU is not set CONFIG_RCU_FANOUT=32 CONFIG_RCU_FANOUT_LEAF=16 # CONFIG_RCU_FANOUT_EXACT is not set # CONFIG_RCU_FAST_NO_HZ is not set # CONFIG_TREE_RCU_TRACE is not set # CONFIG_PROVE_RCU is not set # CONFIG_SPARSE_RCU_POINTER is not set # CONFIG_RCU_TORTURE_TEST is not set CONFIG_RCU_CPU_STALL_TIMEOUT=5 CONFIG_RCU_CPU_STALL_INFO=y # CONFIG_RCU_TRACE is not set root@omap4430-panda:~# zcat /proc/config.gz | grep IDLE CONFIG_GENERIC_SMP_IDLE_THREAD=y CONFIG_CPU_IDLE=y CONFIG_CPU_IDLE_GOV_LADDER=y CONFIG_CPU_IDLE_GOV_MENU=y CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED=y Paul, let me know if you want me to try anything else. My internet connection is spotty today but (obviously :) I will see emails on my phone and will test when I can. Cheers, B Console output: root@omap4430-panda:~# [ 377.495361] INFO: rcu_sched self-detected stall on CPU [ 377.500762] .1: (1 ticks this GP) idle=dcd/1/0 [ 377.505523] . (t=761 jiffies) [ 377.508666] [c0019da0] (unwind_backtrace+0x0/0xf8) from [c009b138] (rcu_check_callbacks+0x204/0x790) [ 377.518615] [c009b138] (rcu_check_callbacks+0x204/0x790) from [c0045890] (update_process_times+0x38/0x68) [ 377.529022] [c0045890] (update_process_times+0x38/0x68) from [c007d47c] (tick_sched_timer+0x80/0xec) [ 377.538970] [c007d47c] (tick_sched_timer+0x80/0xec) from [c005a2fc] (__run_hrtimer+0x7c/0x218) [ 377.548339] [c005a2fc] (__run_hrtimer+0x7c/0x218) from [c005b040] (hrtimer_interrupt+0x130/0x2d8) [ 377.558013] [c005b040] (hrtimer_interrupt+0x130/0x2d8) from [c0018998] (twd_handler+0x30/0x44) [ 377.567413] [c0018998] (twd_handler+0x30/0x44) from [c00960cc] (handle_percpu_devid_irq+0x90/0x158) [ 377.577270] [c00960cc] (handle_percpu_devid_irq+0x90/0x158) from [c00929ac] (generic_handle_irq+0x30/0x44) [ 377.587768] [c00929ac] (generic_handle_irq+0x30/0x44) from [c0013bd8] (handle_IRQ+0x4c/0xac) [ 377.596984] [c0013bd8] (handle_IRQ+0x4c/0xac) from [c0008470] (gic_handle_irq+0x24/0x58) [ 377.605834] [c0008470] (gic_handle_irq+0x24/0x58) from [c0487604] (__irq_svc+0x44/0x58) [ 377.614593] Exception stack(0xee06ff08 to 0xee06ff50) [ 377.619873] ff00: 0001 0001 3b9aca00 c608bc44 0057 [ 377.628448] ff20: c146a4f0 0002 c54e3b8f 0056 c048fb3c c0c47654 0001 ee06ff50 [ 377.637023] ff40: c0084774 c0390fac 2113 [ 377.642333] [c0487604] (__irq_svc+0x44/0x58) from [c0390fac] (cpuidle_wrap_enter+0x4c/0xa4) [ 377.651458] [c0390fac
Re: rcu self-detected stall messages on OMAP3, 4 boards
On Thu, Sep 20, 2012 at 10:47:25PM +, Paul Walmsley wrote: On Thu, 20 Sep 2012, Paul E. McKenney wrote: On Thu, Sep 20, 2012 at 09:49:13PM +, Bruce, Becky wrote: OK, people, you can stop heckling me about sent from my iPhone - I'm in the wilds of rural Louisiana with really bad internet service and was trying to work on my phone (but, alas, did not notice the CC list included the entire universe). Shame on me. With the above set, I don't seem to see any stalls with the RCU timeout set to 60s (the default). I left the board running for 25 minutes; I will fire it up again later and let it run for a bit longer, but usually I end up seeing the problem pretty quickly so I don't expect that to result in anything. I also didn't see any stalls on Paul's RCU tree as of a week ago at 60s, so as far as I can tell the CPU_IDLE stuff didn't have any impact (it wasn't on when I tested Paul's tree). If I drop the timeout to 5s as Paul M. suggested for debug a while back, I do see stalls (both with CPU_IDLE stuff and without). I'm using the default omap2plus config, with RCU stall info enabled and the cpu idle stuff turned on (console dump below). This is a Panda ES 1.1 (OMAP4460) Thank you for the testing, Becky! Paul Walmsley, please let me know if the config below doesn't clear things up for you or if there is some reason why this config is infeasible. Will certainly test it here once things settle down. But I wouldn't consider the use of CONFIG_CPU_IDLE to be a 'fix' until the problem can be understood. We use !CONFIG_CPU_IDLE kernels for testing here. Fair point. I am wondering whether there is some path into the idle loop that somehow avoids telling RCU that the CPU has in face entered idle. There needs to be an rcu_idle_enter() call on the way to idle, otherwise RCU CPU stall warnings are expected behavior. Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: rcu self-detected stall messages on OMAP3, 4 boards
On Thu, Sep 13, 2012 at 06:52:10PM +, Paul Walmsley wrote: Hi Paul, thanks for the reply, On Wed, 12 Sep 2012, Paul E. McKenney wrote: Interesting. I am assuming that the interrupt in the stack below came from idle, if not, please let me know what. According to the exception stack section in the original traceback, it appears that the serial interrupt took the SoC out of idle. Could you please reproduce with CONFIG_RCU_CPU_STALL_INFO=y? That would give me a bit more information about why RCU thought that there was a stall. (CCing Becky Bruce, who saw something similar recently.) At the bottom of this mail is a series of tracebacks with CONFIG_RCU_CPU_STALL_INFO=y. Unlike the traceback that was sent in the last message, these were not triggered by serial activity. These appeared every 300 seconds. Subodh Nijsure (also CCed) reported something that might be similar on ARM, and also reported that setting the following got rid of the stalls: CONFIG_CPU_IDLE=y CONFIG_CPU_IDLE_GOV_LADDER=y CONFIG_CPU_IDLE_GOV_MENU=y At which point he was happy, which was good, but which also left the underlying problem unsolved. Do these affect your system? If so, do they cause a different ARM idle loop to be executed? Will give this a try. What board was Subodh using? Hello, Paul, Any news on trying the above settings? Thanx, Paul - Paul Debian GNU/Linux wheezy/sid armel ttyO2 armel login: [ 305.942108] INFO: rcu_sched self-detected stall on CPU [ 305.944946] 1: (7 GPs behind) idle=57b/1/0 [ 305.947265] (t=22811 jiffies) [ 305.949066] [c001b7cc] (unwind_backtrace+0x0/0xf0) from [c00acc28] (rcu_check_callbacks+0x1b0/0x678) [ 305.954223] [c00acc28] (rcu_check_callbacks+0x1b0/0x678) from [c00529e0] (update_process_times+0x38/0x68) [ 305.959625] [c00529e0] (update_process_times+0x38/0x68) from [c008bf14] (tick_sched_timer+0x80/0xec) [ 305.964813] [c008bf14] (tick_sched_timer+0x80/0xec) from [c006840c] (__run_hrtimer+0x7c/0x1e0) [ 305.969696] [c006840c] (__run_hrtimer+0x7c/0x1e0) from [c00691f0] (hrtimer_interrupt+0x11c/0x2d0) [ 305.974731] [c00691f0] (hrtimer_interrupt+0x11c/0x2d0) from [c001a04c] (twd_handler+0x30/0x44) [ 305.979644] [c001a04c] (twd_handler+0x30/0x44) from [c00a7068] (handle_percpu_devid_irq+0x90/0x13c) [ 305.984741] [c00a7068] (handle_percpu_devid_irq+0x90/0x13c) from [c00a37dc] (generic_handle_irq+0x30/0x48) [ 305.990234] [c00a37dc] (generic_handle_irq+0x30/0x48) from [c0014c58] (handle_IRQ+0x4c/0xac) [ 305.995025] [c0014c58] (handle_IRQ+0x4c/0xac) from [c0008478] (gic_handle_irq+0x28/0x5c) [ 305.999633] [c0008478] (gic_handle_irq+0x28/0x5c) from [c04f8ca4] (__irq_svc+0x44/0x5c) [ 306.004180] Exception stack(0xde86ff88 to 0xde86ffd0) [ 306.006927] ff80: 0003c6d0 0001 de8660c0 de86e000 c07c23c8 [ 306.011383] ffa0: c0504590 c0749e20 411fc092 c074a040 0001 de86ffd0 [ 306.015838] ffc0: 0003c6d1 c0014f50 2113 [ 306.018585] [c04f8ca4] (__irq_svc+0x44/0x5c) from [c0014f50] (default_idle+0x20/0x44) [ 306.023040] [c0014f50] (default_idle+0x20/0x44) from [c001517c] (cpu_idle+0x9c/0x114) [ 306.027526] [c001517c] (cpu_idle+0x9c/0x114) from [804f1af4] (0x804f1af4) [ 602.004486] INFO: rcu_sched detected stalls on CPUs/tasks: [ 602.007476] (detected by 0, t=60707 jiffies) [ 602.009857] INFO: Stall ended before state dump start [ 906.027893] INFO: rcu_sched self-detected stall on CPU [ 906.030700] 1: (6 GPs behind) idle=647/1/0 [ 906.033020] (t=38379 jiffies) [ 906.034790] [c001b7cc] (unwind_backtrace+0x0/0xf0) from [c00acc28] (rcu_check_callbacks+0x1b0/0x678) [ 906.039947] [c00acc28] (rcu_check_callbacks+0x1b0/0x678) from [c00529e0] (update_process_times+0x38/0x68) [ 906.045349] [c00529e0] (update_process_times+0x38/0x68) from [c008bf14] (tick_sched_timer+0x80/0xec) [ 906.050537] [c008bf14] (tick_sched_timer+0x80/0xec) from [c006840c] (__run_hrtimer+0x7c/0x1e0) [ 906.055419] [c006840c] (__run_hrtimer+0x7c/0x1e0) from [c00691f0] (hrtimer_interrupt+0x11c/0x2d0) [ 906.060424] [c00691f0] (hrtimer_interrupt+0x11c/0x2d0) from [c001a04c] (twd_handler+0x30/0x44) [ 906.065307] [c001a04c] (twd_handler+0x30/0x44) from [c00a7068] (handle_percpu_devid_irq+0x90/0x13c) [ 906.070434] [c00a7068] (handle_percpu_devid_irq+0x90/0x13c) from [c00a37dc] (generic_handle_irq+0x30/0x48) [ 906.075897] [c00a37dc] (generic_handle_irq+0x30/0x48) from [c0014c58] (handle_IRQ+0x4c/0xac) [ 906.080688] [c0014c58] (handle_IRQ+0x4c/0xac) from [c0008478] (gic_handle_irq+0x28/0x5c) [ 906.085296] [c0008478] (gic_handle_irq+0x28/0x5c) from [c04f8ca4] (__irq_svc+0x44/0x5c) [ 906.089843] Exception stack(0xde86ff88 to 0xde86ffd0) [ 906.092590] ff80: 0003cb06 0001 de8660c0 de86e000
Re: rcu self-detected stall messages on OMAP3, 4 boards
On Wed, Sep 12, 2012 at 10:51:30PM +, Paul Walmsley wrote: Hi Paul Recently several of us have been seeing INFO: rcu_sched self-detected stall on CPU { 0} (t=20611 jiffies) stack tracebacks on various OMAP3 and 4 board. I only noticed it during v3.6-rc3, but I suspect it's been happening for users at least since May: http://www.mail-archive.com/linux-omap@vger.kernel.org/msg68942.html The only quasi-reproducible test case that I've found so far is to boot the board with serial console enabled to a login prompt, then wait for a few minutes, then send a keypress to the board via serial. The tracebacks I get look like this: Interesting. I am assuming that the interrupt in the stack below came from idle, if not, please let me know what. Could you please reproduce with CONFIG_RCU_CPU_STALL_INFO=y? That would give me a bit more information about why RCU thought that there was a stall. (CCing Becky Bruce, who saw something similar recently.) Subodh Nijsure (also CCed) reported something that might be similar on ARM, and also reported that setting the following got rid of the stalls: CONFIG_CPU_IDLE=y CONFIG_CPU_IDLE_GOV_LADDER=y CONFIG_CPU_IDLE_GOV_MENU=y At which point he was happy, which was good, but which also left the underlying problem unsolved. Do these affect your system? If so, do they cause a different ARM idle loop to be executed? Thanx, Paul [ 467.480712] INFO: rcu_sched self-detected stall on CPU { 0} (t=20611 jiffies) [ 467.484741] [c001b7cc] (unwind_backtrace+0x0/0xf0) from [c00acc94] (rcu_check_callbacks+0x180/0x630) [ 467.489929] [c00acc94] (rcu_check_callbacks+0x180/0x630) from [c0052b18] (update_process_times+0x38/0x68) [ 467.495361] [c0052b18] (update_process_times+0x38/0x68) from [c008c04c] (tick_sched_timer+0x80/0xec) [ 467.500518] [c008c04c] (tick_sched_timer+0x80/0xec) from [c0068544] (__run_hrtimer+0x7c/0x1e0) [ 467.505401] [c0068544] (__run_hrtimer+0x7c/0x1e0) from [c0069328] (hrtimer_interrupt+0x11c/0x2d0) [ 467.510437] [c0069328] (hrtimer_interrupt+0x11c/0x2d0) from [c001a04c] (twd_handler+0x30/0x44) [ 467.515350] [c001a04c] (twd_handler+0x30/0x44) from [c00a71a0] (handle_percpu_devid_irq+0x90/0x13c) [ 467.520477] [c00a71a0] (handle_percpu_devid_irq+0x90/0x13c) from [c00a3914] (generic_handle_irq+0x30/0x48) [ 467.525939] [c00a3914] (generic_handle_irq+0x30/0x48) from [c0014c58] (handle_IRQ+0x4c/0xac) [ 467.530731] [c0014c58] (handle_IRQ+0x4c/0xac) from [c0008478] (gic_handle_irq+0x28/0x5c) [ 467.535339] [c0008478] (gic_handle_irq+0x28/0x5c) from [c04f8ce4] (__irq_svc+0x44/0x5c) [ 467.539886] Exception stack(0xc0729f58 to 0xc0729fa0) [ 467.542663] 9f40: 00047f2a 0001 [ 467.547119] 9f60: c074a940 c0728000 c07c4b08 c05045a0 c074be20 411fc092 [ 467.551574] 9f80: c074c040 0001 c0729fa0 00047f2b c0014f50 2113 [ 467.556030] [c04f8ce4] (__irq_svc+0x44/0x5c) from [c0014f50] (default_idle+0x20/0x44) [ 467.560485] [c0014f50] (default_idle+0x20/0x44) from [c001517c] (cpu_idle+0x9c/0x114) [ 467.564971] [c001517c] (cpu_idle+0x9c/0x114) from [c06d77b0] (start_kernel+0x2b4/0x304) Looks like this message was added as of commit a858af2875fb291d0f4b0a4419fefbf03c2379c0 (rcu: Print scheduling-clock information on RCU CPU stall-warning messages). Do you have any suggestions for how we can determine what is causing this? Here's an example of a kernel config that we use: http://www.pwsan.com/omap/testlogs/am33xx_hwmod_clock_devel_3.7/20120912092510/build/omap2plus_defconfig/Kconfig A few observations that may or may not be relevant: we use NO_HZ, and we also have a clockevents timer that is relatively slow to program. regards, - Paul -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] omap changes for v2.6.39 merge window
On Wed, Mar 30, 2011 at 02:54:35PM -0700, Tony Lindgren wrote: * Thomas Gleixner t...@linutronix.de [110330 14:07]: So one person will be not enough, that needs to be a whole team of experienced people in the very near future to deal with the massive tsunami of crap which is targeted at mainline. If we fail to set that up, then we run into a very ugly maintainability issue in no time. One thing that will help here and distribute the load is to move more things under drivers/ as then we have more maintainers looking at the code. In many cases, the ARM SoC vendors will want their people producing the code, so although moving things to drivers might be a good thing to do, it won't really increase the number of people involved. Plus the move to the drivers subtree would be a problem for devices with tight ties to the board or SoC. There is work on pushing towards common code, but there is a lot of code and this will take time and a lot of work. Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] omap changes for v2.6.39 merge window
On Wed, Mar 30, 2011 at 03:47:52PM -0700, Tony Lindgren wrote: * Paul E. McKenney paul...@linux.vnet.ibm.com [110330 15:35]: On Wed, Mar 30, 2011 at 02:54:35PM -0700, Tony Lindgren wrote: * Thomas Gleixner t...@linutronix.de [110330 14:07]: So one person will be not enough, that needs to be a whole team of experienced people in the very near future to deal with the massive tsunami of crap which is targeted at mainline. If we fail to set that up, then we run into a very ugly maintainability issue in no time. One thing that will help here and distribute the load is to move more things under drivers/ as then we have more maintainers looking at the code. In many cases, the ARM SoC vendors will want their people producing the code, so although moving things to drivers might be a good thing to do, it won't really increase the number of people involved. Plus the move to the drivers subtree would be a problem for devices with tight ties to the board or SoC. There is work on pushing towards common code, but there is a lot of code and this will take time and a lot of work. I agree on the common code part, then even drivers with tight ties to board or SoC become just generic drivers that are easy to review. Yep! The trick is getting to that point. Some drivers will be easier than others. Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] power: introduce library for device-specific OPPs
On Fri, Sep 24, 2010 at 07:50:40AM -0500, Nishanth Menon wrote: SoCs have a standard set of tuples consisting of frequency and voltage pairs that the device will support per voltage domain. These are called Operating Performance Points or OPPs. The actual definitions of OPP varies over silicon versions. For a specific domain, we can have a set of {frequency, voltage} pairs. As the kernel boots and more information is available, a default set of these are activated based on the precise nature of device. Further on operation, based on conditions prevailing in the system (such as temperature), some OPP availability may be temporarily controlled by the SoC frameworks. To implement an OPP, some sort of power management support is necessary hence this library depends on CONFIG_PM. Contributions include: Sanjeev Premi for the initial concept: http://patchwork.kernel.org/patch/50998/ Kevin Hilman for converting original design to device-based Kevin Hilman and Paul Walmsey for cleaning up many of the function abstractions, improvements and data structure handling Romit Dasgupta for using enums instead of opp pointers Thara Gopinath, Eduardo Valentin and Vishwanath BS for fixes and cleanups. Linus Walleij for recommending this layer be made generic for usage in other architectures beyond OMAP and ARM. Mark Brown, Andrew Morton, Rafael J Wysocki for valuable improvements. Discussions and comments from: http://marc.info/?l=linux-omapm=126033945313269w=2 http://marc.info/?l=linux-omapm=125482970102327w=2 http://marc.info/?t=12580924752r=1w=2 http://marc.info/?l=linux-omapm=126025973426007w=2 http://marc.info/?t=128152609200064r=1w=2 http://marc.info/?t=12846872302r=1w=2 http://marc.info/?t=12851921862r=1w=2 incorporated. Looks like a good start!!! Some questions and suggestions about RCU usage interspersed below. Thanx, Paul Cc: Benoit Cousson b-cous...@ti.com Cc: Madhusudhan Chikkature Rajashekar madhu...@ti.com Cc: Phil Carmody ext-phil.2.carm...@nokia.com Cc: Roberto Granados Dorado x0095...@ti.com Cc: Santosh Shilimkar santosh.shilim...@ti.com Cc: Sergio Alberto Aguirre Rodriguez saagui...@ti.com Cc: Tero Kristo tero.kri...@nokia.com Cc: Eduardo Valentin eduardo.valen...@nokia.com Cc: Paul Walmsley p...@pwsan.com Cc: Sanjeev Premi pr...@ti.com Cc: Thara Gopinath th...@ti.com Cc: Vishwanath BS vishwanath...@ti.com Cc: Linus Walleij linus.wall...@stericsson.com Cc: Mark Brown broo...@opensource.wolfsonmicro.com Cc: Andrew Morton a...@linux-foundation.org Cc: Rafael J. Wysocki r...@sisk.pl Signed-off-by: Nishanth Menon n...@ti.com Signed-off-by: Kevin Hilman khil...@deeprootsystems.com --- V4: Cc list trimmed to just the MLs. RCU implementation - thanks for the pointers Rafael re-tested on OMAP3630 with lock debugging enabled including cpufreq, idle paths and exception cases - enable/disable. available_opp_count removed as it complicated the updates altogether + the usage latencies are not too high given small lists. In removed list_reverse usage as rcu has no list reverse equivalents + searching either direction has no real difference when the list is ordered. change in protos for get_{voltage,frequency} - const removed from parameter as rcu_dereference causes build warnings introduced opp_set_availability helper to wrap common code between opp_{enable,disable} - leaving these as is as they are more readable from usage perspective + these are not meant to be used in hot paths anyways. Header guard __ASM_OPP_H replaced with __LINUX_OPP_H__ for opp.h V3: https://patchwork.kernel.org/patch/200382/ Bunch of documentation update based on feedback removed opp_def - opp_add enables the opp by default lock usage optimization Sample usage: http://pastebin.mozilla.org/794786 - cleanups for OMAP power http://pastebin.mozilla.org/794787 - Sample OPP initialization for OMAP V2: https://patchwork.kernel.org/patch/189532/ Incorporated review comments from v1. major changes being: $subject change to reflect this is for power. Documentation revamp and including it in the patch :) OPP_DEF removed - lets introduce this if needed or leave it to SOC frameworks to organize code as needed. Rename of enable to available to better reflect the intent few fixes and typos Introduced mutex based locking for controlling access to list modification (note: query functions are still unsafe- rationale in the patch below) A new home for opp.c in drivers/base/power (moved from lib/) A few optimization in function flow and additional error checks added to exposed functions offline aligned with Kevin for cleaning up the copyrights V1: http://marc.info/?t=12846872302r=1w=2 Documentation/power/00-INDEX |2 + Documentation/power/opp.txt | 326
Re: [PATCH v4] power: introduce library for device-specific OPPs
On Fri, Sep 24, 2010 at 04:26:21PM -0500, Nishanth Menon wrote: Paul E. McKenney had written, on 09/24/2010 02:37 PM, the following: [...] Looks like a good start!!! Some questions and suggestions about RCU Thanks for the review.. few comments below.. Back at you! ;-) usage interspersed below. Thanx, Paul [...] + +/** + * find_device_opp() - find device_opp struct using device pointer + * @dev: device pointer used to lookup device OPPs + * + * Search list of device OPPs for one containing matching device. Does a RCU + * reader operation to grab the pointer needed. + * + * Returns pointer to 'struct device_opp' if found, otherwise -ENODEV or + * -EINVAL based on type of error. + */ +static struct device_opp *find_device_opp(struct device *dev) +{ + struct device_opp *tmp_dev_opp, *dev_opp = ERR_PTR(-ENODEV); + + if (unlikely(!dev || IS_ERR(dev))) { + pr_err(%s: Invalid parameters being passed\n, __func__); + return ERR_PTR(-EINVAL); + } + + rcu_read_lock(); + list_for_each_entry_rcu(tmp_dev_opp, dev_opp_list, node) { + if (tmp_dev_opp-dev == dev) { + dev_opp = tmp_dev_opp; + break; + } + } + rcu_read_unlock(); What prevents the structure pointed to by dev_opp from being freed at this point? We are no longer in an RCU read-side critical section, so RCU grace periods starting during the above RCU read-side critical section can now end. dev_opp is never freed in the implementation - it represents domains, only adds with list_add_rcu() is done - wont the usage be safe then? or I being blind? the reason why we dont free is coz of the following: dev_opp represents voltage domains in opp library. SoC frameworks are required to register only those voltage domain opp that are required. by allowing a free logic, I knew it'd have complicated the implementation way beyond what we needed it to be. Perhaps I was confusing two different data structures, if so, apologies. So you are freeing the opp level, but never the dev_opp level, then? But yes, if you are only adding and never deleting, then it is safe to pass the pointers out of an RCU read-side critical section. But please add a comment saying why you are doing this. Otherwise, Coccinelle will cause me to continue complaining about this to you. ;-) And the later uses still look buggy to me, please see below. Here is an example sequence of events that I am worried about: o CPU 1 enters find_device_opp(), and pick up a pointer to a given device opp. o CPU 2 executes opp_set_availability(), replacing that same device opp with a new one. It then calls synchronize_rcu() which blocks waiting for CPU 1 to exit its RCU read-side critical section. o CPU 1 exits its RCU read-side critical section, arriving at this point in the code. o CPU 2's synchronize_rcu() is now permitted to return, executing the kfree(), which frees up the memory that CPU 1's dev_opp pointer references. o This newly freed memory is allocated for some other structure by CPU 3. CPU 1 and CPU 3 are now trying to use the same memory for two different structures, and nothing good can possibly come of this. The kernel dies a brutal and nasty death. One way to fix this is to have the caller do rcu_read_lock() before calling find_device_opp(), and to do rcu_read_unlock() only after the caller has finished using the pointer that find_device_opp() returns. This works well unless the caller needs to do some blocking operation before it gets done using the pointer. Another approach is for find_device_opp() to use a reference count on the structure, and for opp_set_availability() to avoid freeing the structure unless/until the reference counter drops to zero. There are other approaches as well, please feel free to take a look at Documentation/RCU/rcuref.txt for more info on using reference counting and RCU. thx. I probably should read yet again if I got my understanding of usage right.. [...] + +/** + * opp_find_freq_exact() - search for an exact frequency + * @dev: device for which we do this operation + * @freq:frequency to search for + * @is_available:true/false - match for available opp + * + * Searches for exact match in the opp list and returns pointer to the matching + * opp if found, else returns ERR_PTR in case of error and should be handled + * using IS_ERR. + * + * Note: available is a modifier for the search. if available=true, then the + * match is for exact matching frequency and is available in the stored OPP + * table. if false, the match is for exact frequency which is not available. + * + * This provides a mechanism to enable an opp
Re: [RFC][PATCH 0/1] kmemleak: Fix false positive with alias
On Fri, Aug 27, 2010 at 09:12:24AM +0300, Hiroshi DOYU wrote: Hi Catalin, From: Hiroshi DOYU hiroshi.d...@nokia.com Subject: Re: [RFC][PATCH 0/1] kmemleak: Fix false positive with alias Date: Tue, 10 Aug 2010 18:49:03 +0300 (EEST) snip The performance impact is indeed pretty high, though some parts of the code look over-engineered to me (the __scan_block function with a loop going through an array of two function pointers - I think the compiler cannot figure out what to inline). You could just extend the find_and_get_object() to search both trees under a single spinlock region (as locking also takes time). Ok, a good point. Now there's not much difference with the attached patch, a new version of alias. / # modprobe kmemleak-special-test use_alias=0 / # time echo scan /sys/kernel/debug/kmemleak real0m 2.30s user0m 0.00s sys 0m 2.30s / # modprobe kmemleak-special-test use_alias=1 / # time echo scan /sys/kernel/debug/kmemleak real0m 3.91s user0m 0.00s sys 0m 3.91s It would be nice if you could have some time to take a look at this patch and give some comments. From a5670d69b2cafe85f6f26f6951097210d3b9917f Mon Sep 17 00:00:00 2001 From: Hiroshi DOYU hiroshi.d...@nokia.com Date: Thu, 17 Jun 2010 13:36:45 +0300 Subject: [PATCH 1/1] kmemleak: Fix false positive with address alias There is a false positive case that a pointer is calculated by other methods than the usual container_of macro. kmemleak_ignore can cover such a false positive, but it would loose the advantage of memory leak detection. This patch allows kmemleak to work with such false positives by aliasing of address. A client module can register an alias address to an original pointer. A typical use case could be the IOMMU pagetable allocation which stores pointers to the second level of page tables with some conversion, for example, a physical address with attribute bits. Right now I don't have other use cases but I hope that there could be some that this special scan works with. A few questions below... Thanx, Paul Signed-off-by: Hiroshi DOYU hiroshi.d...@nokia.com Cc: Phil Carmody ext-phil.2.carm...@nokia.com --- include/linux/kmemleak.h |8 ++ mm/kmemleak.c| 208 +++--- 2 files changed, 204 insertions(+), 12 deletions(-) diff --git a/include/linux/kmemleak.h b/include/linux/kmemleak.h index 99d9a67..9e2af3a 100644 --- a/include/linux/kmemleak.h +++ b/include/linux/kmemleak.h @@ -34,6 +34,8 @@ extern void kmemleak_not_leak(const void *ptr) __ref; extern void kmemleak_ignore(const void *ptr) __ref; extern void kmemleak_scan_area(const void *ptr, size_t size, gfp_t gfp) __ref; extern void kmemleak_no_scan(const void *ptr) __ref; +extern void kmemleak_add_alias(const void *ptr, const void *new) __ref; +extern void kmemleak_unalias(const void *alias) __ref; static inline void kmemleak_alloc_recursive(const void *ptr, size_t size, int min_count, unsigned long flags, @@ -92,6 +94,12 @@ static inline void kmemleak_erase(void **ptr) static inline void kmemleak_no_scan(const void *ptr) { } +static inline void kmemleak_add_alias(const void *ptr, const void *new) +{ +} +static inline void kmemleak_unalias(const void *alias) +{ +} #endif /* CONFIG_DEBUG_KMEMLEAK */ diff --git a/mm/kmemleak.c b/mm/kmemleak.c index 2c0d032..3875cb7 100644 --- a/mm/kmemleak.c +++ b/mm/kmemleak.c @@ -157,6 +157,13 @@ struct kmemleak_object { unsigned long jiffies; /* creation timestamp */ pid_t pid; /* pid of the current task */ char comm[TASK_COMM_LEN]; /* executable name */ + struct kmemleak_alias *alias; /* if a pointer is modified */ +}; + +struct kmemleak_alias { + struct list_head alias_list; + struct prio_tree_node tree_node; + struct kmemleak_object *object; }; /* flag representing the memory block allocation status */ @@ -179,13 +186,18 @@ struct kmemleak_object { static LIST_HEAD(object_list); /* the list of gray-colored objects (see color_gray comment below) */ static LIST_HEAD(gray_list); +/* the list of objects with alias (see alias comment below) */ +static LIST_HEAD(alias_list); /* prio search tree for object boundaries */ static struct prio_tree_root object_tree_root; +/* prio search tree for alias object boundaries */ +static struct prio_tree_root alias_tree_root; /* rw_lock protecting the access to object_list and prio_tree_root */ static DEFINE_RWLOCK(kmemleak_lock); /* allocation caches for kmemleak internal data */ static struct kmem_cache *object_cache; +static struct kmem_cache *alias_cache; static struct kmem_cache *scan_area_cache; /* set