On Thu, Feb 25, 2016 at 04:56:38PM -0800, Paul E. McKenney wrote: > On Thu, Feb 25, 2016 at 04:13:11PM +1100, Ross Green wrote: > > On Wed, Feb 24, 2016 at 8:28 AM, Ross Green <[email protected]> wrote: > > > On Wed, Feb 24, 2016 at 7:55 AM, Paul E. McKenney > > > <[email protected]> wrote: > > [ . . . ] > > > >> Still working on getting decent traces... > > And I might have succeeded, see below. > > > >> > > >> Thanx, Paul > > >> > > > > > > G'day all, > > > > > > Here is another dmesg output for 4.5-rc5 showing another rcu_preempt > > > stall. > > > This one appeared after only a day of running. CONFIG_DEBUG_TIMING is > > > turned on, but can't see any output that shows from this. > > > > > > Again testing as before, > > > > > > Boot, run a series of small benchmarks, then just let the system be > > > and idle away. > > > > > > I notice in the stack trace there is mention of hrtimer_run_queues and > > > hrtimer_interrupt. > > > > > > Anyway, leave this for a few more eyes to look at. > > > > > > Open to any other suggestions of things to test. > > > > > > Regards, > > > > > > Ross Green > > > > > > G'day Paul, > > > > I left the pandaboard running and captured another stall. > > > > the attachment is the dmesg output. > > > > Again there is no apparent output from any CONFIG_DEBUG_TIMING so I > > assume there is nothing happening there. > > I agree, looks like this is not due to time skew. > > > I just saw the updates for 4.6 RCU code. > > Is the patch in [PATCH tip/core/rcu 04/13] valid here? > > I doubt that it will help, but you never know. > > > do you want me try the new patch set with this configuration? > > Even better would be to try Daniel Wagner's swait patchset. I have > attached them in UNIX mbox format, or you can get them from the > -tip tree. > > And I -finally- got some tracing that -might- be useful. The dmesg, all > 67MB of it, is here: > > http://www.rdrop.com/~paulmck/submission/console.2016.02.23a.log > > This failure mode is less likely to happen, and looks a bit different > than the ones that I was seeing before enabling tracing. Then, an > additional wakeup would actually wake the task up. In contrast, with > tracing enabled, the RCU grace-period kthread goes into "teenager mode", > refusing to wake up despite repeated attempts. However, this might > be a side-effect of the ftrace dump. > > On line 525,132, we see that the rcu_preempt grace-period kthread has > been starved for 1,188,154 jiffies, or about 20 minutes. This seems > unlikely... The kthread is waiting for no more than a three-jiffy > timeout ("RCU_GP_WAIT_FQS(3)") and is in TASK_INTERRUPTIBLE state > ("0x1").
We're seeing a similar stall (~60 seconds) on an x86 development system here. Any luck tracking down the cause of this? If not, any suggestions for traces that might be helpful? - Josh Triplett

