Re: hung tasks in 3.18.0-rc1-00221-gc3351df

2014-10-28 Thread Paul E. McKenney
On Tue, Oct 28, 2014 at 09:48:21PM +0200, Meelis Roos wrote:
> > > > rcu: Make rcu_barrier() understand about missing rcuo kthreads
> > > 
> > > I can not trigger the previous problem at will so my testing is not very 
> > > useful. I tested the previous patch that you sent but the problem did 
> > > not trigger, and yesterday applied this one and it still does not 
> > > trigger, but I an using remote sessions only for now (no desktop).
> > > 
> > > At least it's not clearly broken :)
> > 
> > A couple of other people had more reproducible test setups and confirmed
> > that it fixed them.
> > 
> > But yes, there are no doubt other RCU bugs waiting to be found.  ;-)
> 
> I reproduced the problem on Thinkpad T400 with Core2Duo and Chromium 
> startup failing, and your patch fixes that.
> 
> Tested-by: Meelis Roos 

Got it, thank you!!!

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: hung tasks in 3.18.0-rc1-00221-gc3351df

2014-10-28 Thread Meelis Roos
> > > rcu: Make rcu_barrier() understand about missing rcuo kthreads
> > 
> > I can not trigger the previous problem at will so my testing is not very 
> > useful. I tested the previous patch that you sent but the problem did 
> > not trigger, and yesterday applied this one and it still does not 
> > trigger, but I an using remote sessions only for now (no desktop).
> > 
> > At least it's not clearly broken :)
> 
> A couple of other people had more reproducible test setups and confirmed
> that it fixed them.
> 
> But yes, there are no doubt other RCU bugs waiting to be found.  ;-)

I reproduced the problem on Thinkpad T400 with Core2Duo and Chromium 
startup failing, and your patch fixes that.

Tested-by: Meelis Roos 

-- 
Meelis Roos (mr...@linux.ee)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: hung tasks in 3.18.0-rc1-00221-gc3351df

2014-10-28 Thread Paul E. McKenney
On Tue, Oct 28, 2014 at 10:16:43AM +0200, Meelis Roos wrote:
> > rcu: Make rcu_barrier() understand about missing rcuo kthreads
> 
> I can not trigger the previous problem at will so my testing is not very 
> useful. I tested the previous patch that you sent but the problem did 
> not trigger, and yesterday applied this one and it still does not 
> trigger, but I an using remote sessions only for now (no desktop).
> 
> At least it's not clearly broken :)

A couple of other people had more reproducible test setups and confirmed
that it fixed them.

But yes, there are no doubt other RCU bugs waiting to be found.  ;-)

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: hung tasks in 3.18.0-rc1-00221-gc3351df

2014-10-28 Thread Meelis Roos
> rcu: Make rcu_barrier() understand about missing rcuo kthreads

I can not trigger the previous problem at will so my testing is not very 
useful. I tested the previous patch that you sent but the problem did 
not trigger, and yesterday applied this one and it still does not 
trigger, but I an using remote sessions only for now (no desktop).

At least it's not clearly broken :)

-- 
Meelis Roos (mr...@linux.ee)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: hung tasks in 3.18.0-rc1-00221-gc3351df

2014-10-28 Thread Meelis Roos
 rcu: Make rcu_barrier() understand about missing rcuo kthreads

I can not trigger the previous problem at will so my testing is not very 
useful. I tested the previous patch that you sent but the problem did 
not trigger, and yesterday applied this one and it still does not 
trigger, but I an using remote sessions only for now (no desktop).

At least it's not clearly broken :)

-- 
Meelis Roos (mr...@linux.ee)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: hung tasks in 3.18.0-rc1-00221-gc3351df

2014-10-28 Thread Paul E. McKenney
On Tue, Oct 28, 2014 at 10:16:43AM +0200, Meelis Roos wrote:
  rcu: Make rcu_barrier() understand about missing rcuo kthreads
 
 I can not trigger the previous problem at will so my testing is not very 
 useful. I tested the previous patch that you sent but the problem did 
 not trigger, and yesterday applied this one and it still does not 
 trigger, but I an using remote sessions only for now (no desktop).
 
 At least it's not clearly broken :)

A couple of other people had more reproducible test setups and confirmed
that it fixed them.

But yes, there are no doubt other RCU bugs waiting to be found.  ;-)

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: hung tasks in 3.18.0-rc1-00221-gc3351df

2014-10-28 Thread Meelis Roos
   rcu: Make rcu_barrier() understand about missing rcuo kthreads
  
  I can not trigger the previous problem at will so my testing is not very 
  useful. I tested the previous patch that you sent but the problem did 
  not trigger, and yesterday applied this one and it still does not 
  trigger, but I an using remote sessions only for now (no desktop).
  
  At least it's not clearly broken :)
 
 A couple of other people had more reproducible test setups and confirmed
 that it fixed them.
 
 But yes, there are no doubt other RCU bugs waiting to be found.  ;-)

I reproduced the problem on Thinkpad T400 with Core2Duo and Chromium 
startup failing, and your patch fixes that.

Tested-by: Meelis Roos mr...@linux.ee

-- 
Meelis Roos (mr...@linux.ee)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: hung tasks in 3.18.0-rc1-00221-gc3351df

2014-10-28 Thread Paul E. McKenney
On Tue, Oct 28, 2014 at 09:48:21PM +0200, Meelis Roos wrote:
rcu: Make rcu_barrier() understand about missing rcuo kthreads
   
   I can not trigger the previous problem at will so my testing is not very 
   useful. I tested the previous patch that you sent but the problem did 
   not trigger, and yesterday applied this one and it still does not 
   trigger, but I an using remote sessions only for now (no desktop).
   
   At least it's not clearly broken :)
  
  A couple of other people had more reproducible test setups and confirmed
  that it fixed them.
  
  But yes, there are no doubt other RCU bugs waiting to be found.  ;-)
 
 I reproduced the problem on Thinkpad T400 with Core2Duo and Chromium 
 startup failing, and your patch fixes that.
 
 Tested-by: Meelis Roos mr...@linux.ee

Got it, thank you!!!

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: hung tasks in 3.18.0-rc1-00221-gc3351df

2014-10-27 Thread Paul E. McKenney
On Fri, Oct 24, 2014 at 07:55:09AM -0700, Paul E. McKenney wrote:
> On Fri, Oct 24, 2014 at 10:58:17AM +0300, Meelis Roos wrote:

[ . . . ]

> > > The get_state_synchronize_rcu() from within the wait for _rcu_barrier()
> > > looks a bit odd, but perhaps this is stack-tracing inaccuracy.
> > > 
> > > Could you please send your .config?
> 
> This does look like something that a couple of other people are seeing.
> Could you please check to see if reverting this commit makes the problem
> go away?
> 
> 35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
> 
> This cannot be the full fix because this commit is itself a bug fix, but
> it would be good to check whether you are really seeing what they are
> seeing.  ;-)



rcu: Make rcu_barrier() understand about missing rcuo kthreads

Commit 35ce7f29a44a (rcu: Create rcuo kthreads only for onlined CPUs)
avoids creating rcuo kthreads for CPUs that never come online.  This
fixes a bug in many instances of firmware: Instead of lying about their
age, these systems instead lie about the number of CPUs that they have.
Before commit 35ce7f29a44a, this could result in huge numbers of useless
rcuo kthreads being created.

It appears that experience indicates that I should have told the
people suffering from this problem to fix their broken firmware, but
I instead produced what turned out to be a partial fix.   The missing
piece supplied by this commit makes sure that rcu_barrier() knows not to
post callbacks for no-CBs CPUs that have not yet come online, because
otherwise rcu_barrier() will hang on systems having firmware that lies
about the number of CPUs.

It is tempting to simply have rcu_barrier() refuse to post a callback on
any no-CBs CPU that does not have an rcuo kthread.  This unfortunately
does not work because rcu_barrier() is required to wait for all pending
callbacks.  It is therefore required to wait even for those callbacks
that cannot possibly be invoked.  Even if doing so hangs the system.

Given that posting a callback to a no-CBs CPU that does not yet have an
rcuo kthread can hang rcu_barrier(), It is tempting to report an error
in this case.  Unfortunately, this will result in false positives at
boot time, when it is perfectly legal to post callbacks to the boot CPU
before the scheduler has started, in other words, before it is legal
to invoke rcu_barrier().

So this commit instead has rcu_barrier() avoid posting callbacks to
CPUs having neither rcuo kthread nor pending callbacks, and has it
complain bitterly if it finds CPUs having no rcuo kthread but some
pending callbacks.  And when rcu_barrier() does find CPUs having no rcuo
kthread but pending callbacks, as noted earlier, it has no choice but
to hang indefinitely.

Reported-by: Yanko Kaneti 
Reported-by: Jay Vosburgh 
Reported-by: Meelis Roos 
Reported-by: Eric B Munson 
Signed-off-by: Paul E. McKenney 

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index aa8e5eea3ab4..c78e88ce5ea3 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -660,18 +660,18 @@ TRACE_EVENT(rcu_torture_read,
 /*
  * Tracepoint for _rcu_barrier() execution.  The string "s" describes
  * the _rcu_barrier phase:
- * "Begin": rcu_barrier_callback() started.
- * "Check": rcu_barrier_callback() checking for piggybacking.
- * "EarlyExit": rcu_barrier_callback() piggybacked, thus early exit.
- * "Inc1": rcu_barrier_callback() piggyback check counter incremented.
- * "Offline": rcu_barrier_callback() found offline CPU
- * "OnlineNoCB": rcu_barrier_callback() found online no-CBs CPU.
- * "OnlineQ": rcu_barrier_callback() found online CPU with callbacks.
- * "OnlineNQ": rcu_barrier_callback() found online CPU, no callbacks.
+ * "Begin": _rcu_barrier() started.
+ * "Check": _rcu_barrier() checking for piggybacking.
+ * "EarlyExit": _rcu_barrier() piggybacked, thus early exit.
+ * "Inc1": _rcu_barrier() piggyback check counter incremented.
+ * "OfflineNoCB": _rcu_barrier() found callback on never-online CPU
+ * "OnlineNoCB": _rcu_barrier() found online no-CBs CPU.
+ * "OnlineQ": _rcu_barrier() found online CPU with callbacks.
+ * "OnlineNQ": _rcu_barrier() found online CPU, no callbacks.
  * "IRQ": An rcu_barrier_callback() callback posted on remote CPU.
  * "CB": An rcu_barrier_callback() invoked a callback, not the last.
  * "LastCB": An rcu_barrier_callback() invoked the last callback.
- * "Inc2": rcu_barrier_callback() piggyback check counter incremented.
+ * "Inc2": _rcu_barrier() piggyback check counter incremented.
  * The "cpu" argument is the CPU or -1 if meaningless, the "cnt" argument
  * is the count of remaining callbacks, and "done" is the piggybacking count.
  */
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f6880052b917..7680fc275036 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3312,11 

Re: hung tasks in 3.18.0-rc1-00221-gc3351df

2014-10-27 Thread Paul E. McKenney
On Fri, Oct 24, 2014 at 07:55:09AM -0700, Paul E. McKenney wrote:
 On Fri, Oct 24, 2014 at 10:58:17AM +0300, Meelis Roos wrote:

[ . . . ]

   The get_state_synchronize_rcu() from within the wait for _rcu_barrier()
   looks a bit odd, but perhaps this is stack-tracing inaccuracy.
   
   Could you please send your .config?
 
 This does look like something that a couple of other people are seeing.
 Could you please check to see if reverting this commit makes the problem
 go away?
 
 35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs
 
 This cannot be the full fix because this commit is itself a bug fix, but
 it would be good to check whether you are really seeing what they are
 seeing.  ;-)



rcu: Make rcu_barrier() understand about missing rcuo kthreads

Commit 35ce7f29a44a (rcu: Create rcuo kthreads only for onlined CPUs)
avoids creating rcuo kthreads for CPUs that never come online.  This
fixes a bug in many instances of firmware: Instead of lying about their
age, these systems instead lie about the number of CPUs that they have.
Before commit 35ce7f29a44a, this could result in huge numbers of useless
rcuo kthreads being created.

It appears that experience indicates that I should have told the
people suffering from this problem to fix their broken firmware, but
I instead produced what turned out to be a partial fix.   The missing
piece supplied by this commit makes sure that rcu_barrier() knows not to
post callbacks for no-CBs CPUs that have not yet come online, because
otherwise rcu_barrier() will hang on systems having firmware that lies
about the number of CPUs.

It is tempting to simply have rcu_barrier() refuse to post a callback on
any no-CBs CPU that does not have an rcuo kthread.  This unfortunately
does not work because rcu_barrier() is required to wait for all pending
callbacks.  It is therefore required to wait even for those callbacks
that cannot possibly be invoked.  Even if doing so hangs the system.

Given that posting a callback to a no-CBs CPU that does not yet have an
rcuo kthread can hang rcu_barrier(), It is tempting to report an error
in this case.  Unfortunately, this will result in false positives at
boot time, when it is perfectly legal to post callbacks to the boot CPU
before the scheduler has started, in other words, before it is legal
to invoke rcu_barrier().

So this commit instead has rcu_barrier() avoid posting callbacks to
CPUs having neither rcuo kthread nor pending callbacks, and has it
complain bitterly if it finds CPUs having no rcuo kthread but some
pending callbacks.  And when rcu_barrier() does find CPUs having no rcuo
kthread but pending callbacks, as noted earlier, it has no choice but
to hang indefinitely.

Reported-by: Yanko Kaneti yan...@declera.com
Reported-by: Jay Vosburgh jay.vosbu...@canonical.com
Reported-by: Meelis Roos mr...@linux.ee
Reported-by: Eric B Munson emun...@akamai.com
Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index aa8e5eea3ab4..c78e88ce5ea3 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -660,18 +660,18 @@ TRACE_EVENT(rcu_torture_read,
 /*
  * Tracepoint for _rcu_barrier() execution.  The string s describes
  * the _rcu_barrier phase:
- * Begin: rcu_barrier_callback() started.
- * Check: rcu_barrier_callback() checking for piggybacking.
- * EarlyExit: rcu_barrier_callback() piggybacked, thus early exit.
- * Inc1: rcu_barrier_callback() piggyback check counter incremented.
- * Offline: rcu_barrier_callback() found offline CPU
- * OnlineNoCB: rcu_barrier_callback() found online no-CBs CPU.
- * OnlineQ: rcu_barrier_callback() found online CPU with callbacks.
- * OnlineNQ: rcu_barrier_callback() found online CPU, no callbacks.
+ * Begin: _rcu_barrier() started.
+ * Check: _rcu_barrier() checking for piggybacking.
+ * EarlyExit: _rcu_barrier() piggybacked, thus early exit.
+ * Inc1: _rcu_barrier() piggyback check counter incremented.
+ * OfflineNoCB: _rcu_barrier() found callback on never-online CPU
+ * OnlineNoCB: _rcu_barrier() found online no-CBs CPU.
+ * OnlineQ: _rcu_barrier() found online CPU with callbacks.
+ * OnlineNQ: _rcu_barrier() found online CPU, no callbacks.
  * IRQ: An rcu_barrier_callback() callback posted on remote CPU.
  * CB: An rcu_barrier_callback() invoked a callback, not the last.
  * LastCB: An rcu_barrier_callback() invoked the last callback.
- * Inc2: rcu_barrier_callback() piggyback check counter incremented.
+ * Inc2: _rcu_barrier() piggyback check counter incremented.
  * The cpu argument is the CPU or -1 if meaningless, the cnt argument
  * is the count of remaining callbacks, and done is the piggybacking count.
  */
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f6880052b917..7680fc275036 100644
--- a/kernel/rcu/tree.c
+++ 

Re: hung tasks in 3.18.0-rc1-00221-gc3351df

2014-10-24 Thread Paul E. McKenney
On Fri, Oct 24, 2014 at 10:58:17AM +0300, Meelis Roos wrote:
> Config is below. 
> 
> > > [  960.346611] INFO: task kworker/u16:0:6 blocked for more than 120 
> > > seconds.
> > > [  960.346616]   Tainted: GW  3.18.0-rc1-00221-gc3351df 
> > > #150
> > > [  960.346618] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
> > > disables this message.
> > > [  960.346621] kworker/u16:0   D  0 6  2 
> > > 0x
> > > [  960.346633] Workqueue: netns cleanup_net
> > > [  960.346635]  880111c4d1e0 0002 880111c4d1e0 
> > > 880111c53fd8
> > > [  960.346639]  880111c4d1e0 a000 880111c4d1e0 
> > > 0296
> > > [  960.346642]   7fff 7fff 
> > > 8182e010
> > > [  960.346646] Call Trace:
> > > [  960.346654]  [] ? schedule_timeout+0x18d/0x200
> > > [  960.346660]  [] ? get_state_synchronize_rcu+0x10/0x10
> > > [  960.346664]  [] ? wait_for_completion+0xb4/0x110
> > > [  960.346669]  [] ? 
> > > _raw_spin_unlock_irqrestore+0x5/0x10
> > > [  960.346672]  [] ? wait_for_completion+0xa9/0x110
> > > [  960.346677]  [] ? wake_up_state+0x10/0x10
> > > [  960.346681]  [] ? _rcu_barrier+0x14a/0x1d0
> > > [  960.346686]  [] ? netdev_run_todo+0x5c/0x310
> > > [  960.346691]  [] ? 
> > > rollback_registered_many+0x265/0x2d0
> > > [  960.346696]  [] ? 
> > > default_device_exit_batch+0x136/0x150
> > > [  960.346701]  [] ? __wake_up_sync+0x10/0x10
> > > [  960.346704]  [] ? cleanup_net+0xf0/0x1d0
> > > [  960.346709]  [] ? process_one_work+0x136/0x380
> > > [  960.346713]  [] ? pwq_activate_delayed_work+0x27/0x40
> > > [  960.346716]  [] ? worker_thread+0x63/0x480
> > > [  960.346720]  [] ? rescuer_thread+0x270/0x270
> > > [  960.346723]  [] ? kthread+0xce/0xf0
> > > [  960.346726]  [] ? queue_kthread_work+0x80/0x80
> > > [  960.346729]  [] ? kthread_create_on_node+0x180/0x180
> > > [  960.346734]  [] ? ret_from_fork+0x7c/0xb0
> > > [  960.346737]  [] ? kthread_create_on_node+0x180/0x180
> > > [ 1080.363583] INFO: task kworker/u16:0:6 blocked for more than 120 
> > > seconds.
> > > [ 1080.363589]   Tainted: GW  3.18.0-rc1-00221-gc3351df 
> > > #150
> > > [ 1080.363591] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
> > > disables this message.
> > > [ 1080.363594] kworker/u16:0   D  0 6  2 
> > > 0x
> > > [ 1080.363604] Workqueue: netns cleanup_net
> > > [ 1080.363607]  880111c4d1e0 0002 880111c4d1e0 
> > > 880111c53fd8
> > > [ 1080.363610]  880111c4d1e0 a000 880111c4d1e0 
> > > 0296
> > > [ 1080.363614]   7fff 7fff 
> > > 8182e010
> > > [ 1080.363617] Call Trace:
> > > [ 1080.363625]  [] ? schedule_timeout+0x18d/0x200
> > > [ 1080.363633]  [] ? get_state_synchronize_rcu+0x10/0x10
> > > [ 1080.363637]  [] ? wait_for_completion+0xb4/0x110
> > > [ 1080.363641]  [] ? 
> > > _raw_spin_unlock_irqrestore+0x5/0x10
> > > [ 1080.363645]  [] ? wait_for_completion+0xa9/0x110
> > > [ 1080.363649]  [] ? wake_up_state+0x10/0x10
> > > [ 1080.363653]  [] ? _rcu_barrier+0x14a/0x1d0
> > > [ 1080.363658]  [] ? netdev_run_todo+0x5c/0x310
> > > [ 1080.363663]  [] ? 
> > > rollback_registered_many+0x265/0x2d0
> > > [ 1080.363667]  [] ? 
> > > default_device_exit_batch+0x136/0x150
> > > [ 1080.363672]  [] ? __wake_up_sync+0x10/0x10
> > > [ 1080.363676]  [] ? cleanup_net+0xf0/0x1d0
> > > [ 1080.363681]  [] ? process_one_work+0x136/0x380
> > > [ 1080.363684]  [] ? pwq_activate_delayed_work+0x27/0x40
> > > [ 1080.363688]  [] ? worker_thread+0x63/0x480
> > > [ 1080.363691]  [] ? rescuer_thread+0x270/0x270
> > > [ 1080.363695]  [] ? kthread+0xce/0xf0
> > > [ 1080.363698]  [] ? queue_kthread_work+0x80/0x80
> > > [ 1080.363702]  [] ? kthread_create_on_node+0x180/0x180
> > > [ 1080.363706]  [] ? ret_from_fork+0x7c/0xb0
> > > [ 1080.363709]  [] ? kthread_create_on_node+0x180/0x180
> > 
> > The get_state_synchronize_rcu() from within the wait for _rcu_barrier()
> > looks a bit odd, but perhaps this is stack-tracing inaccuracy.
> > 
> > Could you please send your .config?

This does look like something that a couple of other people are seeing.
Could you please check to see if reverting this commit makes the problem
go away?

35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs

This cannot be the full fix because this commit is itself a bug fix, but
it would be good to check whether you are really seeing what they are
seeing.  ;-)

Thanx, Paul

> #
> # Automatically generated file; DO NOT EDIT.
> # Linux/x86 3.18.0-rc1 Kernel Configuration
> #
> CONFIG_64BIT=y
> CONFIG_X86_64=y
> CONFIG_X86=y
> CONFIG_INSTRUCTION_DECODER=y
> CONFIG_OUTPUT_FORMAT="elf64-x86-64"
> CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
> CONFIG_LOCKDEP_SUPPORT=y
> CONFIG_STACKTRACE_SUPPORT=y
> 

Re: hung tasks in 3.18.0-rc1-00221-gc3351df

2014-10-24 Thread Meelis Roos
Config is below. 

> > [  960.346611] INFO: task kworker/u16:0:6 blocked for more than 120 seconds.
> > [  960.346616]   Tainted: GW  3.18.0-rc1-00221-gc3351df #150
> > [  960.346618] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> > this message.
> > [  960.346621] kworker/u16:0   D  0 6  2 
> > 0x
> > [  960.346633] Workqueue: netns cleanup_net
> > [  960.346635]  880111c4d1e0 0002 880111c4d1e0 
> > 880111c53fd8
> > [  960.346639]  880111c4d1e0 a000 880111c4d1e0 
> > 0296
> > [  960.346642]   7fff 7fff 
> > 8182e010
> > [  960.346646] Call Trace:
> > [  960.346654]  [] ? schedule_timeout+0x18d/0x200
> > [  960.346660]  [] ? get_state_synchronize_rcu+0x10/0x10
> > [  960.346664]  [] ? wait_for_completion+0xb4/0x110
> > [  960.346669]  [] ? _raw_spin_unlock_irqrestore+0x5/0x10
> > [  960.346672]  [] ? wait_for_completion+0xa9/0x110
> > [  960.346677]  [] ? wake_up_state+0x10/0x10
> > [  960.346681]  [] ? _rcu_barrier+0x14a/0x1d0
> > [  960.346686]  [] ? netdev_run_todo+0x5c/0x310
> > [  960.346691]  [] ? rollback_registered_many+0x265/0x2d0
> > [  960.346696]  [] ? default_device_exit_batch+0x136/0x150
> > [  960.346701]  [] ? __wake_up_sync+0x10/0x10
> > [  960.346704]  [] ? cleanup_net+0xf0/0x1d0
> > [  960.346709]  [] ? process_one_work+0x136/0x380
> > [  960.346713]  [] ? pwq_activate_delayed_work+0x27/0x40
> > [  960.346716]  [] ? worker_thread+0x63/0x480
> > [  960.346720]  [] ? rescuer_thread+0x270/0x270
> > [  960.346723]  [] ? kthread+0xce/0xf0
> > [  960.346726]  [] ? queue_kthread_work+0x80/0x80
> > [  960.346729]  [] ? kthread_create_on_node+0x180/0x180
> > [  960.346734]  [] ? ret_from_fork+0x7c/0xb0
> > [  960.346737]  [] ? kthread_create_on_node+0x180/0x180
> > [ 1080.363583] INFO: task kworker/u16:0:6 blocked for more than 120 seconds.
> > [ 1080.363589]   Tainted: GW  3.18.0-rc1-00221-gc3351df #150
> > [ 1080.363591] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> > this message.
> > [ 1080.363594] kworker/u16:0   D  0 6  2 
> > 0x
> > [ 1080.363604] Workqueue: netns cleanup_net
> > [ 1080.363607]  880111c4d1e0 0002 880111c4d1e0 
> > 880111c53fd8
> > [ 1080.363610]  880111c4d1e0 a000 880111c4d1e0 
> > 0296
> > [ 1080.363614]   7fff 7fff 
> > 8182e010
> > [ 1080.363617] Call Trace:
> > [ 1080.363625]  [] ? schedule_timeout+0x18d/0x200
> > [ 1080.363633]  [] ? get_state_synchronize_rcu+0x10/0x10
> > [ 1080.363637]  [] ? wait_for_completion+0xb4/0x110
> > [ 1080.363641]  [] ? _raw_spin_unlock_irqrestore+0x5/0x10
> > [ 1080.363645]  [] ? wait_for_completion+0xa9/0x110
> > [ 1080.363649]  [] ? wake_up_state+0x10/0x10
> > [ 1080.363653]  [] ? _rcu_barrier+0x14a/0x1d0
> > [ 1080.363658]  [] ? netdev_run_todo+0x5c/0x310
> > [ 1080.363663]  [] ? rollback_registered_many+0x265/0x2d0
> > [ 1080.363667]  [] ? default_device_exit_batch+0x136/0x150
> > [ 1080.363672]  [] ? __wake_up_sync+0x10/0x10
> > [ 1080.363676]  [] ? cleanup_net+0xf0/0x1d0
> > [ 1080.363681]  [] ? process_one_work+0x136/0x380
> > [ 1080.363684]  [] ? pwq_activate_delayed_work+0x27/0x40
> > [ 1080.363688]  [] ? worker_thread+0x63/0x480
> > [ 1080.363691]  [] ? rescuer_thread+0x270/0x270
> > [ 1080.363695]  [] ? kthread+0xce/0xf0
> > [ 1080.363698]  [] ? queue_kthread_work+0x80/0x80
> > [ 1080.363702]  [] ? kthread_create_on_node+0x180/0x180
> > [ 1080.363706]  [] ? ret_from_fork+0x7c/0xb0
> > [ 1080.363709]  [] ? kthread_create_on_node+0x180/0x180
> 
> The get_state_synchronize_rcu() from within the wait for _rcu_barrier()
> looks a bit odd, but perhaps this is stack-tracing inaccuracy.
> 
> Could you please send your .config?


#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 3.18.0-rc1 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_HAVE_INTEL_TXT=y

Re: hung tasks in 3.18.0-rc1-00221-gc3351df

2014-10-24 Thread Meelis Roos
Config is below. 

  [  960.346611] INFO: task kworker/u16:0:6 blocked for more than 120 seconds.
  [  960.346616]   Tainted: GW  3.18.0-rc1-00221-gc3351df #150
  [  960.346618] echo 0  /proc/sys/kernel/hung_task_timeout_secs disables 
  this message.
  [  960.346621] kworker/u16:0   D  0 6  2 
  0x
  [  960.346633] Workqueue: netns cleanup_net
  [  960.346635]  880111c4d1e0 0002 880111c4d1e0 
  880111c53fd8
  [  960.346639]  880111c4d1e0 a000 880111c4d1e0 
  0296
  [  960.346642]   7fff 7fff 
  8182e010
  [  960.346646] Call Trace:
  [  960.346654]  [8153dcfd] ? schedule_timeout+0x18d/0x200
  [  960.346660]  [810915a0] ? get_state_synchronize_rcu+0x10/0x10
  [  960.346664]  [8153c544] ? wait_for_completion+0xb4/0x110
  [  960.346669]  [8153e5d5] ? _raw_spin_unlock_irqrestore+0x5/0x10
  [  960.346672]  [8153c539] ? wait_for_completion+0xa9/0x110
  [  960.346677]  [81069450] ? wake_up_state+0x10/0x10
  [  960.346681]  [8109285a] ? _rcu_barrier+0x14a/0x1d0
  [  960.346686]  [8145f13c] ? netdev_run_todo+0x5c/0x310
  [  960.346691]  [81458775] ? rollback_registered_many+0x265/0x2d0
  [  960.346696]  [81459db6] ? default_device_exit_batch+0x136/0x150
  [  960.346701]  [8107a240] ? __wake_up_sync+0x10/0x10
  [  960.346704]  [81454310] ? cleanup_net+0xf0/0x1d0
  [  960.346709]  [8105b566] ? process_one_work+0x136/0x380
  [  960.346713]  [81058ca7] ? pwq_activate_delayed_work+0x27/0x40
  [  960.346716]  [8105ba83] ? worker_thread+0x63/0x480
  [  960.346720]  [8105ba20] ? rescuer_thread+0x270/0x270
  [  960.346723]  [8105fc9e] ? kthread+0xce/0xf0
  [  960.346726]  [8106] ? queue_kthread_work+0x80/0x80
  [  960.346729]  [8105fbd0] ? kthread_create_on_node+0x180/0x180
  [  960.346734]  [8153ed2c] ? ret_from_fork+0x7c/0xb0
  [  960.346737]  [8105fbd0] ? kthread_create_on_node+0x180/0x180
  [ 1080.363583] INFO: task kworker/u16:0:6 blocked for more than 120 seconds.
  [ 1080.363589]   Tainted: GW  3.18.0-rc1-00221-gc3351df #150
  [ 1080.363591] echo 0  /proc/sys/kernel/hung_task_timeout_secs disables 
  this message.
  [ 1080.363594] kworker/u16:0   D  0 6  2 
  0x
  [ 1080.363604] Workqueue: netns cleanup_net
  [ 1080.363607]  880111c4d1e0 0002 880111c4d1e0 
  880111c53fd8
  [ 1080.363610]  880111c4d1e0 a000 880111c4d1e0 
  0296
  [ 1080.363614]   7fff 7fff 
  8182e010
  [ 1080.363617] Call Trace:
  [ 1080.363625]  [8153dcfd] ? schedule_timeout+0x18d/0x200
  [ 1080.363633]  [810915a0] ? get_state_synchronize_rcu+0x10/0x10
  [ 1080.363637]  [8153c544] ? wait_for_completion+0xb4/0x110
  [ 1080.363641]  [8153e5d5] ? _raw_spin_unlock_irqrestore+0x5/0x10
  [ 1080.363645]  [8153c539] ? wait_for_completion+0xa9/0x110
  [ 1080.363649]  [81069450] ? wake_up_state+0x10/0x10
  [ 1080.363653]  [8109285a] ? _rcu_barrier+0x14a/0x1d0
  [ 1080.363658]  [8145f13c] ? netdev_run_todo+0x5c/0x310
  [ 1080.363663]  [81458775] ? rollback_registered_many+0x265/0x2d0
  [ 1080.363667]  [81459db6] ? default_device_exit_batch+0x136/0x150
  [ 1080.363672]  [8107a240] ? __wake_up_sync+0x10/0x10
  [ 1080.363676]  [81454310] ? cleanup_net+0xf0/0x1d0
  [ 1080.363681]  [8105b566] ? process_one_work+0x136/0x380
  [ 1080.363684]  [81058ca7] ? pwq_activate_delayed_work+0x27/0x40
  [ 1080.363688]  [8105ba83] ? worker_thread+0x63/0x480
  [ 1080.363691]  [8105ba20] ? rescuer_thread+0x270/0x270
  [ 1080.363695]  [8105fc9e] ? kthread+0xce/0xf0
  [ 1080.363698]  [8106] ? queue_kthread_work+0x80/0x80
  [ 1080.363702]  [8105fbd0] ? kthread_create_on_node+0x180/0x180
  [ 1080.363706]  [8153ed2c] ? ret_from_fork+0x7c/0xb0
  [ 1080.363709]  [8105fbd0] ? kthread_create_on_node+0x180/0x180
 
 The get_state_synchronize_rcu() from within the wait for _rcu_barrier()
 looks a bit odd, but perhaps this is stack-tracing inaccuracy.
 
 Could you please send your .config?


#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 3.18.0-rc1 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT=elf64-x86-64
CONFIG_ARCH_DEFCONFIG=arch/x86/configs/x86_64_defconfig
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y

Re: hung tasks in 3.18.0-rc1-00221-gc3351df

2014-10-24 Thread Paul E. McKenney
On Fri, Oct 24, 2014 at 10:58:17AM +0300, Meelis Roos wrote:
 Config is below. 
 
   [  960.346611] INFO: task kworker/u16:0:6 blocked for more than 120 
   seconds.
   [  960.346616]   Tainted: GW  3.18.0-rc1-00221-gc3351df 
   #150
   [  960.346618] echo 0  /proc/sys/kernel/hung_task_timeout_secs 
   disables this message.
   [  960.346621] kworker/u16:0   D  0 6  2 
   0x
   [  960.346633] Workqueue: netns cleanup_net
   [  960.346635]  880111c4d1e0 0002 880111c4d1e0 
   880111c53fd8
   [  960.346639]  880111c4d1e0 a000 880111c4d1e0 
   0296
   [  960.346642]   7fff 7fff 
   8182e010
   [  960.346646] Call Trace:
   [  960.346654]  [8153dcfd] ? schedule_timeout+0x18d/0x200
   [  960.346660]  [810915a0] ? get_state_synchronize_rcu+0x10/0x10
   [  960.346664]  [8153c544] ? wait_for_completion+0xb4/0x110
   [  960.346669]  [8153e5d5] ? 
   _raw_spin_unlock_irqrestore+0x5/0x10
   [  960.346672]  [8153c539] ? wait_for_completion+0xa9/0x110
   [  960.346677]  [81069450] ? wake_up_state+0x10/0x10
   [  960.346681]  [8109285a] ? _rcu_barrier+0x14a/0x1d0
   [  960.346686]  [8145f13c] ? netdev_run_todo+0x5c/0x310
   [  960.346691]  [81458775] ? 
   rollback_registered_many+0x265/0x2d0
   [  960.346696]  [81459db6] ? 
   default_device_exit_batch+0x136/0x150
   [  960.346701]  [8107a240] ? __wake_up_sync+0x10/0x10
   [  960.346704]  [81454310] ? cleanup_net+0xf0/0x1d0
   [  960.346709]  [8105b566] ? process_one_work+0x136/0x380
   [  960.346713]  [81058ca7] ? pwq_activate_delayed_work+0x27/0x40
   [  960.346716]  [8105ba83] ? worker_thread+0x63/0x480
   [  960.346720]  [8105ba20] ? rescuer_thread+0x270/0x270
   [  960.346723]  [8105fc9e] ? kthread+0xce/0xf0
   [  960.346726]  [8106] ? queue_kthread_work+0x80/0x80
   [  960.346729]  [8105fbd0] ? kthread_create_on_node+0x180/0x180
   [  960.346734]  [8153ed2c] ? ret_from_fork+0x7c/0xb0
   [  960.346737]  [8105fbd0] ? kthread_create_on_node+0x180/0x180
   [ 1080.363583] INFO: task kworker/u16:0:6 blocked for more than 120 
   seconds.
   [ 1080.363589]   Tainted: GW  3.18.0-rc1-00221-gc3351df 
   #150
   [ 1080.363591] echo 0  /proc/sys/kernel/hung_task_timeout_secs 
   disables this message.
   [ 1080.363594] kworker/u16:0   D  0 6  2 
   0x
   [ 1080.363604] Workqueue: netns cleanup_net
   [ 1080.363607]  880111c4d1e0 0002 880111c4d1e0 
   880111c53fd8
   [ 1080.363610]  880111c4d1e0 a000 880111c4d1e0 
   0296
   [ 1080.363614]   7fff 7fff 
   8182e010
   [ 1080.363617] Call Trace:
   [ 1080.363625]  [8153dcfd] ? schedule_timeout+0x18d/0x200
   [ 1080.363633]  [810915a0] ? get_state_synchronize_rcu+0x10/0x10
   [ 1080.363637]  [8153c544] ? wait_for_completion+0xb4/0x110
   [ 1080.363641]  [8153e5d5] ? 
   _raw_spin_unlock_irqrestore+0x5/0x10
   [ 1080.363645]  [8153c539] ? wait_for_completion+0xa9/0x110
   [ 1080.363649]  [81069450] ? wake_up_state+0x10/0x10
   [ 1080.363653]  [8109285a] ? _rcu_barrier+0x14a/0x1d0
   [ 1080.363658]  [8145f13c] ? netdev_run_todo+0x5c/0x310
   [ 1080.363663]  [81458775] ? 
   rollback_registered_many+0x265/0x2d0
   [ 1080.363667]  [81459db6] ? 
   default_device_exit_batch+0x136/0x150
   [ 1080.363672]  [8107a240] ? __wake_up_sync+0x10/0x10
   [ 1080.363676]  [81454310] ? cleanup_net+0xf0/0x1d0
   [ 1080.363681]  [8105b566] ? process_one_work+0x136/0x380
   [ 1080.363684]  [81058ca7] ? pwq_activate_delayed_work+0x27/0x40
   [ 1080.363688]  [8105ba83] ? worker_thread+0x63/0x480
   [ 1080.363691]  [8105ba20] ? rescuer_thread+0x270/0x270
   [ 1080.363695]  [8105fc9e] ? kthread+0xce/0xf0
   [ 1080.363698]  [8106] ? queue_kthread_work+0x80/0x80
   [ 1080.363702]  [8105fbd0] ? kthread_create_on_node+0x180/0x180
   [ 1080.363706]  [8153ed2c] ? ret_from_fork+0x7c/0xb0
   [ 1080.363709]  [8105fbd0] ? kthread_create_on_node+0x180/0x180
  
  The get_state_synchronize_rcu() from within the wait for _rcu_barrier()
  looks a bit odd, but perhaps this is stack-tracing inaccuracy.
  
  Could you please send your .config?

This does look like something that a couple of other people are seeing.
Could you please check to see if reverting this commit makes the problem
go away?

35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs

This cannot be the full fix because this commit is itself a bug fix, but
it would be good to check whether you are really seeing what they are
seeing.  ;-)

 

Re: hung tasks in 3.18.0-rc1-00221-gc3351df

2014-10-23 Thread Paul E. McKenney
On Thu, Oct 23, 2014 at 04:21:17PM +0300, Meelis Roos wrote:
> This is first real test on a computer where 3.17 did hang. Fist the hung 
> task info, then full dmesg.
> 
> [  960.346611] INFO: task kworker/u16:0:6 blocked for more than 120 seconds.
> [  960.346616]   Tainted: GW  3.18.0-rc1-00221-gc3351df #150
> [  960.346618] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> this message.
> [  960.346621] kworker/u16:0   D  0 6  2 
> 0x
> [  960.346633] Workqueue: netns cleanup_net
> [  960.346635]  880111c4d1e0 0002 880111c4d1e0 
> 880111c53fd8
> [  960.346639]  880111c4d1e0 a000 880111c4d1e0 
> 0296
> [  960.346642]   7fff 7fff 
> 8182e010
> [  960.346646] Call Trace:
> [  960.346654]  [] ? schedule_timeout+0x18d/0x200
> [  960.346660]  [] ? get_state_synchronize_rcu+0x10/0x10
> [  960.346664]  [] ? wait_for_completion+0xb4/0x110
> [  960.346669]  [] ? _raw_spin_unlock_irqrestore+0x5/0x10
> [  960.346672]  [] ? wait_for_completion+0xa9/0x110
> [  960.346677]  [] ? wake_up_state+0x10/0x10
> [  960.346681]  [] ? _rcu_barrier+0x14a/0x1d0
> [  960.346686]  [] ? netdev_run_todo+0x5c/0x310
> [  960.346691]  [] ? rollback_registered_many+0x265/0x2d0
> [  960.346696]  [] ? default_device_exit_batch+0x136/0x150
> [  960.346701]  [] ? __wake_up_sync+0x10/0x10
> [  960.346704]  [] ? cleanup_net+0xf0/0x1d0
> [  960.346709]  [] ? process_one_work+0x136/0x380
> [  960.346713]  [] ? pwq_activate_delayed_work+0x27/0x40
> [  960.346716]  [] ? worker_thread+0x63/0x480
> [  960.346720]  [] ? rescuer_thread+0x270/0x270
> [  960.346723]  [] ? kthread+0xce/0xf0
> [  960.346726]  [] ? queue_kthread_work+0x80/0x80
> [  960.346729]  [] ? kthread_create_on_node+0x180/0x180
> [  960.346734]  [] ? ret_from_fork+0x7c/0xb0
> [  960.346737]  [] ? kthread_create_on_node+0x180/0x180
> [ 1080.363583] INFO: task kworker/u16:0:6 blocked for more than 120 seconds.
> [ 1080.363589]   Tainted: GW  3.18.0-rc1-00221-gc3351df #150
> [ 1080.363591] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> this message.
> [ 1080.363594] kworker/u16:0   D  0 6  2 
> 0x
> [ 1080.363604] Workqueue: netns cleanup_net
> [ 1080.363607]  880111c4d1e0 0002 880111c4d1e0 
> 880111c53fd8
> [ 1080.363610]  880111c4d1e0 a000 880111c4d1e0 
> 0296
> [ 1080.363614]   7fff 7fff 
> 8182e010
> [ 1080.363617] Call Trace:
> [ 1080.363625]  [] ? schedule_timeout+0x18d/0x200
> [ 1080.363633]  [] ? get_state_synchronize_rcu+0x10/0x10
> [ 1080.363637]  [] ? wait_for_completion+0xb4/0x110
> [ 1080.363641]  [] ? _raw_spin_unlock_irqrestore+0x5/0x10
> [ 1080.363645]  [] ? wait_for_completion+0xa9/0x110
> [ 1080.363649]  [] ? wake_up_state+0x10/0x10
> [ 1080.363653]  [] ? _rcu_barrier+0x14a/0x1d0
> [ 1080.363658]  [] ? netdev_run_todo+0x5c/0x310
> [ 1080.363663]  [] ? rollback_registered_many+0x265/0x2d0
> [ 1080.363667]  [] ? default_device_exit_batch+0x136/0x150
> [ 1080.363672]  [] ? __wake_up_sync+0x10/0x10
> [ 1080.363676]  [] ? cleanup_net+0xf0/0x1d0
> [ 1080.363681]  [] ? process_one_work+0x136/0x380
> [ 1080.363684]  [] ? pwq_activate_delayed_work+0x27/0x40
> [ 1080.363688]  [] ? worker_thread+0x63/0x480
> [ 1080.363691]  [] ? rescuer_thread+0x270/0x270
> [ 1080.363695]  [] ? kthread+0xce/0xf0
> [ 1080.363698]  [] ? queue_kthread_work+0x80/0x80
> [ 1080.363702]  [] ? kthread_create_on_node+0x180/0x180
> [ 1080.363706]  [] ? ret_from_fork+0x7c/0xb0
> [ 1080.363709]  [] ? kthread_create_on_node+0x180/0x180

The get_state_synchronize_rcu() from within the wait for _rcu_barrier()
looks a bit odd, but perhaps this is stack-tracing inaccuracy.

Could you please send your .config?

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


hung tasks in 3.18.0-rc1-00221-gc3351df

2014-10-23 Thread Meelis Roos
This is first real test on a computer where 3.17 did hang. Fist the hung 
task info, then full dmesg.

[  960.346611] INFO: task kworker/u16:0:6 blocked for more than 120 seconds.
[  960.346616]   Tainted: GW  3.18.0-rc1-00221-gc3351df #150
[  960.346618] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  960.346621] kworker/u16:0   D  0 6  2 0x
[  960.346633] Workqueue: netns cleanup_net
[  960.346635]  880111c4d1e0 0002 880111c4d1e0 
880111c53fd8
[  960.346639]  880111c4d1e0 a000 880111c4d1e0 
0296
[  960.346642]   7fff 7fff 
8182e010
[  960.346646] Call Trace:
[  960.346654]  [] ? schedule_timeout+0x18d/0x200
[  960.346660]  [] ? get_state_synchronize_rcu+0x10/0x10
[  960.346664]  [] ? wait_for_completion+0xb4/0x110
[  960.346669]  [] ? _raw_spin_unlock_irqrestore+0x5/0x10
[  960.346672]  [] ? wait_for_completion+0xa9/0x110
[  960.346677]  [] ? wake_up_state+0x10/0x10
[  960.346681]  [] ? _rcu_barrier+0x14a/0x1d0
[  960.346686]  [] ? netdev_run_todo+0x5c/0x310
[  960.346691]  [] ? rollback_registered_many+0x265/0x2d0
[  960.346696]  [] ? default_device_exit_batch+0x136/0x150
[  960.346701]  [] ? __wake_up_sync+0x10/0x10
[  960.346704]  [] ? cleanup_net+0xf0/0x1d0
[  960.346709]  [] ? process_one_work+0x136/0x380
[  960.346713]  [] ? pwq_activate_delayed_work+0x27/0x40
[  960.346716]  [] ? worker_thread+0x63/0x480
[  960.346720]  [] ? rescuer_thread+0x270/0x270
[  960.346723]  [] ? kthread+0xce/0xf0
[  960.346726]  [] ? queue_kthread_work+0x80/0x80
[  960.346729]  [] ? kthread_create_on_node+0x180/0x180
[  960.346734]  [] ? ret_from_fork+0x7c/0xb0
[  960.346737]  [] ? kthread_create_on_node+0x180/0x180
[ 1080.363583] INFO: task kworker/u16:0:6 blocked for more than 120 seconds.
[ 1080.363589]   Tainted: GW  3.18.0-rc1-00221-gc3351df #150
[ 1080.363591] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 1080.363594] kworker/u16:0   D  0 6  2 0x
[ 1080.363604] Workqueue: netns cleanup_net
[ 1080.363607]  880111c4d1e0 0002 880111c4d1e0 
880111c53fd8
[ 1080.363610]  880111c4d1e0 a000 880111c4d1e0 
0296
[ 1080.363614]   7fff 7fff 
8182e010
[ 1080.363617] Call Trace:
[ 1080.363625]  [] ? schedule_timeout+0x18d/0x200
[ 1080.363633]  [] ? get_state_synchronize_rcu+0x10/0x10
[ 1080.363637]  [] ? wait_for_completion+0xb4/0x110
[ 1080.363641]  [] ? _raw_spin_unlock_irqrestore+0x5/0x10
[ 1080.363645]  [] ? wait_for_completion+0xa9/0x110
[ 1080.363649]  [] ? wake_up_state+0x10/0x10
[ 1080.363653]  [] ? _rcu_barrier+0x14a/0x1d0
[ 1080.363658]  [] ? netdev_run_todo+0x5c/0x310
[ 1080.363663]  [] ? rollback_registered_many+0x265/0x2d0
[ 1080.363667]  [] ? default_device_exit_batch+0x136/0x150
[ 1080.363672]  [] ? __wake_up_sync+0x10/0x10
[ 1080.363676]  [] ? cleanup_net+0xf0/0x1d0
[ 1080.363681]  [] ? process_one_work+0x136/0x380
[ 1080.363684]  [] ? pwq_activate_delayed_work+0x27/0x40
[ 1080.363688]  [] ? worker_thread+0x63/0x480
[ 1080.363691]  [] ? rescuer_thread+0x270/0x270
[ 1080.363695]  [] ? kthread+0xce/0xf0
[ 1080.363698]  [] ? queue_kthread_work+0x80/0x80
[ 1080.363702]  [] ? kthread_create_on_node+0x180/0x180
[ 1080.363706]  [] ? ret_from_fork+0x7c/0xb0
[ 1080.363709]  [] ? kthread_create_on_node+0x180/0x180




[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Initializing cgroup subsys cpuacct
[0.00] Linux version 3.18.0-rc1-00221-gc3351df (mroos@prometheus) (gcc 
version 4.9.1 (Debian 4.9.1-18) ) #150 SMP Thu Oct 23 14:21:19 EEST 2014
[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-3.18.0-rc1-00221-gc3351df 
root=/dev/sda1 ro drm.debug=0xe
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009f7ff] usable
[0.00] BIOS-e820: [mem 0x0009f800-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e8000-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xdb7ac43f] usable
[0.00] BIOS-e820: [mem 0xdb7ac440-0xdb7ae49f] ACPI NVS
[0.00] BIOS-e820: [mem 0xdb7ae4a0-0xdfff] reserved
[0.00] BIOS-e820: [mem 0xf400-0xf7ff] reserved
[0.00] BIOS-e820: [mem 0xfe00-0xfed3] reserved
[0.00] BIOS-e820: [mem 0xfed45000-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x000117ff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.6 present.
[0.00] DMI: Hewlett-Packard HP Compaq 8100 Elite SFF PC/304Ah, BIOS 
786H1 v01.13 07/14/2011
[0.00] e820: update 

hung tasks in 3.18.0-rc1-00221-gc3351df

2014-10-23 Thread Meelis Roos
This is first real test on a computer where 3.17 did hang. Fist the hung 
task info, then full dmesg.

[  960.346611] INFO: task kworker/u16:0:6 blocked for more than 120 seconds.
[  960.346616]   Tainted: GW  3.18.0-rc1-00221-gc3351df #150
[  960.346618] echo 0  /proc/sys/kernel/hung_task_timeout_secs disables this 
message.
[  960.346621] kworker/u16:0   D  0 6  2 0x
[  960.346633] Workqueue: netns cleanup_net
[  960.346635]  880111c4d1e0 0002 880111c4d1e0 
880111c53fd8
[  960.346639]  880111c4d1e0 a000 880111c4d1e0 
0296
[  960.346642]   7fff 7fff 
8182e010
[  960.346646] Call Trace:
[  960.346654]  [8153dcfd] ? schedule_timeout+0x18d/0x200
[  960.346660]  [810915a0] ? get_state_synchronize_rcu+0x10/0x10
[  960.346664]  [8153c544] ? wait_for_completion+0xb4/0x110
[  960.346669]  [8153e5d5] ? _raw_spin_unlock_irqrestore+0x5/0x10
[  960.346672]  [8153c539] ? wait_for_completion+0xa9/0x110
[  960.346677]  [81069450] ? wake_up_state+0x10/0x10
[  960.346681]  [8109285a] ? _rcu_barrier+0x14a/0x1d0
[  960.346686]  [8145f13c] ? netdev_run_todo+0x5c/0x310
[  960.346691]  [81458775] ? rollback_registered_many+0x265/0x2d0
[  960.346696]  [81459db6] ? default_device_exit_batch+0x136/0x150
[  960.346701]  [8107a240] ? __wake_up_sync+0x10/0x10
[  960.346704]  [81454310] ? cleanup_net+0xf0/0x1d0
[  960.346709]  [8105b566] ? process_one_work+0x136/0x380
[  960.346713]  [81058ca7] ? pwq_activate_delayed_work+0x27/0x40
[  960.346716]  [8105ba83] ? worker_thread+0x63/0x480
[  960.346720]  [8105ba20] ? rescuer_thread+0x270/0x270
[  960.346723]  [8105fc9e] ? kthread+0xce/0xf0
[  960.346726]  [8106] ? queue_kthread_work+0x80/0x80
[  960.346729]  [8105fbd0] ? kthread_create_on_node+0x180/0x180
[  960.346734]  [8153ed2c] ? ret_from_fork+0x7c/0xb0
[  960.346737]  [8105fbd0] ? kthread_create_on_node+0x180/0x180
[ 1080.363583] INFO: task kworker/u16:0:6 blocked for more than 120 seconds.
[ 1080.363589]   Tainted: GW  3.18.0-rc1-00221-gc3351df #150
[ 1080.363591] echo 0  /proc/sys/kernel/hung_task_timeout_secs disables this 
message.
[ 1080.363594] kworker/u16:0   D  0 6  2 0x
[ 1080.363604] Workqueue: netns cleanup_net
[ 1080.363607]  880111c4d1e0 0002 880111c4d1e0 
880111c53fd8
[ 1080.363610]  880111c4d1e0 a000 880111c4d1e0 
0296
[ 1080.363614]   7fff 7fff 
8182e010
[ 1080.363617] Call Trace:
[ 1080.363625]  [8153dcfd] ? schedule_timeout+0x18d/0x200
[ 1080.363633]  [810915a0] ? get_state_synchronize_rcu+0x10/0x10
[ 1080.363637]  [8153c544] ? wait_for_completion+0xb4/0x110
[ 1080.363641]  [8153e5d5] ? _raw_spin_unlock_irqrestore+0x5/0x10
[ 1080.363645]  [8153c539] ? wait_for_completion+0xa9/0x110
[ 1080.363649]  [81069450] ? wake_up_state+0x10/0x10
[ 1080.363653]  [8109285a] ? _rcu_barrier+0x14a/0x1d0
[ 1080.363658]  [8145f13c] ? netdev_run_todo+0x5c/0x310
[ 1080.363663]  [81458775] ? rollback_registered_many+0x265/0x2d0
[ 1080.363667]  [81459db6] ? default_device_exit_batch+0x136/0x150
[ 1080.363672]  [8107a240] ? __wake_up_sync+0x10/0x10
[ 1080.363676]  [81454310] ? cleanup_net+0xf0/0x1d0
[ 1080.363681]  [8105b566] ? process_one_work+0x136/0x380
[ 1080.363684]  [81058ca7] ? pwq_activate_delayed_work+0x27/0x40
[ 1080.363688]  [8105ba83] ? worker_thread+0x63/0x480
[ 1080.363691]  [8105ba20] ? rescuer_thread+0x270/0x270
[ 1080.363695]  [8105fc9e] ? kthread+0xce/0xf0
[ 1080.363698]  [8106] ? queue_kthread_work+0x80/0x80
[ 1080.363702]  [8105fbd0] ? kthread_create_on_node+0x180/0x180
[ 1080.363706]  [8153ed2c] ? ret_from_fork+0x7c/0xb0
[ 1080.363709]  [8105fbd0] ? kthread_create_on_node+0x180/0x180




[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Initializing cgroup subsys cpuacct
[0.00] Linux version 3.18.0-rc1-00221-gc3351df (mroos@prometheus) (gcc 
version 4.9.1 (Debian 4.9.1-18) ) #150 SMP Thu Oct 23 14:21:19 EEST 2014
[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-3.18.0-rc1-00221-gc3351df 
root=/dev/sda1 ro drm.debug=0xe
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009f7ff] usable
[0.00] BIOS-e820: [mem 0x0009f800-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e8000-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xdb7ac43f] usable
[0.00] BIOS-e820: 

Re: hung tasks in 3.18.0-rc1-00221-gc3351df

2014-10-23 Thread Paul E. McKenney
On Thu, Oct 23, 2014 at 04:21:17PM +0300, Meelis Roos wrote:
 This is first real test on a computer where 3.17 did hang. Fist the hung 
 task info, then full dmesg.
 
 [  960.346611] INFO: task kworker/u16:0:6 blocked for more than 120 seconds.
 [  960.346616]   Tainted: GW  3.18.0-rc1-00221-gc3351df #150
 [  960.346618] echo 0  /proc/sys/kernel/hung_task_timeout_secs disables 
 this message.
 [  960.346621] kworker/u16:0   D  0 6  2 
 0x
 [  960.346633] Workqueue: netns cleanup_net
 [  960.346635]  880111c4d1e0 0002 880111c4d1e0 
 880111c53fd8
 [  960.346639]  880111c4d1e0 a000 880111c4d1e0 
 0296
 [  960.346642]   7fff 7fff 
 8182e010
 [  960.346646] Call Trace:
 [  960.346654]  [8153dcfd] ? schedule_timeout+0x18d/0x200
 [  960.346660]  [810915a0] ? get_state_synchronize_rcu+0x10/0x10
 [  960.346664]  [8153c544] ? wait_for_completion+0xb4/0x110
 [  960.346669]  [8153e5d5] ? _raw_spin_unlock_irqrestore+0x5/0x10
 [  960.346672]  [8153c539] ? wait_for_completion+0xa9/0x110
 [  960.346677]  [81069450] ? wake_up_state+0x10/0x10
 [  960.346681]  [8109285a] ? _rcu_barrier+0x14a/0x1d0
 [  960.346686]  [8145f13c] ? netdev_run_todo+0x5c/0x310
 [  960.346691]  [81458775] ? rollback_registered_many+0x265/0x2d0
 [  960.346696]  [81459db6] ? default_device_exit_batch+0x136/0x150
 [  960.346701]  [8107a240] ? __wake_up_sync+0x10/0x10
 [  960.346704]  [81454310] ? cleanup_net+0xf0/0x1d0
 [  960.346709]  [8105b566] ? process_one_work+0x136/0x380
 [  960.346713]  [81058ca7] ? pwq_activate_delayed_work+0x27/0x40
 [  960.346716]  [8105ba83] ? worker_thread+0x63/0x480
 [  960.346720]  [8105ba20] ? rescuer_thread+0x270/0x270
 [  960.346723]  [8105fc9e] ? kthread+0xce/0xf0
 [  960.346726]  [8106] ? queue_kthread_work+0x80/0x80
 [  960.346729]  [8105fbd0] ? kthread_create_on_node+0x180/0x180
 [  960.346734]  [8153ed2c] ? ret_from_fork+0x7c/0xb0
 [  960.346737]  [8105fbd0] ? kthread_create_on_node+0x180/0x180
 [ 1080.363583] INFO: task kworker/u16:0:6 blocked for more than 120 seconds.
 [ 1080.363589]   Tainted: GW  3.18.0-rc1-00221-gc3351df #150
 [ 1080.363591] echo 0  /proc/sys/kernel/hung_task_timeout_secs disables 
 this message.
 [ 1080.363594] kworker/u16:0   D  0 6  2 
 0x
 [ 1080.363604] Workqueue: netns cleanup_net
 [ 1080.363607]  880111c4d1e0 0002 880111c4d1e0 
 880111c53fd8
 [ 1080.363610]  880111c4d1e0 a000 880111c4d1e0 
 0296
 [ 1080.363614]   7fff 7fff 
 8182e010
 [ 1080.363617] Call Trace:
 [ 1080.363625]  [8153dcfd] ? schedule_timeout+0x18d/0x200
 [ 1080.363633]  [810915a0] ? get_state_synchronize_rcu+0x10/0x10
 [ 1080.363637]  [8153c544] ? wait_for_completion+0xb4/0x110
 [ 1080.363641]  [8153e5d5] ? _raw_spin_unlock_irqrestore+0x5/0x10
 [ 1080.363645]  [8153c539] ? wait_for_completion+0xa9/0x110
 [ 1080.363649]  [81069450] ? wake_up_state+0x10/0x10
 [ 1080.363653]  [8109285a] ? _rcu_barrier+0x14a/0x1d0
 [ 1080.363658]  [8145f13c] ? netdev_run_todo+0x5c/0x310
 [ 1080.363663]  [81458775] ? rollback_registered_many+0x265/0x2d0
 [ 1080.363667]  [81459db6] ? default_device_exit_batch+0x136/0x150
 [ 1080.363672]  [8107a240] ? __wake_up_sync+0x10/0x10
 [ 1080.363676]  [81454310] ? cleanup_net+0xf0/0x1d0
 [ 1080.363681]  [8105b566] ? process_one_work+0x136/0x380
 [ 1080.363684]  [81058ca7] ? pwq_activate_delayed_work+0x27/0x40
 [ 1080.363688]  [8105ba83] ? worker_thread+0x63/0x480
 [ 1080.363691]  [8105ba20] ? rescuer_thread+0x270/0x270
 [ 1080.363695]  [8105fc9e] ? kthread+0xce/0xf0
 [ 1080.363698]  [8106] ? queue_kthread_work+0x80/0x80
 [ 1080.363702]  [8105fbd0] ? kthread_create_on_node+0x180/0x180
 [ 1080.363706]  [8153ed2c] ? ret_from_fork+0x7c/0xb0
 [ 1080.363709]  [8105fbd0] ? kthread_create_on_node+0x180/0x180

The get_state_synchronize_rcu() from within the wait for _rcu_barrier()
looks a bit odd, but perhaps this is stack-tracing inaccuracy.

Could you please send your .config?

Thanx, Paul

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/