Re: hung tasks in 3.18.0-rc1-00221-gc3351df
On Tue, Oct 28, 2014 at 09:48:21PM +0200, Meelis Roos wrote: > > > > rcu: Make rcu_barrier() understand about missing rcuo kthreads > > > > > > I can not trigger the previous problem at will so my testing is not very > > > useful. I tested the previous patch that you sent but the problem did > > > not trigger, and yesterday applied this one and it still does not > > > trigger, but I an using remote sessions only for now (no desktop). > > > > > > At least it's not clearly broken :) > > > > A couple of other people had more reproducible test setups and confirmed > > that it fixed them. > > > > But yes, there are no doubt other RCU bugs waiting to be found. ;-) > > I reproduced the problem on Thinkpad T400 with Core2Duo and Chromium > startup failing, and your patch fixes that. > > Tested-by: Meelis Roos Got it, thank you!!! Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: hung tasks in 3.18.0-rc1-00221-gc3351df
> > > rcu: Make rcu_barrier() understand about missing rcuo kthreads > > > > I can not trigger the previous problem at will so my testing is not very > > useful. I tested the previous patch that you sent but the problem did > > not trigger, and yesterday applied this one and it still does not > > trigger, but I an using remote sessions only for now (no desktop). > > > > At least it's not clearly broken :) > > A couple of other people had more reproducible test setups and confirmed > that it fixed them. > > But yes, there are no doubt other RCU bugs waiting to be found. ;-) I reproduced the problem on Thinkpad T400 with Core2Duo and Chromium startup failing, and your patch fixes that. Tested-by: Meelis Roos -- Meelis Roos (mr...@linux.ee) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: hung tasks in 3.18.0-rc1-00221-gc3351df
On Tue, Oct 28, 2014 at 10:16:43AM +0200, Meelis Roos wrote: > > rcu: Make rcu_barrier() understand about missing rcuo kthreads > > I can not trigger the previous problem at will so my testing is not very > useful. I tested the previous patch that you sent but the problem did > not trigger, and yesterday applied this one and it still does not > trigger, but I an using remote sessions only for now (no desktop). > > At least it's not clearly broken :) A couple of other people had more reproducible test setups and confirmed that it fixed them. But yes, there are no doubt other RCU bugs waiting to be found. ;-) Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: hung tasks in 3.18.0-rc1-00221-gc3351df
> rcu: Make rcu_barrier() understand about missing rcuo kthreads I can not trigger the previous problem at will so my testing is not very useful. I tested the previous patch that you sent but the problem did not trigger, and yesterday applied this one and it still does not trigger, but I an using remote sessions only for now (no desktop). At least it's not clearly broken :) -- Meelis Roos (mr...@linux.ee) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: hung tasks in 3.18.0-rc1-00221-gc3351df
rcu: Make rcu_barrier() understand about missing rcuo kthreads I can not trigger the previous problem at will so my testing is not very useful. I tested the previous patch that you sent but the problem did not trigger, and yesterday applied this one and it still does not trigger, but I an using remote sessions only for now (no desktop). At least it's not clearly broken :) -- Meelis Roos (mr...@linux.ee) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: hung tasks in 3.18.0-rc1-00221-gc3351df
On Tue, Oct 28, 2014 at 10:16:43AM +0200, Meelis Roos wrote: rcu: Make rcu_barrier() understand about missing rcuo kthreads I can not trigger the previous problem at will so my testing is not very useful. I tested the previous patch that you sent but the problem did not trigger, and yesterday applied this one and it still does not trigger, but I an using remote sessions only for now (no desktop). At least it's not clearly broken :) A couple of other people had more reproducible test setups and confirmed that it fixed them. But yes, there are no doubt other RCU bugs waiting to be found. ;-) Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: hung tasks in 3.18.0-rc1-00221-gc3351df
rcu: Make rcu_barrier() understand about missing rcuo kthreads I can not trigger the previous problem at will so my testing is not very useful. I tested the previous patch that you sent but the problem did not trigger, and yesterday applied this one and it still does not trigger, but I an using remote sessions only for now (no desktop). At least it's not clearly broken :) A couple of other people had more reproducible test setups and confirmed that it fixed them. But yes, there are no doubt other RCU bugs waiting to be found. ;-) I reproduced the problem on Thinkpad T400 with Core2Duo and Chromium startup failing, and your patch fixes that. Tested-by: Meelis Roos mr...@linux.ee -- Meelis Roos (mr...@linux.ee) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: hung tasks in 3.18.0-rc1-00221-gc3351df
On Tue, Oct 28, 2014 at 09:48:21PM +0200, Meelis Roos wrote: rcu: Make rcu_barrier() understand about missing rcuo kthreads I can not trigger the previous problem at will so my testing is not very useful. I tested the previous patch that you sent but the problem did not trigger, and yesterday applied this one and it still does not trigger, but I an using remote sessions only for now (no desktop). At least it's not clearly broken :) A couple of other people had more reproducible test setups and confirmed that it fixed them. But yes, there are no doubt other RCU bugs waiting to be found. ;-) I reproduced the problem on Thinkpad T400 with Core2Duo and Chromium startup failing, and your patch fixes that. Tested-by: Meelis Roos mr...@linux.ee Got it, thank you!!! Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: hung tasks in 3.18.0-rc1-00221-gc3351df
On Fri, Oct 24, 2014 at 07:55:09AM -0700, Paul E. McKenney wrote: > On Fri, Oct 24, 2014 at 10:58:17AM +0300, Meelis Roos wrote: [ . . . ] > > > The get_state_synchronize_rcu() from within the wait for _rcu_barrier() > > > looks a bit odd, but perhaps this is stack-tracing inaccuracy. > > > > > > Could you please send your .config? > > This does look like something that a couple of other people are seeing. > Could you please check to see if reverting this commit makes the problem > go away? > > 35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs > > This cannot be the full fix because this commit is itself a bug fix, but > it would be good to check whether you are really seeing what they are > seeing. ;-) rcu: Make rcu_barrier() understand about missing rcuo kthreads Commit 35ce7f29a44a (rcu: Create rcuo kthreads only for onlined CPUs) avoids creating rcuo kthreads for CPUs that never come online. This fixes a bug in many instances of firmware: Instead of lying about their age, these systems instead lie about the number of CPUs that they have. Before commit 35ce7f29a44a, this could result in huge numbers of useless rcuo kthreads being created. It appears that experience indicates that I should have told the people suffering from this problem to fix their broken firmware, but I instead produced what turned out to be a partial fix. The missing piece supplied by this commit makes sure that rcu_barrier() knows not to post callbacks for no-CBs CPUs that have not yet come online, because otherwise rcu_barrier() will hang on systems having firmware that lies about the number of CPUs. It is tempting to simply have rcu_barrier() refuse to post a callback on any no-CBs CPU that does not have an rcuo kthread. This unfortunately does not work because rcu_barrier() is required to wait for all pending callbacks. It is therefore required to wait even for those callbacks that cannot possibly be invoked. Even if doing so hangs the system. Given that posting a callback to a no-CBs CPU that does not yet have an rcuo kthread can hang rcu_barrier(), It is tempting to report an error in this case. Unfortunately, this will result in false positives at boot time, when it is perfectly legal to post callbacks to the boot CPU before the scheduler has started, in other words, before it is legal to invoke rcu_barrier(). So this commit instead has rcu_barrier() avoid posting callbacks to CPUs having neither rcuo kthread nor pending callbacks, and has it complain bitterly if it finds CPUs having no rcuo kthread but some pending callbacks. And when rcu_barrier() does find CPUs having no rcuo kthread but pending callbacks, as noted earlier, it has no choice but to hang indefinitely. Reported-by: Yanko Kaneti Reported-by: Jay Vosburgh Reported-by: Meelis Roos Reported-by: Eric B Munson Signed-off-by: Paul E. McKenney diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h index aa8e5eea3ab4..c78e88ce5ea3 100644 --- a/include/trace/events/rcu.h +++ b/include/trace/events/rcu.h @@ -660,18 +660,18 @@ TRACE_EVENT(rcu_torture_read, /* * Tracepoint for _rcu_barrier() execution. The string "s" describes * the _rcu_barrier phase: - * "Begin": rcu_barrier_callback() started. - * "Check": rcu_barrier_callback() checking for piggybacking. - * "EarlyExit": rcu_barrier_callback() piggybacked, thus early exit. - * "Inc1": rcu_barrier_callback() piggyback check counter incremented. - * "Offline": rcu_barrier_callback() found offline CPU - * "OnlineNoCB": rcu_barrier_callback() found online no-CBs CPU. - * "OnlineQ": rcu_barrier_callback() found online CPU with callbacks. - * "OnlineNQ": rcu_barrier_callback() found online CPU, no callbacks. + * "Begin": _rcu_barrier() started. + * "Check": _rcu_barrier() checking for piggybacking. + * "EarlyExit": _rcu_barrier() piggybacked, thus early exit. + * "Inc1": _rcu_barrier() piggyback check counter incremented. + * "OfflineNoCB": _rcu_barrier() found callback on never-online CPU + * "OnlineNoCB": _rcu_barrier() found online no-CBs CPU. + * "OnlineQ": _rcu_barrier() found online CPU with callbacks. + * "OnlineNQ": _rcu_barrier() found online CPU, no callbacks. * "IRQ": An rcu_barrier_callback() callback posted on remote CPU. * "CB": An rcu_barrier_callback() invoked a callback, not the last. * "LastCB": An rcu_barrier_callback() invoked the last callback. - * "Inc2": rcu_barrier_callback() piggyback check counter incremented. + * "Inc2": _rcu_barrier() piggyback check counter incremented. * The "cpu" argument is the CPU or -1 if meaningless, the "cnt" argument * is the count of remaining callbacks, and "done" is the piggybacking count. */ diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index f6880052b917..7680fc275036 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -3312,11
Re: hung tasks in 3.18.0-rc1-00221-gc3351df
On Fri, Oct 24, 2014 at 07:55:09AM -0700, Paul E. McKenney wrote: On Fri, Oct 24, 2014 at 10:58:17AM +0300, Meelis Roos wrote: [ . . . ] The get_state_synchronize_rcu() from within the wait for _rcu_barrier() looks a bit odd, but perhaps this is stack-tracing inaccuracy. Could you please send your .config? This does look like something that a couple of other people are seeing. Could you please check to see if reverting this commit makes the problem go away? 35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs This cannot be the full fix because this commit is itself a bug fix, but it would be good to check whether you are really seeing what they are seeing. ;-) rcu: Make rcu_barrier() understand about missing rcuo kthreads Commit 35ce7f29a44a (rcu: Create rcuo kthreads only for onlined CPUs) avoids creating rcuo kthreads for CPUs that never come online. This fixes a bug in many instances of firmware: Instead of lying about their age, these systems instead lie about the number of CPUs that they have. Before commit 35ce7f29a44a, this could result in huge numbers of useless rcuo kthreads being created. It appears that experience indicates that I should have told the people suffering from this problem to fix their broken firmware, but I instead produced what turned out to be a partial fix. The missing piece supplied by this commit makes sure that rcu_barrier() knows not to post callbacks for no-CBs CPUs that have not yet come online, because otherwise rcu_barrier() will hang on systems having firmware that lies about the number of CPUs. It is tempting to simply have rcu_barrier() refuse to post a callback on any no-CBs CPU that does not have an rcuo kthread. This unfortunately does not work because rcu_barrier() is required to wait for all pending callbacks. It is therefore required to wait even for those callbacks that cannot possibly be invoked. Even if doing so hangs the system. Given that posting a callback to a no-CBs CPU that does not yet have an rcuo kthread can hang rcu_barrier(), It is tempting to report an error in this case. Unfortunately, this will result in false positives at boot time, when it is perfectly legal to post callbacks to the boot CPU before the scheduler has started, in other words, before it is legal to invoke rcu_barrier(). So this commit instead has rcu_barrier() avoid posting callbacks to CPUs having neither rcuo kthread nor pending callbacks, and has it complain bitterly if it finds CPUs having no rcuo kthread but some pending callbacks. And when rcu_barrier() does find CPUs having no rcuo kthread but pending callbacks, as noted earlier, it has no choice but to hang indefinitely. Reported-by: Yanko Kaneti yan...@declera.com Reported-by: Jay Vosburgh jay.vosbu...@canonical.com Reported-by: Meelis Roos mr...@linux.ee Reported-by: Eric B Munson emun...@akamai.com Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h index aa8e5eea3ab4..c78e88ce5ea3 100644 --- a/include/trace/events/rcu.h +++ b/include/trace/events/rcu.h @@ -660,18 +660,18 @@ TRACE_EVENT(rcu_torture_read, /* * Tracepoint for _rcu_barrier() execution. The string s describes * the _rcu_barrier phase: - * Begin: rcu_barrier_callback() started. - * Check: rcu_barrier_callback() checking for piggybacking. - * EarlyExit: rcu_barrier_callback() piggybacked, thus early exit. - * Inc1: rcu_barrier_callback() piggyback check counter incremented. - * Offline: rcu_barrier_callback() found offline CPU - * OnlineNoCB: rcu_barrier_callback() found online no-CBs CPU. - * OnlineQ: rcu_barrier_callback() found online CPU with callbacks. - * OnlineNQ: rcu_barrier_callback() found online CPU, no callbacks. + * Begin: _rcu_barrier() started. + * Check: _rcu_barrier() checking for piggybacking. + * EarlyExit: _rcu_barrier() piggybacked, thus early exit. + * Inc1: _rcu_barrier() piggyback check counter incremented. + * OfflineNoCB: _rcu_barrier() found callback on never-online CPU + * OnlineNoCB: _rcu_barrier() found online no-CBs CPU. + * OnlineQ: _rcu_barrier() found online CPU with callbacks. + * OnlineNQ: _rcu_barrier() found online CPU, no callbacks. * IRQ: An rcu_barrier_callback() callback posted on remote CPU. * CB: An rcu_barrier_callback() invoked a callback, not the last. * LastCB: An rcu_barrier_callback() invoked the last callback. - * Inc2: rcu_barrier_callback() piggyback check counter incremented. + * Inc2: _rcu_barrier() piggyback check counter incremented. * The cpu argument is the CPU or -1 if meaningless, the cnt argument * is the count of remaining callbacks, and done is the piggybacking count. */ diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index f6880052b917..7680fc275036 100644 --- a/kernel/rcu/tree.c +++
Re: hung tasks in 3.18.0-rc1-00221-gc3351df
On Fri, Oct 24, 2014 at 10:58:17AM +0300, Meelis Roos wrote: > Config is below. > > > > [ 960.346611] INFO: task kworker/u16:0:6 blocked for more than 120 > > > seconds. > > > [ 960.346616] Tainted: GW 3.18.0-rc1-00221-gc3351df > > > #150 > > > [ 960.346618] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > > > disables this message. > > > [ 960.346621] kworker/u16:0 D 0 6 2 > > > 0x > > > [ 960.346633] Workqueue: netns cleanup_net > > > [ 960.346635] 880111c4d1e0 0002 880111c4d1e0 > > > 880111c53fd8 > > > [ 960.346639] 880111c4d1e0 a000 880111c4d1e0 > > > 0296 > > > [ 960.346642] 7fff 7fff > > > 8182e010 > > > [ 960.346646] Call Trace: > > > [ 960.346654] [] ? schedule_timeout+0x18d/0x200 > > > [ 960.346660] [] ? get_state_synchronize_rcu+0x10/0x10 > > > [ 960.346664] [] ? wait_for_completion+0xb4/0x110 > > > [ 960.346669] [] ? > > > _raw_spin_unlock_irqrestore+0x5/0x10 > > > [ 960.346672] [] ? wait_for_completion+0xa9/0x110 > > > [ 960.346677] [] ? wake_up_state+0x10/0x10 > > > [ 960.346681] [] ? _rcu_barrier+0x14a/0x1d0 > > > [ 960.346686] [] ? netdev_run_todo+0x5c/0x310 > > > [ 960.346691] [] ? > > > rollback_registered_many+0x265/0x2d0 > > > [ 960.346696] [] ? > > > default_device_exit_batch+0x136/0x150 > > > [ 960.346701] [] ? __wake_up_sync+0x10/0x10 > > > [ 960.346704] [] ? cleanup_net+0xf0/0x1d0 > > > [ 960.346709] [] ? process_one_work+0x136/0x380 > > > [ 960.346713] [] ? pwq_activate_delayed_work+0x27/0x40 > > > [ 960.346716] [] ? worker_thread+0x63/0x480 > > > [ 960.346720] [] ? rescuer_thread+0x270/0x270 > > > [ 960.346723] [] ? kthread+0xce/0xf0 > > > [ 960.346726] [] ? queue_kthread_work+0x80/0x80 > > > [ 960.346729] [] ? kthread_create_on_node+0x180/0x180 > > > [ 960.346734] [] ? ret_from_fork+0x7c/0xb0 > > > [ 960.346737] [] ? kthread_create_on_node+0x180/0x180 > > > [ 1080.363583] INFO: task kworker/u16:0:6 blocked for more than 120 > > > seconds. > > > [ 1080.363589] Tainted: GW 3.18.0-rc1-00221-gc3351df > > > #150 > > > [ 1080.363591] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > > > disables this message. > > > [ 1080.363594] kworker/u16:0 D 0 6 2 > > > 0x > > > [ 1080.363604] Workqueue: netns cleanup_net > > > [ 1080.363607] 880111c4d1e0 0002 880111c4d1e0 > > > 880111c53fd8 > > > [ 1080.363610] 880111c4d1e0 a000 880111c4d1e0 > > > 0296 > > > [ 1080.363614] 7fff 7fff > > > 8182e010 > > > [ 1080.363617] Call Trace: > > > [ 1080.363625] [] ? schedule_timeout+0x18d/0x200 > > > [ 1080.363633] [] ? get_state_synchronize_rcu+0x10/0x10 > > > [ 1080.363637] [] ? wait_for_completion+0xb4/0x110 > > > [ 1080.363641] [] ? > > > _raw_spin_unlock_irqrestore+0x5/0x10 > > > [ 1080.363645] [] ? wait_for_completion+0xa9/0x110 > > > [ 1080.363649] [] ? wake_up_state+0x10/0x10 > > > [ 1080.363653] [] ? _rcu_barrier+0x14a/0x1d0 > > > [ 1080.363658] [] ? netdev_run_todo+0x5c/0x310 > > > [ 1080.363663] [] ? > > > rollback_registered_many+0x265/0x2d0 > > > [ 1080.363667] [] ? > > > default_device_exit_batch+0x136/0x150 > > > [ 1080.363672] [] ? __wake_up_sync+0x10/0x10 > > > [ 1080.363676] [] ? cleanup_net+0xf0/0x1d0 > > > [ 1080.363681] [] ? process_one_work+0x136/0x380 > > > [ 1080.363684] [] ? pwq_activate_delayed_work+0x27/0x40 > > > [ 1080.363688] [] ? worker_thread+0x63/0x480 > > > [ 1080.363691] [] ? rescuer_thread+0x270/0x270 > > > [ 1080.363695] [] ? kthread+0xce/0xf0 > > > [ 1080.363698] [] ? queue_kthread_work+0x80/0x80 > > > [ 1080.363702] [] ? kthread_create_on_node+0x180/0x180 > > > [ 1080.363706] [] ? ret_from_fork+0x7c/0xb0 > > > [ 1080.363709] [] ? kthread_create_on_node+0x180/0x180 > > > > The get_state_synchronize_rcu() from within the wait for _rcu_barrier() > > looks a bit odd, but perhaps this is stack-tracing inaccuracy. > > > > Could you please send your .config? This does look like something that a couple of other people are seeing. Could you please check to see if reverting this commit makes the problem go away? 35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs This cannot be the full fix because this commit is itself a bug fix, but it would be good to check whether you are really seeing what they are seeing. ;-) Thanx, Paul > # > # Automatically generated file; DO NOT EDIT. > # Linux/x86 3.18.0-rc1 Kernel Configuration > # > CONFIG_64BIT=y > CONFIG_X86_64=y > CONFIG_X86=y > CONFIG_INSTRUCTION_DECODER=y > CONFIG_OUTPUT_FORMAT="elf64-x86-64" > CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig" > CONFIG_LOCKDEP_SUPPORT=y > CONFIG_STACKTRACE_SUPPORT=y >
Re: hung tasks in 3.18.0-rc1-00221-gc3351df
Config is below. > > [ 960.346611] INFO: task kworker/u16:0:6 blocked for more than 120 seconds. > > [ 960.346616] Tainted: GW 3.18.0-rc1-00221-gc3351df #150 > > [ 960.346618] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > > this message. > > [ 960.346621] kworker/u16:0 D 0 6 2 > > 0x > > [ 960.346633] Workqueue: netns cleanup_net > > [ 960.346635] 880111c4d1e0 0002 880111c4d1e0 > > 880111c53fd8 > > [ 960.346639] 880111c4d1e0 a000 880111c4d1e0 > > 0296 > > [ 960.346642] 7fff 7fff > > 8182e010 > > [ 960.346646] Call Trace: > > [ 960.346654] [] ? schedule_timeout+0x18d/0x200 > > [ 960.346660] [] ? get_state_synchronize_rcu+0x10/0x10 > > [ 960.346664] [] ? wait_for_completion+0xb4/0x110 > > [ 960.346669] [] ? _raw_spin_unlock_irqrestore+0x5/0x10 > > [ 960.346672] [] ? wait_for_completion+0xa9/0x110 > > [ 960.346677] [] ? wake_up_state+0x10/0x10 > > [ 960.346681] [] ? _rcu_barrier+0x14a/0x1d0 > > [ 960.346686] [] ? netdev_run_todo+0x5c/0x310 > > [ 960.346691] [] ? rollback_registered_many+0x265/0x2d0 > > [ 960.346696] [] ? default_device_exit_batch+0x136/0x150 > > [ 960.346701] [] ? __wake_up_sync+0x10/0x10 > > [ 960.346704] [] ? cleanup_net+0xf0/0x1d0 > > [ 960.346709] [] ? process_one_work+0x136/0x380 > > [ 960.346713] [] ? pwq_activate_delayed_work+0x27/0x40 > > [ 960.346716] [] ? worker_thread+0x63/0x480 > > [ 960.346720] [] ? rescuer_thread+0x270/0x270 > > [ 960.346723] [] ? kthread+0xce/0xf0 > > [ 960.346726] [] ? queue_kthread_work+0x80/0x80 > > [ 960.346729] [] ? kthread_create_on_node+0x180/0x180 > > [ 960.346734] [] ? ret_from_fork+0x7c/0xb0 > > [ 960.346737] [] ? kthread_create_on_node+0x180/0x180 > > [ 1080.363583] INFO: task kworker/u16:0:6 blocked for more than 120 seconds. > > [ 1080.363589] Tainted: GW 3.18.0-rc1-00221-gc3351df #150 > > [ 1080.363591] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > > this message. > > [ 1080.363594] kworker/u16:0 D 0 6 2 > > 0x > > [ 1080.363604] Workqueue: netns cleanup_net > > [ 1080.363607] 880111c4d1e0 0002 880111c4d1e0 > > 880111c53fd8 > > [ 1080.363610] 880111c4d1e0 a000 880111c4d1e0 > > 0296 > > [ 1080.363614] 7fff 7fff > > 8182e010 > > [ 1080.363617] Call Trace: > > [ 1080.363625] [] ? schedule_timeout+0x18d/0x200 > > [ 1080.363633] [] ? get_state_synchronize_rcu+0x10/0x10 > > [ 1080.363637] [] ? wait_for_completion+0xb4/0x110 > > [ 1080.363641] [] ? _raw_spin_unlock_irqrestore+0x5/0x10 > > [ 1080.363645] [] ? wait_for_completion+0xa9/0x110 > > [ 1080.363649] [] ? wake_up_state+0x10/0x10 > > [ 1080.363653] [] ? _rcu_barrier+0x14a/0x1d0 > > [ 1080.363658] [] ? netdev_run_todo+0x5c/0x310 > > [ 1080.363663] [] ? rollback_registered_many+0x265/0x2d0 > > [ 1080.363667] [] ? default_device_exit_batch+0x136/0x150 > > [ 1080.363672] [] ? __wake_up_sync+0x10/0x10 > > [ 1080.363676] [] ? cleanup_net+0xf0/0x1d0 > > [ 1080.363681] [] ? process_one_work+0x136/0x380 > > [ 1080.363684] [] ? pwq_activate_delayed_work+0x27/0x40 > > [ 1080.363688] [] ? worker_thread+0x63/0x480 > > [ 1080.363691] [] ? rescuer_thread+0x270/0x270 > > [ 1080.363695] [] ? kthread+0xce/0xf0 > > [ 1080.363698] [] ? queue_kthread_work+0x80/0x80 > > [ 1080.363702] [] ? kthread_create_on_node+0x180/0x180 > > [ 1080.363706] [] ? ret_from_fork+0x7c/0xb0 > > [ 1080.363709] [] ? kthread_create_on_node+0x180/0x180 > > The get_state_synchronize_rcu() from within the wait for _rcu_barrier() > looks a bit odd, but perhaps this is stack-tracing inaccuracy. > > Could you please send your .config? # # Automatically generated file; DO NOT EDIT. # Linux/x86 3.18.0-rc1 Kernel Configuration # CONFIG_64BIT=y CONFIG_X86_64=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_OUTPUT_FORMAT="elf64-x86-64" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig" CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_HAVE_LATENCYTOP_SUPPORT=y CONFIG_MMU=y CONFIG_NEED_DMA_MAP_STATE=y CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y CONFIG_ARCH_WANT_GENERAL_HUGETLB=y CONFIG_ZONE_DMA32=y CONFIG_AUDIT_ARCH=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_HAVE_INTEL_TXT=y
Re: hung tasks in 3.18.0-rc1-00221-gc3351df
Config is below. [ 960.346611] INFO: task kworker/u16:0:6 blocked for more than 120 seconds. [ 960.346616] Tainted: GW 3.18.0-rc1-00221-gc3351df #150 [ 960.346618] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 960.346621] kworker/u16:0 D 0 6 2 0x [ 960.346633] Workqueue: netns cleanup_net [ 960.346635] 880111c4d1e0 0002 880111c4d1e0 880111c53fd8 [ 960.346639] 880111c4d1e0 a000 880111c4d1e0 0296 [ 960.346642] 7fff 7fff 8182e010 [ 960.346646] Call Trace: [ 960.346654] [8153dcfd] ? schedule_timeout+0x18d/0x200 [ 960.346660] [810915a0] ? get_state_synchronize_rcu+0x10/0x10 [ 960.346664] [8153c544] ? wait_for_completion+0xb4/0x110 [ 960.346669] [8153e5d5] ? _raw_spin_unlock_irqrestore+0x5/0x10 [ 960.346672] [8153c539] ? wait_for_completion+0xa9/0x110 [ 960.346677] [81069450] ? wake_up_state+0x10/0x10 [ 960.346681] [8109285a] ? _rcu_barrier+0x14a/0x1d0 [ 960.346686] [8145f13c] ? netdev_run_todo+0x5c/0x310 [ 960.346691] [81458775] ? rollback_registered_many+0x265/0x2d0 [ 960.346696] [81459db6] ? default_device_exit_batch+0x136/0x150 [ 960.346701] [8107a240] ? __wake_up_sync+0x10/0x10 [ 960.346704] [81454310] ? cleanup_net+0xf0/0x1d0 [ 960.346709] [8105b566] ? process_one_work+0x136/0x380 [ 960.346713] [81058ca7] ? pwq_activate_delayed_work+0x27/0x40 [ 960.346716] [8105ba83] ? worker_thread+0x63/0x480 [ 960.346720] [8105ba20] ? rescuer_thread+0x270/0x270 [ 960.346723] [8105fc9e] ? kthread+0xce/0xf0 [ 960.346726] [8106] ? queue_kthread_work+0x80/0x80 [ 960.346729] [8105fbd0] ? kthread_create_on_node+0x180/0x180 [ 960.346734] [8153ed2c] ? ret_from_fork+0x7c/0xb0 [ 960.346737] [8105fbd0] ? kthread_create_on_node+0x180/0x180 [ 1080.363583] INFO: task kworker/u16:0:6 blocked for more than 120 seconds. [ 1080.363589] Tainted: GW 3.18.0-rc1-00221-gc3351df #150 [ 1080.363591] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 1080.363594] kworker/u16:0 D 0 6 2 0x [ 1080.363604] Workqueue: netns cleanup_net [ 1080.363607] 880111c4d1e0 0002 880111c4d1e0 880111c53fd8 [ 1080.363610] 880111c4d1e0 a000 880111c4d1e0 0296 [ 1080.363614] 7fff 7fff 8182e010 [ 1080.363617] Call Trace: [ 1080.363625] [8153dcfd] ? schedule_timeout+0x18d/0x200 [ 1080.363633] [810915a0] ? get_state_synchronize_rcu+0x10/0x10 [ 1080.363637] [8153c544] ? wait_for_completion+0xb4/0x110 [ 1080.363641] [8153e5d5] ? _raw_spin_unlock_irqrestore+0x5/0x10 [ 1080.363645] [8153c539] ? wait_for_completion+0xa9/0x110 [ 1080.363649] [81069450] ? wake_up_state+0x10/0x10 [ 1080.363653] [8109285a] ? _rcu_barrier+0x14a/0x1d0 [ 1080.363658] [8145f13c] ? netdev_run_todo+0x5c/0x310 [ 1080.363663] [81458775] ? rollback_registered_many+0x265/0x2d0 [ 1080.363667] [81459db6] ? default_device_exit_batch+0x136/0x150 [ 1080.363672] [8107a240] ? __wake_up_sync+0x10/0x10 [ 1080.363676] [81454310] ? cleanup_net+0xf0/0x1d0 [ 1080.363681] [8105b566] ? process_one_work+0x136/0x380 [ 1080.363684] [81058ca7] ? pwq_activate_delayed_work+0x27/0x40 [ 1080.363688] [8105ba83] ? worker_thread+0x63/0x480 [ 1080.363691] [8105ba20] ? rescuer_thread+0x270/0x270 [ 1080.363695] [8105fc9e] ? kthread+0xce/0xf0 [ 1080.363698] [8106] ? queue_kthread_work+0x80/0x80 [ 1080.363702] [8105fbd0] ? kthread_create_on_node+0x180/0x180 [ 1080.363706] [8153ed2c] ? ret_from_fork+0x7c/0xb0 [ 1080.363709] [8105fbd0] ? kthread_create_on_node+0x180/0x180 The get_state_synchronize_rcu() from within the wait for _rcu_barrier() looks a bit odd, but perhaps this is stack-tracing inaccuracy. Could you please send your .config? # # Automatically generated file; DO NOT EDIT. # Linux/x86 3.18.0-rc1 Kernel Configuration # CONFIG_64BIT=y CONFIG_X86_64=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_OUTPUT_FORMAT=elf64-x86-64 CONFIG_ARCH_DEFCONFIG=arch/x86/configs/x86_64_defconfig CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_HAVE_LATENCYTOP_SUPPORT=y CONFIG_MMU=y CONFIG_NEED_DMA_MAP_STATE=y CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y
Re: hung tasks in 3.18.0-rc1-00221-gc3351df
On Fri, Oct 24, 2014 at 10:58:17AM +0300, Meelis Roos wrote: Config is below. [ 960.346611] INFO: task kworker/u16:0:6 blocked for more than 120 seconds. [ 960.346616] Tainted: GW 3.18.0-rc1-00221-gc3351df #150 [ 960.346618] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 960.346621] kworker/u16:0 D 0 6 2 0x [ 960.346633] Workqueue: netns cleanup_net [ 960.346635] 880111c4d1e0 0002 880111c4d1e0 880111c53fd8 [ 960.346639] 880111c4d1e0 a000 880111c4d1e0 0296 [ 960.346642] 7fff 7fff 8182e010 [ 960.346646] Call Trace: [ 960.346654] [8153dcfd] ? schedule_timeout+0x18d/0x200 [ 960.346660] [810915a0] ? get_state_synchronize_rcu+0x10/0x10 [ 960.346664] [8153c544] ? wait_for_completion+0xb4/0x110 [ 960.346669] [8153e5d5] ? _raw_spin_unlock_irqrestore+0x5/0x10 [ 960.346672] [8153c539] ? wait_for_completion+0xa9/0x110 [ 960.346677] [81069450] ? wake_up_state+0x10/0x10 [ 960.346681] [8109285a] ? _rcu_barrier+0x14a/0x1d0 [ 960.346686] [8145f13c] ? netdev_run_todo+0x5c/0x310 [ 960.346691] [81458775] ? rollback_registered_many+0x265/0x2d0 [ 960.346696] [81459db6] ? default_device_exit_batch+0x136/0x150 [ 960.346701] [8107a240] ? __wake_up_sync+0x10/0x10 [ 960.346704] [81454310] ? cleanup_net+0xf0/0x1d0 [ 960.346709] [8105b566] ? process_one_work+0x136/0x380 [ 960.346713] [81058ca7] ? pwq_activate_delayed_work+0x27/0x40 [ 960.346716] [8105ba83] ? worker_thread+0x63/0x480 [ 960.346720] [8105ba20] ? rescuer_thread+0x270/0x270 [ 960.346723] [8105fc9e] ? kthread+0xce/0xf0 [ 960.346726] [8106] ? queue_kthread_work+0x80/0x80 [ 960.346729] [8105fbd0] ? kthread_create_on_node+0x180/0x180 [ 960.346734] [8153ed2c] ? ret_from_fork+0x7c/0xb0 [ 960.346737] [8105fbd0] ? kthread_create_on_node+0x180/0x180 [ 1080.363583] INFO: task kworker/u16:0:6 blocked for more than 120 seconds. [ 1080.363589] Tainted: GW 3.18.0-rc1-00221-gc3351df #150 [ 1080.363591] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 1080.363594] kworker/u16:0 D 0 6 2 0x [ 1080.363604] Workqueue: netns cleanup_net [ 1080.363607] 880111c4d1e0 0002 880111c4d1e0 880111c53fd8 [ 1080.363610] 880111c4d1e0 a000 880111c4d1e0 0296 [ 1080.363614] 7fff 7fff 8182e010 [ 1080.363617] Call Trace: [ 1080.363625] [8153dcfd] ? schedule_timeout+0x18d/0x200 [ 1080.363633] [810915a0] ? get_state_synchronize_rcu+0x10/0x10 [ 1080.363637] [8153c544] ? wait_for_completion+0xb4/0x110 [ 1080.363641] [8153e5d5] ? _raw_spin_unlock_irqrestore+0x5/0x10 [ 1080.363645] [8153c539] ? wait_for_completion+0xa9/0x110 [ 1080.363649] [81069450] ? wake_up_state+0x10/0x10 [ 1080.363653] [8109285a] ? _rcu_barrier+0x14a/0x1d0 [ 1080.363658] [8145f13c] ? netdev_run_todo+0x5c/0x310 [ 1080.363663] [81458775] ? rollback_registered_many+0x265/0x2d0 [ 1080.363667] [81459db6] ? default_device_exit_batch+0x136/0x150 [ 1080.363672] [8107a240] ? __wake_up_sync+0x10/0x10 [ 1080.363676] [81454310] ? cleanup_net+0xf0/0x1d0 [ 1080.363681] [8105b566] ? process_one_work+0x136/0x380 [ 1080.363684] [81058ca7] ? pwq_activate_delayed_work+0x27/0x40 [ 1080.363688] [8105ba83] ? worker_thread+0x63/0x480 [ 1080.363691] [8105ba20] ? rescuer_thread+0x270/0x270 [ 1080.363695] [8105fc9e] ? kthread+0xce/0xf0 [ 1080.363698] [8106] ? queue_kthread_work+0x80/0x80 [ 1080.363702] [8105fbd0] ? kthread_create_on_node+0x180/0x180 [ 1080.363706] [8153ed2c] ? ret_from_fork+0x7c/0xb0 [ 1080.363709] [8105fbd0] ? kthread_create_on_node+0x180/0x180 The get_state_synchronize_rcu() from within the wait for _rcu_barrier() looks a bit odd, but perhaps this is stack-tracing inaccuracy. Could you please send your .config? This does look like something that a couple of other people are seeing. Could you please check to see if reverting this commit makes the problem go away? 35ce7f29a44a rcu: Create rcuo kthreads only for onlined CPUs This cannot be the full fix because this commit is itself a bug fix, but it would be good to check whether you are really seeing what they are seeing. ;-)
Re: hung tasks in 3.18.0-rc1-00221-gc3351df
On Thu, Oct 23, 2014 at 04:21:17PM +0300, Meelis Roos wrote: > This is first real test on a computer where 3.17 did hang. Fist the hung > task info, then full dmesg. > > [ 960.346611] INFO: task kworker/u16:0:6 blocked for more than 120 seconds. > [ 960.346616] Tainted: GW 3.18.0-rc1-00221-gc3351df #150 > [ 960.346618] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 960.346621] kworker/u16:0 D 0 6 2 > 0x > [ 960.346633] Workqueue: netns cleanup_net > [ 960.346635] 880111c4d1e0 0002 880111c4d1e0 > 880111c53fd8 > [ 960.346639] 880111c4d1e0 a000 880111c4d1e0 > 0296 > [ 960.346642] 7fff 7fff > 8182e010 > [ 960.346646] Call Trace: > [ 960.346654] [] ? schedule_timeout+0x18d/0x200 > [ 960.346660] [] ? get_state_synchronize_rcu+0x10/0x10 > [ 960.346664] [] ? wait_for_completion+0xb4/0x110 > [ 960.346669] [] ? _raw_spin_unlock_irqrestore+0x5/0x10 > [ 960.346672] [] ? wait_for_completion+0xa9/0x110 > [ 960.346677] [] ? wake_up_state+0x10/0x10 > [ 960.346681] [] ? _rcu_barrier+0x14a/0x1d0 > [ 960.346686] [] ? netdev_run_todo+0x5c/0x310 > [ 960.346691] [] ? rollback_registered_many+0x265/0x2d0 > [ 960.346696] [] ? default_device_exit_batch+0x136/0x150 > [ 960.346701] [] ? __wake_up_sync+0x10/0x10 > [ 960.346704] [] ? cleanup_net+0xf0/0x1d0 > [ 960.346709] [] ? process_one_work+0x136/0x380 > [ 960.346713] [] ? pwq_activate_delayed_work+0x27/0x40 > [ 960.346716] [] ? worker_thread+0x63/0x480 > [ 960.346720] [] ? rescuer_thread+0x270/0x270 > [ 960.346723] [] ? kthread+0xce/0xf0 > [ 960.346726] [] ? queue_kthread_work+0x80/0x80 > [ 960.346729] [] ? kthread_create_on_node+0x180/0x180 > [ 960.346734] [] ? ret_from_fork+0x7c/0xb0 > [ 960.346737] [] ? kthread_create_on_node+0x180/0x180 > [ 1080.363583] INFO: task kworker/u16:0:6 blocked for more than 120 seconds. > [ 1080.363589] Tainted: GW 3.18.0-rc1-00221-gc3351df #150 > [ 1080.363591] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 1080.363594] kworker/u16:0 D 0 6 2 > 0x > [ 1080.363604] Workqueue: netns cleanup_net > [ 1080.363607] 880111c4d1e0 0002 880111c4d1e0 > 880111c53fd8 > [ 1080.363610] 880111c4d1e0 a000 880111c4d1e0 > 0296 > [ 1080.363614] 7fff 7fff > 8182e010 > [ 1080.363617] Call Trace: > [ 1080.363625] [] ? schedule_timeout+0x18d/0x200 > [ 1080.363633] [] ? get_state_synchronize_rcu+0x10/0x10 > [ 1080.363637] [] ? wait_for_completion+0xb4/0x110 > [ 1080.363641] [] ? _raw_spin_unlock_irqrestore+0x5/0x10 > [ 1080.363645] [] ? wait_for_completion+0xa9/0x110 > [ 1080.363649] [] ? wake_up_state+0x10/0x10 > [ 1080.363653] [] ? _rcu_barrier+0x14a/0x1d0 > [ 1080.363658] [] ? netdev_run_todo+0x5c/0x310 > [ 1080.363663] [] ? rollback_registered_many+0x265/0x2d0 > [ 1080.363667] [] ? default_device_exit_batch+0x136/0x150 > [ 1080.363672] [] ? __wake_up_sync+0x10/0x10 > [ 1080.363676] [] ? cleanup_net+0xf0/0x1d0 > [ 1080.363681] [] ? process_one_work+0x136/0x380 > [ 1080.363684] [] ? pwq_activate_delayed_work+0x27/0x40 > [ 1080.363688] [] ? worker_thread+0x63/0x480 > [ 1080.363691] [] ? rescuer_thread+0x270/0x270 > [ 1080.363695] [] ? kthread+0xce/0xf0 > [ 1080.363698] [] ? queue_kthread_work+0x80/0x80 > [ 1080.363702] [] ? kthread_create_on_node+0x180/0x180 > [ 1080.363706] [] ? ret_from_fork+0x7c/0xb0 > [ 1080.363709] [] ? kthread_create_on_node+0x180/0x180 The get_state_synchronize_rcu() from within the wait for _rcu_barrier() looks a bit odd, but perhaps this is stack-tracing inaccuracy. Could you please send your .config? Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
hung tasks in 3.18.0-rc1-00221-gc3351df
This is first real test on a computer where 3.17 did hang. Fist the hung task info, then full dmesg. [ 960.346611] INFO: task kworker/u16:0:6 blocked for more than 120 seconds. [ 960.346616] Tainted: GW 3.18.0-rc1-00221-gc3351df #150 [ 960.346618] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 960.346621] kworker/u16:0 D 0 6 2 0x [ 960.346633] Workqueue: netns cleanup_net [ 960.346635] 880111c4d1e0 0002 880111c4d1e0 880111c53fd8 [ 960.346639] 880111c4d1e0 a000 880111c4d1e0 0296 [ 960.346642] 7fff 7fff 8182e010 [ 960.346646] Call Trace: [ 960.346654] [] ? schedule_timeout+0x18d/0x200 [ 960.346660] [] ? get_state_synchronize_rcu+0x10/0x10 [ 960.346664] [] ? wait_for_completion+0xb4/0x110 [ 960.346669] [] ? _raw_spin_unlock_irqrestore+0x5/0x10 [ 960.346672] [] ? wait_for_completion+0xa9/0x110 [ 960.346677] [] ? wake_up_state+0x10/0x10 [ 960.346681] [] ? _rcu_barrier+0x14a/0x1d0 [ 960.346686] [] ? netdev_run_todo+0x5c/0x310 [ 960.346691] [] ? rollback_registered_many+0x265/0x2d0 [ 960.346696] [] ? default_device_exit_batch+0x136/0x150 [ 960.346701] [] ? __wake_up_sync+0x10/0x10 [ 960.346704] [] ? cleanup_net+0xf0/0x1d0 [ 960.346709] [] ? process_one_work+0x136/0x380 [ 960.346713] [] ? pwq_activate_delayed_work+0x27/0x40 [ 960.346716] [] ? worker_thread+0x63/0x480 [ 960.346720] [] ? rescuer_thread+0x270/0x270 [ 960.346723] [] ? kthread+0xce/0xf0 [ 960.346726] [] ? queue_kthread_work+0x80/0x80 [ 960.346729] [] ? kthread_create_on_node+0x180/0x180 [ 960.346734] [] ? ret_from_fork+0x7c/0xb0 [ 960.346737] [] ? kthread_create_on_node+0x180/0x180 [ 1080.363583] INFO: task kworker/u16:0:6 blocked for more than 120 seconds. [ 1080.363589] Tainted: GW 3.18.0-rc1-00221-gc3351df #150 [ 1080.363591] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1080.363594] kworker/u16:0 D 0 6 2 0x [ 1080.363604] Workqueue: netns cleanup_net [ 1080.363607] 880111c4d1e0 0002 880111c4d1e0 880111c53fd8 [ 1080.363610] 880111c4d1e0 a000 880111c4d1e0 0296 [ 1080.363614] 7fff 7fff 8182e010 [ 1080.363617] Call Trace: [ 1080.363625] [] ? schedule_timeout+0x18d/0x200 [ 1080.363633] [] ? get_state_synchronize_rcu+0x10/0x10 [ 1080.363637] [] ? wait_for_completion+0xb4/0x110 [ 1080.363641] [] ? _raw_spin_unlock_irqrestore+0x5/0x10 [ 1080.363645] [] ? wait_for_completion+0xa9/0x110 [ 1080.363649] [] ? wake_up_state+0x10/0x10 [ 1080.363653] [] ? _rcu_barrier+0x14a/0x1d0 [ 1080.363658] [] ? netdev_run_todo+0x5c/0x310 [ 1080.363663] [] ? rollback_registered_many+0x265/0x2d0 [ 1080.363667] [] ? default_device_exit_batch+0x136/0x150 [ 1080.363672] [] ? __wake_up_sync+0x10/0x10 [ 1080.363676] [] ? cleanup_net+0xf0/0x1d0 [ 1080.363681] [] ? process_one_work+0x136/0x380 [ 1080.363684] [] ? pwq_activate_delayed_work+0x27/0x40 [ 1080.363688] [] ? worker_thread+0x63/0x480 [ 1080.363691] [] ? rescuer_thread+0x270/0x270 [ 1080.363695] [] ? kthread+0xce/0xf0 [ 1080.363698] [] ? queue_kthread_work+0x80/0x80 [ 1080.363702] [] ? kthread_create_on_node+0x180/0x180 [ 1080.363706] [] ? ret_from_fork+0x7c/0xb0 [ 1080.363709] [] ? kthread_create_on_node+0x180/0x180 [0.00] Initializing cgroup subsys cpuset [0.00] Initializing cgroup subsys cpu [0.00] Initializing cgroup subsys cpuacct [0.00] Linux version 3.18.0-rc1-00221-gc3351df (mroos@prometheus) (gcc version 4.9.1 (Debian 4.9.1-18) ) #150 SMP Thu Oct 23 14:21:19 EEST 2014 [0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-3.18.0-rc1-00221-gc3351df root=/dev/sda1 ro drm.debug=0xe [0.00] e820: BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x0009f7ff] usable [0.00] BIOS-e820: [mem 0x0009f800-0x0009] reserved [0.00] BIOS-e820: [mem 0x000e8000-0x000f] reserved [0.00] BIOS-e820: [mem 0x0010-0xdb7ac43f] usable [0.00] BIOS-e820: [mem 0xdb7ac440-0xdb7ae49f] ACPI NVS [0.00] BIOS-e820: [mem 0xdb7ae4a0-0xdfff] reserved [0.00] BIOS-e820: [mem 0xf400-0xf7ff] reserved [0.00] BIOS-e820: [mem 0xfe00-0xfed3] reserved [0.00] BIOS-e820: [mem 0xfed45000-0x] reserved [0.00] BIOS-e820: [mem 0x0001-0x000117ff] usable [0.00] NX (Execute Disable) protection: active [0.00] SMBIOS 2.6 present. [0.00] DMI: Hewlett-Packard HP Compaq 8100 Elite SFF PC/304Ah, BIOS 786H1 v01.13 07/14/2011 [0.00] e820: update
hung tasks in 3.18.0-rc1-00221-gc3351df
This is first real test on a computer where 3.17 did hang. Fist the hung task info, then full dmesg. [ 960.346611] INFO: task kworker/u16:0:6 blocked for more than 120 seconds. [ 960.346616] Tainted: GW 3.18.0-rc1-00221-gc3351df #150 [ 960.346618] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 960.346621] kworker/u16:0 D 0 6 2 0x [ 960.346633] Workqueue: netns cleanup_net [ 960.346635] 880111c4d1e0 0002 880111c4d1e0 880111c53fd8 [ 960.346639] 880111c4d1e0 a000 880111c4d1e0 0296 [ 960.346642] 7fff 7fff 8182e010 [ 960.346646] Call Trace: [ 960.346654] [8153dcfd] ? schedule_timeout+0x18d/0x200 [ 960.346660] [810915a0] ? get_state_synchronize_rcu+0x10/0x10 [ 960.346664] [8153c544] ? wait_for_completion+0xb4/0x110 [ 960.346669] [8153e5d5] ? _raw_spin_unlock_irqrestore+0x5/0x10 [ 960.346672] [8153c539] ? wait_for_completion+0xa9/0x110 [ 960.346677] [81069450] ? wake_up_state+0x10/0x10 [ 960.346681] [8109285a] ? _rcu_barrier+0x14a/0x1d0 [ 960.346686] [8145f13c] ? netdev_run_todo+0x5c/0x310 [ 960.346691] [81458775] ? rollback_registered_many+0x265/0x2d0 [ 960.346696] [81459db6] ? default_device_exit_batch+0x136/0x150 [ 960.346701] [8107a240] ? __wake_up_sync+0x10/0x10 [ 960.346704] [81454310] ? cleanup_net+0xf0/0x1d0 [ 960.346709] [8105b566] ? process_one_work+0x136/0x380 [ 960.346713] [81058ca7] ? pwq_activate_delayed_work+0x27/0x40 [ 960.346716] [8105ba83] ? worker_thread+0x63/0x480 [ 960.346720] [8105ba20] ? rescuer_thread+0x270/0x270 [ 960.346723] [8105fc9e] ? kthread+0xce/0xf0 [ 960.346726] [8106] ? queue_kthread_work+0x80/0x80 [ 960.346729] [8105fbd0] ? kthread_create_on_node+0x180/0x180 [ 960.346734] [8153ed2c] ? ret_from_fork+0x7c/0xb0 [ 960.346737] [8105fbd0] ? kthread_create_on_node+0x180/0x180 [ 1080.363583] INFO: task kworker/u16:0:6 blocked for more than 120 seconds. [ 1080.363589] Tainted: GW 3.18.0-rc1-00221-gc3351df #150 [ 1080.363591] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 1080.363594] kworker/u16:0 D 0 6 2 0x [ 1080.363604] Workqueue: netns cleanup_net [ 1080.363607] 880111c4d1e0 0002 880111c4d1e0 880111c53fd8 [ 1080.363610] 880111c4d1e0 a000 880111c4d1e0 0296 [ 1080.363614] 7fff 7fff 8182e010 [ 1080.363617] Call Trace: [ 1080.363625] [8153dcfd] ? schedule_timeout+0x18d/0x200 [ 1080.363633] [810915a0] ? get_state_synchronize_rcu+0x10/0x10 [ 1080.363637] [8153c544] ? wait_for_completion+0xb4/0x110 [ 1080.363641] [8153e5d5] ? _raw_spin_unlock_irqrestore+0x5/0x10 [ 1080.363645] [8153c539] ? wait_for_completion+0xa9/0x110 [ 1080.363649] [81069450] ? wake_up_state+0x10/0x10 [ 1080.363653] [8109285a] ? _rcu_barrier+0x14a/0x1d0 [ 1080.363658] [8145f13c] ? netdev_run_todo+0x5c/0x310 [ 1080.363663] [81458775] ? rollback_registered_many+0x265/0x2d0 [ 1080.363667] [81459db6] ? default_device_exit_batch+0x136/0x150 [ 1080.363672] [8107a240] ? __wake_up_sync+0x10/0x10 [ 1080.363676] [81454310] ? cleanup_net+0xf0/0x1d0 [ 1080.363681] [8105b566] ? process_one_work+0x136/0x380 [ 1080.363684] [81058ca7] ? pwq_activate_delayed_work+0x27/0x40 [ 1080.363688] [8105ba83] ? worker_thread+0x63/0x480 [ 1080.363691] [8105ba20] ? rescuer_thread+0x270/0x270 [ 1080.363695] [8105fc9e] ? kthread+0xce/0xf0 [ 1080.363698] [8106] ? queue_kthread_work+0x80/0x80 [ 1080.363702] [8105fbd0] ? kthread_create_on_node+0x180/0x180 [ 1080.363706] [8153ed2c] ? ret_from_fork+0x7c/0xb0 [ 1080.363709] [8105fbd0] ? kthread_create_on_node+0x180/0x180 [0.00] Initializing cgroup subsys cpuset [0.00] Initializing cgroup subsys cpu [0.00] Initializing cgroup subsys cpuacct [0.00] Linux version 3.18.0-rc1-00221-gc3351df (mroos@prometheus) (gcc version 4.9.1 (Debian 4.9.1-18) ) #150 SMP Thu Oct 23 14:21:19 EEST 2014 [0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-3.18.0-rc1-00221-gc3351df root=/dev/sda1 ro drm.debug=0xe [0.00] e820: BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x0009f7ff] usable [0.00] BIOS-e820: [mem 0x0009f800-0x0009] reserved [0.00] BIOS-e820: [mem 0x000e8000-0x000f] reserved [0.00] BIOS-e820: [mem 0x0010-0xdb7ac43f] usable [0.00] BIOS-e820:
Re: hung tasks in 3.18.0-rc1-00221-gc3351df
On Thu, Oct 23, 2014 at 04:21:17PM +0300, Meelis Roos wrote: This is first real test on a computer where 3.17 did hang. Fist the hung task info, then full dmesg. [ 960.346611] INFO: task kworker/u16:0:6 blocked for more than 120 seconds. [ 960.346616] Tainted: GW 3.18.0-rc1-00221-gc3351df #150 [ 960.346618] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 960.346621] kworker/u16:0 D 0 6 2 0x [ 960.346633] Workqueue: netns cleanup_net [ 960.346635] 880111c4d1e0 0002 880111c4d1e0 880111c53fd8 [ 960.346639] 880111c4d1e0 a000 880111c4d1e0 0296 [ 960.346642] 7fff 7fff 8182e010 [ 960.346646] Call Trace: [ 960.346654] [8153dcfd] ? schedule_timeout+0x18d/0x200 [ 960.346660] [810915a0] ? get_state_synchronize_rcu+0x10/0x10 [ 960.346664] [8153c544] ? wait_for_completion+0xb4/0x110 [ 960.346669] [8153e5d5] ? _raw_spin_unlock_irqrestore+0x5/0x10 [ 960.346672] [8153c539] ? wait_for_completion+0xa9/0x110 [ 960.346677] [81069450] ? wake_up_state+0x10/0x10 [ 960.346681] [8109285a] ? _rcu_barrier+0x14a/0x1d0 [ 960.346686] [8145f13c] ? netdev_run_todo+0x5c/0x310 [ 960.346691] [81458775] ? rollback_registered_many+0x265/0x2d0 [ 960.346696] [81459db6] ? default_device_exit_batch+0x136/0x150 [ 960.346701] [8107a240] ? __wake_up_sync+0x10/0x10 [ 960.346704] [81454310] ? cleanup_net+0xf0/0x1d0 [ 960.346709] [8105b566] ? process_one_work+0x136/0x380 [ 960.346713] [81058ca7] ? pwq_activate_delayed_work+0x27/0x40 [ 960.346716] [8105ba83] ? worker_thread+0x63/0x480 [ 960.346720] [8105ba20] ? rescuer_thread+0x270/0x270 [ 960.346723] [8105fc9e] ? kthread+0xce/0xf0 [ 960.346726] [8106] ? queue_kthread_work+0x80/0x80 [ 960.346729] [8105fbd0] ? kthread_create_on_node+0x180/0x180 [ 960.346734] [8153ed2c] ? ret_from_fork+0x7c/0xb0 [ 960.346737] [8105fbd0] ? kthread_create_on_node+0x180/0x180 [ 1080.363583] INFO: task kworker/u16:0:6 blocked for more than 120 seconds. [ 1080.363589] Tainted: GW 3.18.0-rc1-00221-gc3351df #150 [ 1080.363591] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 1080.363594] kworker/u16:0 D 0 6 2 0x [ 1080.363604] Workqueue: netns cleanup_net [ 1080.363607] 880111c4d1e0 0002 880111c4d1e0 880111c53fd8 [ 1080.363610] 880111c4d1e0 a000 880111c4d1e0 0296 [ 1080.363614] 7fff 7fff 8182e010 [ 1080.363617] Call Trace: [ 1080.363625] [8153dcfd] ? schedule_timeout+0x18d/0x200 [ 1080.363633] [810915a0] ? get_state_synchronize_rcu+0x10/0x10 [ 1080.363637] [8153c544] ? wait_for_completion+0xb4/0x110 [ 1080.363641] [8153e5d5] ? _raw_spin_unlock_irqrestore+0x5/0x10 [ 1080.363645] [8153c539] ? wait_for_completion+0xa9/0x110 [ 1080.363649] [81069450] ? wake_up_state+0x10/0x10 [ 1080.363653] [8109285a] ? _rcu_barrier+0x14a/0x1d0 [ 1080.363658] [8145f13c] ? netdev_run_todo+0x5c/0x310 [ 1080.363663] [81458775] ? rollback_registered_many+0x265/0x2d0 [ 1080.363667] [81459db6] ? default_device_exit_batch+0x136/0x150 [ 1080.363672] [8107a240] ? __wake_up_sync+0x10/0x10 [ 1080.363676] [81454310] ? cleanup_net+0xf0/0x1d0 [ 1080.363681] [8105b566] ? process_one_work+0x136/0x380 [ 1080.363684] [81058ca7] ? pwq_activate_delayed_work+0x27/0x40 [ 1080.363688] [8105ba83] ? worker_thread+0x63/0x480 [ 1080.363691] [8105ba20] ? rescuer_thread+0x270/0x270 [ 1080.363695] [8105fc9e] ? kthread+0xce/0xf0 [ 1080.363698] [8106] ? queue_kthread_work+0x80/0x80 [ 1080.363702] [8105fbd0] ? kthread_create_on_node+0x180/0x180 [ 1080.363706] [8153ed2c] ? ret_from_fork+0x7c/0xb0 [ 1080.363709] [8105fbd0] ? kthread_create_on_node+0x180/0x180 The get_state_synchronize_rcu() from within the wait for _rcu_barrier() looks a bit odd, but perhaps this is stack-tracing inaccuracy. Could you please send your .config? Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/