On Wed, 2017-11-01 at 08:21 -0600, Jens Axboe wrote:
> Fixed that up, and applied these two patches as well.
Hello Jens,
Recently I noticed that a test system sporadically hangs during boot (Dell
PowerEdge R720 that boots from a hard disk connected to a MegaRAID SAS adapter)
and also that srp-tests systematically hangs. Reverting the two patches from
this series fixes both issues. I'm not sure there is any solution other than
reverting these two patches.
Bart.
BTW, the following appeared in the kernel log when I tried to run srp-tests
against a kernel with the two patches from this series applied:
INFO: task kworker/19:1:209 blocked for more than 480 seconds.
Tainted: G W 4.14.0-rc7-dbg+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/19:1 D 0 209 2 0x80000000
Workqueue: srp_remove srp_remove_work [ib_srp]
Call Trace:
 __schedule+0x2fa/0xbb0
 schedule+0x36/0x90
 async_synchronize_cookie_domain+0x88/0x130
 ? finish_wait+0x90/0x90
 async_synchronize_full_domain+0x18/0x20
 sd_remove+0x4d/0xc0 [sd_mod]
 device_release_driver_internal+0x160/0x210
 device_release_driver+0x12/0x20
 bus_remove_device+0x100/0x180
 device_del+0x1d8/0x340
 __scsi_remove_device+0xfc/0x130
 scsi_forget_host+0x25/0x70
 scsi_remove_host+0x79/0x120
 srp_remove_work+0x90/0x1d0 [ib_srp]
 process_one_work+0x20a/0x660
 worker_thread+0x3d/0x3b0
 kthread+0x13a/0x150
 ? process_one_work+0x660/0x660
 ? kthread_create_on_node+0x40/0x40
 ret_from_fork+0x27/0x40
Showing all locks held in the system:
1 lock held by khungtaskd/170:
 #0: (tasklist_lock){.+.+}, at: [<ffffffff810c125d>] debug_show_all_locks+0x3d/0x1a0
4 locks held by kworker/19:1/209:
 #0: ("%s"("srp_remove")){+.+.}, at: [<ffffffff81083b85>] process_one_work+0x195/0x660
 #1: ((&target->remove_work)){+.+.}, at: [<ffffffff81083b85>] process_one_work+0x195/0x660
 #2: (&shost->scan_mutex){+.+.}, at: [<ffffffff814807bf>] scsi_remove_host+0x1f/0x120
 #3: (&dev->mutex){....}, at: [<ffffffff814501a9>] device_release_driver_internal+0x39/0x210
2 locks held by kworker/u66:0/1927:
 #0: ("events_unbound"){+.+.}, at: [<ffffffff81083b85>] process_one_work+0x195/0x660
 #1: ((&entry->work)){+.+.}, at: [<ffffffff81083b85>] process_one_work+0x195/0x660
2 locks held by kworker/5:0/2047:
 #0: ("kaluad"){+.+.}, at: [<ffffffff81083b85>] process_one_work+0x195/0x660
 #1: ((&(&pg->rtpg_work)->work)){+.+.}, at: [<ffffffff81083b85>] process_one_work+0x195/0x660
=============================================