On 9/30/2011 5:05 PM, Yi Zou wrote:
With commit "7a2b73 [SCSI] libfc: fix fc_eh_host_reset", the exch lock is
already held when fc_exch_abort_locked() is called, but fc_seq_send(), which is
called along that path, grabs the same exch lock again, causing a deadlock.
Dropping the exch lock before fc_seq_send() in fc_exch_abort_locked(), and
re-taking it afterwards, should fix it.
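For illustration only, here is a minimal userspace sketch of the same pattern. It is not the libfc code: struct exch, exch_send(), exch_reset() and the two exch_abort_locked_*() helpers are hypothetical stand-ins for struct fc_exch, fc_seq_send(), fc_exch_reset() and fc_exch_abort_locked(). It also assumes a POSIX error-checking mutex in place of the kernel spinlock, so the recursive acquisition returns EDEADLK instead of hanging.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for struct fc_exch: one lock guarding one state word. */
struct exch {
	pthread_mutex_t lock;
	unsigned int state;
};

/* Use an error-checking mutex so a relock by the owner fails with EDEADLK. */
static int exch_init(struct exch *ep)
{
	pthread_mutexattr_t attr;
	int rc;

	ep->state = 0;
	rc = pthread_mutexattr_init(&attr);
	if (rc)
		return rc;
	rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!rc)
		rc = pthread_mutex_init(&ep->lock, &attr);
	pthread_mutexattr_destroy(&attr);
	return rc;
}

/* Plays the role of fc_seq_send(): it takes the lock on its own. */
static int exch_send(struct exch *ep)
{
	int rc = pthread_mutex_lock(&ep->lock);

	if (rc)
		return rc;	/* EDEADLK when the caller already holds the lock */
	ep->state |= 1u;
	pthread_mutex_unlock(&ep->lock);
	return 0;
}

/* Caller holds ep->lock. Broken variant: calls exch_send() with the lock held. */
static int exch_abort_locked_broken(struct exch *ep)
{
	return exch_send(ep);
}

/* Caller holds ep->lock. Fixed variant: drop the lock around exch_send(). */
static int exch_abort_locked_fixed(struct exch *ep)
{
	int rc;

	rc = pthread_mutex_unlock(&ep->lock);
	assert(rc == 0);	/* fails if the caller did not hold the lock */
	rc = exch_send(ep);
	pthread_mutex_lock(&ep->lock);	/* return with the lock held again */
	return rc;
}

/* Plays the role of fc_exch_reset(): takes the lock, then calls the helper. */
static int exch_reset(struct exch *ep, int use_fix)
{
	int rc;

	pthread_mutex_lock(&ep->lock);
	rc = use_fix ? exch_abort_locked_fixed(ep) : exch_abort_locked_broken(ep);
	pthread_mutex_unlock(&ep->lock);
	return rc;
}
```

After exch_init(), calling exch_reset(&ep, 0) takes the broken path and the inner lock attempt is refused, while exch_reset(&ep, 1) drops the lock around the send and completes. In the kernel there is no such error return, so the second spin_lock_bh() on the same lock would simply never succeed.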
[ Bhanu, I think the fix is good. I did some create/destroy tests with I/O, and
it seems to be fine. Can you take a look to see if I have missed anything here?
I will do more extensive testing later.
thanks - yi ]
Yi, thanks for fixing this. The fix looks good to me, and it should
avoid the deadlock. Since the ex_lock is only protecting esb_stat, it is
okay to release it and re-acquire it after fc_seq_send, although I think
it may be a bit overkill to hold this lock across fc_frame_alloc.
I'll test it too.
Thanks,
Bhanu
This was reported by Bhanu with the following kernel log:
https://lists.open-fcoe.org/pipermail/devel/2011-September/011767.html
Sep 30 12:24:36 localhost kernel: [ INFO: possible recursive locking detected ]
Sep 30 12:24:36 localhost kernel: 3.1.0-rc7+ #5
Sep 30 12:24:36 localhost kernel: ---------------------------------------------
Sep 30 12:24:36 localhost kernel: kworker/2:1/49 is trying to acquire lock:
Sep 30 12:24:36 localhost kernel: (&(&ep->ex_lock)->rlock){+.....}, at: [<ffffffffa0366910>] fc_seq_send+0x100/0x160 [libfc]
Sep 30 12:24:36 localhost kernel:
Sep 30 12:24:36 localhost kernel: but task is already holding lock:
Sep 30 12:24:36 localhost kernel: (&(&ep->ex_lock)->rlock){+.....}, at: [<ffffffffa0366bc0>] fc_exch_reset+0x30/0x100 [libfc]
Sep 30 12:24:36 localhost kernel:
Sep 30 12:24:36 localhost kernel: other info that might help us debug this:
Sep 30 12:24:36 localhost kernel: Possible unsafe locking scenario:
Sep 30 12:24:36 localhost kernel:
Sep 30 12:24:36 localhost kernel: CPU0
Sep 30 12:24:36 localhost kernel: ----
Sep 30 12:24:36 localhost kernel: lock(&(&ep->ex_lock)->rlock);
Sep 30 12:24:36 localhost kernel: lock(&(&ep->ex_lock)->rlock);
Sep 30 12:24:36 localhost kernel:
Sep 30 12:24:36 localhost kernel: *** DEADLOCK ***
Sep 30 12:24:36 localhost kernel:
Sep 30 12:24:36 localhost kernel: May be due to missing lock nesting notation
Sep 30 12:24:36 localhost kernel:
Sep 30 12:24:36 localhost kernel: 4 locks held by kworker/2:1/49:
Sep 30 12:24:36 localhost kernel: #0: (fcoe){.+.+..}, at: [<ffffffff810890ef>] process_one_work+0x13f/0x500
Sep 30 12:24:36 localhost kernel: #1: ((&port->destroy_work)){+.+...}, at: [<ffffffff810890ef>] process_one_work+0x13f/0x500
Sep 30 12:24:36 localhost kernel: #2: (fcoe_config_mutex){+.+.+.}, at: [<ffffffffa02af317>] fcoe_destroy_work+0x27/0x70 [fcoe]
Sep 30 12:24:36 localhost kernel: #3: (&(&ep->ex_lock)->rlock){+.....}, at: [<ffffffffa0366bc0>] fc_exch_reset+0x30/0x100 [libfc]
Sep 30 12:24:36 localhost kernel:
Sep 30 12:24:36 localhost kernel: stack backtrace:
Sep 30 12:24:36 localhost kernel: Pid: 49, comm: kworker/2:1 Not tainted 3.1.0-rc7+ #5
Sep 30 12:24:36 localhost kernel: Call Trace:
Sep 30 12:24:36 localhost kernel: [<ffffffff810aa105>] print_deadlock_bug+0xe5/0xf0
Sep 30 12:24:36 localhost kernel: [<ffffffff810abd07>] validate_chain+0x547/0x7d0
Sep 30 12:24:36 localhost kernel: [<ffffffff8101a439>] ? sched_clock+0x9/0x10
Sep 30 12:24:36 localhost kernel: [<ffffffff810ac294>] __lock_acquire+0x304/0x500
Sep 30 12:24:36 localhost kernel: [<ffffffff811695da>] ? kmem_cache_free+0x11a/0x2b0
Sep 30 12:24:36 localhost kernel: [<ffffffffa0366910>] ? fc_seq_send+0x100/0x160 [libfc]
Sep 30 12:24:36 localhost kernel: [<ffffffff810acb42>] lock_acquire+0xa2/0x120
Sep 30 12:24:36 localhost kernel: [<ffffffffa0366910>] ? fc_seq_send+0x100/0x160 [libfc]
Sep 30 12:24:36 localhost kernel: [<ffffffff81523dab>] _raw_spin_lock_bh+0x3b/0x70
Sep 30 12:24:36 localhost kernel: [<ffffffffa0366910>] ? fc_seq_send+0x100/0x160 [libfc]
Sep 30 12:24:36 localhost kernel: [<ffffffffa0366910>] fc_seq_send+0x100/0x160 [libfc]
Sep 30 12:24:36 localhost kernel: [<ffffffffa0366aa6>] fc_exch_abort_locked+0x136/0x1c0 [libfc]
Sep 30 12:24:36 localhost kernel: [<ffffffffa0366bca>] fc_exch_reset+0x3a/0x100 [libfc]
Sep 30 12:24:36 localhost kernel: [<ffffffffa0366d44>] fc_exch_pool_reset+0xb4/0xf0 [libfc]
Sep 30 12:24:36 localhost kernel: [<ffffffffa0366df3>] fc_exch_mgr_reset+0x73/0xb0 [libfc]
Sep 30 12:24:36 localhost kernel: [<ffffffffa036afe3>] fc_lport_destroy+0x63/0x80 [libfc]
Sep 30 12:24:36 localhost kernel: [<ffffffffa02ae5e2>] fcoe_if_destroy+0x52/0x130 [fcoe]
Sep 30 12:24:36 localhost kernel: [<ffffffffa02af331>] fcoe_destroy_work+0x41/0x70 [fcoe]
Sep 30 12:24:36 localhost kernel: [<ffffffff8108915e>] process_one_work+0x1ae/0x500
Sep 30 12:24:36 localhost kernel: [<ffffffff810890ef>] ? process_one_work+0x13f/0x500
Sep 30 12:24:36 localhost kernel: [<ffffffffa02af2f0>] ? fcoe_interface_cleanup+0x160/0x160 [fcoe]
Reported-by: Bhanu Prakash Gollapudi <[email protected]>
Signed-off-by: Yi Zou <[email protected]>
---
drivers/scsi/libfc/fc_exch.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/drivers/scsi/libfc/fc_exch.c b/drivers/scsi/libfc/fc_exch.c
index 7c055fd..87c53c8 100644
--- a/drivers/scsi/libfc/fc_exch.c
+++ b/drivers/scsi/libfc/fc_exch.c
@@ -621,7 +621,9 @@ static int fc_exch_abort_locked(struct fc_exch *ep,
 	if (fp) {
 		fc_fill_fc_hdr(fp, FC_RCTL_BA_ABTS, ep->did, ep->sid,
 			       FC_TYPE_BLS, FC_FC_END_SEQ | FC_FC_SEQ_INIT, 0);
+		spin_unlock_bh(&ep->ex_lock);
 		error = fc_seq_send(ep->lp, sp, fp);
+		spin_lock_bh(&ep->ex_lock);
 	} else
 		error = -ENOBUFS;
 	return error;
_______________________________________________
devel mailing list
[email protected]
https://lists.open-fcoe.org/mailman/listinfo/devel