Re: [Open-FCoE] [PATCH] libfc: fix deadlock bug in fc_exch_abort_locked

Bhanu Prakash Gollapudi Fri, 30 Sep 2011 19:40:59 -0700

On 9/30/2011 5:05 PM, Yi Zou wrote:

With commit "7a2b73 [SCSI] libfc: fix fc_eh_host_reset", the exch lock is already
held when fc_exch_abort_locked() is called, but fc_seq_send(), which is called along
that path, grabs the same exch lock again, causing a deadlock. Dropping the exch
lock before fc_seq_send() in fc_exch_abort_locked() should fix it.
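
A rough userspace analogy of the problem, offered only as a sketch: kernel spinlocks cannot be exercised outside the kernel, so this uses a POSIX error-checking mutex, which reports a same-thread re-lock as EDEADLK instead of hanging. The names struct exch, seq_send(), abort_locked_buggy(), abort_locked_fixed() and run_abort() are hypothetical stand-ins, not libfc code.

```c
#define _XOPEN_SOURCE 700       /* for pthread_mutexattr_settype() */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for struct fc_exch; only the lock matters here. */
struct exch {
    pthread_mutex_t ex_lock;
    int esb_stat;
};

/* ERRORCHECK makes a same-thread re-lock return EDEADLK instead of
 * blocking forever, loosely like lockdep flagging the spinlock. */
static void exch_init(struct exch *ep)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&ep->ex_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    ep->esb_stat = 0;
}

/* Hypothetical stand-in for fc_seq_send(): it takes ex_lock itself. */
static int seq_send(struct exch *ep)
{
    int rc = pthread_mutex_lock(&ep->ex_lock);

    if (rc != 0)
        return rc;                       /* EDEADLK: caller holds the lock */
    ep->esb_stat++;                      /* pretend to update exchange state */
    pthread_mutex_unlock(&ep->ex_lock);
    return 0;
}

/* Both variants are entered with ep->ex_lock held, as
 * fc_exch_abort_locked() is after the commit quoted above. */
static int abort_locked_buggy(struct exch *ep)
{
    return seq_send(ep);                 /* recursive acquisition */
}

static int abort_locked_fixed(struct exch *ep)
{
    int rc;

    pthread_mutex_unlock(&ep->ex_lock);  /* drop before the callee locks */
    rc = seq_send(ep);
    pthread_mutex_lock(&ep->ex_lock);    /* return with the lock held */
    return rc;
}

/* Runs one variant the way a caller like fc_exch_reset() would:
 * take the lock, call the _locked function, release the lock. */
static int run_abort(int (*abort_fn)(struct exch *))
{
    struct exch ex;
    int rc;

    exch_init(&ex);
    pthread_mutex_lock(&ex.ex_lock);
    rc = abort_fn(&ex);
    pthread_mutex_unlock(&ex.ex_lock);
    pthread_mutex_destroy(&ex.ex_lock);
    return rc;
}
```

Here run_abort(abort_locked_buggy) returns EDEADLK and run_abort(abort_locked_fixed) returns 0. In the kernel there is no error return: the second spin_lock_bh() on ex_lock just spins, which is the situation lockdep warns about.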


[ Bhanu, I think the fix is good. I did some create/destroy tests with I/O, and
it seems to be fine. Can you take a look to see if I have missed anything here?
I will do more extensive testing later.

thanks - yi ]

Yi, thanks for fixing this. The fix looks good to me, and it should avoid the deadlock. Since ex_lock only protects esb_stat, it is okay to release it and re-acquire it after fc_seq_send, although I think it may be a bit of overkill to hold this lock across fc_frame_alloc.
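
One possible reading of the fc_frame_alloc remark, sketched in userspace C under the assumption stated above that ex_lock guards only esb_stat: build the frame with no lock held and take the lock only for the esb_stat update. This is not what the patch does (fc_exch_abort_locked() is still entered with ex_lock held), and struct exch_state, frame_alloc(), abort_narrow(), run_narrow() and the ABORT_SENT bit are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

#define ABORT_SENT 0x1u            /* hypothetical esb_stat bit */

/* Hypothetical exchange: ex_lock guards esb_stat and nothing else. */
struct exch_state {
    pthread_mutex_t ex_lock;
    unsigned int esb_stat;
};

static bool alloc_ran_under_lock;  /* set if frame_alloc() saw ex_lock held */

/* Hypothetical stand-in for fc_frame_alloc(). trylock fails with EBUSY
 * when ex_lock is already held, which is how the sketch detects it. */
static void *frame_alloc(struct exch_state *ep, size_t len)
{
    if (pthread_mutex_trylock(&ep->ex_lock) == 0)
        pthread_mutex_unlock(&ep->ex_lock);
    else
        alloc_ran_under_lock = true;
    return malloc(len);
}

/* Entered WITHOUT ex_lock held: allocate outside the lock and hold it
 * only across the esb_stat read-modify-write. */
static int abort_narrow(struct exch_state *ep)
{
    void *fp = frame_alloc(ep, 64);

    if (!fp)
        return -ENOBUFS;

    pthread_mutex_lock(&ep->ex_lock);
    ep->esb_stat |= ABORT_SENT;
    pthread_mutex_unlock(&ep->ex_lock);

    /* A send routine that takes ex_lock itself could be called here
     * safely; this sketch just frees the frame instead of sending it. */
    free(fp);
    return 0;
}

/* Runs abort_narrow() on a fresh exchange and returns its esb_stat. */
static unsigned int run_narrow(int *error_out)
{
    struct exch_state ex;

    pthread_mutex_init(&ex.ex_lock, NULL);
    ex.esb_stat = 0;
    *error_out = abort_narrow(&ex);
    pthread_mutex_destroy(&ex.ex_lock);
    return ex.esb_stat;
}
```

The trade-off is that the caller no longer gets an atomic "allocate and mark" step; that is only acceptable if, as the reply says, nothing besides esb_stat relies on ex_lock.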


I'll test it too.

Thanks,
Bhanu


This was reported by Bhanu with the following kernel log:
https://lists.open-fcoe.org/pipermail/devel/2011-September/011767.html

Sep 30 12:24:36 localhost kernel: [ INFO: possible recursive locking detected ]
Sep 30 12:24:36 localhost kernel: 3.1.0-rc7+ #5
Sep 30 12:24:36 localhost kernel: ---------------------------------------------
Sep 30 12:24:36 localhost kernel: kworker/2:1/49 is trying to acquire lock:
Sep 30 12:24:36 localhost kernel: (&(&ep->ex_lock)->rlock){+.....}, at: [<ffffffffa0366910>] fc_seq_send+0x100/0x160 [libfc]
Sep 30 12:24:36 localhost kernel:
Sep 30 12:24:36 localhost kernel: but task is already holding lock:
Sep 30 12:24:36 localhost kernel: (&(&ep->ex_lock)->rlock){+.....}, at: [<ffffffffa0366bc0>] fc_exch_reset+0x30/0x100 [libfc]
Sep 30 12:24:36 localhost kernel:
Sep 30 12:24:36 localhost kernel: other info that might help us debug this:
Sep 30 12:24:36 localhost kernel: Possible unsafe locking scenario:
Sep 30 12:24:36 localhost kernel:
Sep 30 12:24:36 localhost kernel:       CPU0
Sep 30 12:24:36 localhost kernel:       ----
Sep 30 12:24:36 localhost kernel:  lock(&(&ep->ex_lock)->rlock);
Sep 30 12:24:36 localhost kernel:  lock(&(&ep->ex_lock)->rlock);
Sep 30 12:24:36 localhost kernel:
Sep 30 12:24:36 localhost kernel: *** DEADLOCK ***
Sep 30 12:24:36 localhost kernel:
Sep 30 12:24:36 localhost kernel: May be due to missing lock nesting notation
Sep 30 12:24:36 localhost kernel:
Sep 30 12:24:36 localhost kernel: 4 locks held by kworker/2:1/49:
Sep 30 12:24:36 localhost kernel: #0:  (fcoe){.+.+..}, at: [<ffffffff810890ef>] process_one_work+0x13f/0x500
Sep 30 12:24:36 localhost kernel: #1: ((&port->destroy_work)){+.+...}, at: [<ffffffff810890ef>] process_one_work+0x13f/0x500
Sep 30 12:24:36 localhost kernel: #2:  (fcoe_config_mutex){+.+.+.}, at: [<ffffffffa02af317>] fcoe_destroy_work+0x27/0x70 [fcoe]
Sep 30 12:24:36 localhost kernel: #3: (&(&ep->ex_lock)->rlock){+.....}, at: [<ffffffffa0366bc0>] fc_exch_reset+0x30/0x100 [libfc]
Sep 30 12:24:36 localhost kernel:
Sep 30 12:24:36 localhost kernel: stack backtrace:
Sep 30 12:24:36 localhost kernel: Pid: 49, comm: kworker/2:1 Not tainted 3.1.0-rc7+ #5
Sep 30 12:24:36 localhost kernel: Call Trace:
Sep 30 12:24:36 localhost kernel: [<ffffffff810aa105>] print_deadlock_bug+0xe5/0xf0
Sep 30 12:24:36 localhost kernel: [<ffffffff810abd07>] validate_chain+0x547/0x7d0
Sep 30 12:24:36 localhost kernel: [<ffffffff8101a439>] ? sched_clock+0x9/0x10
Sep 30 12:24:36 localhost kernel: [<ffffffff810ac294>] __lock_acquire+0x304/0x500
Sep 30 12:24:36 localhost kernel: [<ffffffff811695da>] ? kmem_cache_free+0x11a/0x2b0
Sep 30 12:24:36 localhost kernel: [<ffffffffa0366910>] ? fc_seq_send+0x100/0x160 [libfc]
Sep 30 12:24:36 localhost kernel: [<ffffffff810acb42>] lock_acquire+0xa2/0x120
Sep 30 12:24:36 localhost kernel: [<ffffffffa0366910>] ? fc_seq_send+0x100/0x160 [libfc]
Sep 30 12:24:36 localhost kernel: [<ffffffff81523dab>] _raw_spin_lock_bh+0x3b/0x70
Sep 30 12:24:36 localhost kernel: [<ffffffffa0366910>] ? fc_seq_send+0x100/0x160 [libfc]
Sep 30 12:24:36 localhost kernel: [<ffffffffa0366910>] fc_seq_send+0x100/0x160 [libfc]
Sep 30 12:24:36 localhost kernel: [<ffffffffa0366aa6>] fc_exch_abort_locked+0x136/0x1c0 [libfc]
Sep 30 12:24:36 localhost kernel: [<ffffffffa0366bca>] fc_exch_reset+0x3a/0x100 [libfc]
Sep 30 12:24:36 localhost kernel: [<ffffffffa0366d44>] fc_exch_pool_reset+0xb4/0xf0 [libfc]
Sep 30 12:24:36 localhost kernel: [<ffffffffa0366df3>] fc_exch_mgr_reset+0x73/0xb0 [libfc]
Sep 30 12:24:36 localhost kernel: [<ffffffffa036afe3>] fc_lport_destroy+0x63/0x80 [libfc]
Sep 30 12:24:36 localhost kernel: [<ffffffffa02ae5e2>] fcoe_if_destroy+0x52/0x130 [fcoe]
Sep 30 12:24:36 localhost kernel: [<ffffffffa02af331>] fcoe_destroy_work+0x41/0x70 [fcoe]
Sep 30 12:24:36 localhost kernel: [<ffffffff8108915e>] process_one_work+0x1ae/0x500
Sep 30 12:24:36 localhost kernel: [<ffffffff810890ef>] ? process_one_work+0x13f/0x500
Sep 30 12:24:36 localhost kernel: [<ffffffffa02af2f0>] ? fcoe_interface_cleanup+0x160/0x160 [fcoe]

Reported-by: Bhanu Prakash Gollapudi <[email protected]>
Signed-off-by: Yi Zou <[email protected]>
---

  drivers/scsi/libfc/fc_exch.c |    2 ++
  1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/libfc/fc_exch.c b/drivers/scsi/libfc/fc_exch.c
index 7c055fd..87c53c8 100644
--- a/drivers/scsi/libfc/fc_exch.c
+++ b/drivers/scsi/libfc/fc_exch.c
@@ -621,7 +621,9 @@ static int fc_exch_abort_locked(struct fc_exch *ep,
        if (fp) {
                fc_fill_fc_hdr(fp, FC_RCTL_BA_ABTS, ep->did, ep->sid,
                               FC_TYPE_BLS, FC_FC_END_SEQ | FC_FC_SEQ_INIT, 0);
+               spin_unlock_bh(&ep->ex_lock);
                error = fc_seq_send(ep->lp, sp, fp);
+               spin_lock_bh(&ep->ex_lock);
        } else
                error = -ENOBUFS;
        return error;
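
The "_locked" contract this hunk has to preserve can be checked with a similar userspace sketch (hypothetical names, a pthread mutex standing in for the spinlock): the function is entered with ex_lock held and must return with it held on both branches, the frame-present branch that now drops and re-takes the lock, and the -ENOBUFS branch that never drops it. With an error-checking mutex, the caller's final unlock returns 0 only if this thread still owns the lock, and EPERM otherwise.

```c
#define _XOPEN_SOURCE 700       /* for pthread_mutexattr_settype() */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct fc_exch. */
struct fake_exch {
    pthread_mutex_t ex_lock;
};

/* Hypothetical stand-in for fc_seq_send(): takes ex_lock itself. */
static int fake_seq_send(struct fake_exch *ep, void *fp)
{
    int rc = pthread_mutex_lock(&ep->ex_lock);

    if (rc != 0)
        return -rc;
    (void)fp;                            /* a real send would queue the frame */
    pthread_mutex_unlock(&ep->ex_lock);
    return 0;
}

/* Mirrors the patched tail of fc_exch_abort_locked(): entered with
 * ex_lock held and, on every path, returns with it still held. */
static int fake_abort_locked(struct fake_exch *ep, void *fp)
{
    int error;

    if (fp) {
        pthread_mutex_unlock(&ep->ex_lock);
        error = fake_seq_send(ep, fp);
        pthread_mutex_lock(&ep->ex_lock);
    } else
        error = -ENOBUFS;
    return error;
}

/* Returns the caller-side unlock result: 0 means the lock was still
 * owned by this thread when fake_abort_locked() returned. */
static int check_contract(void *fp, int *error_out)
{
    struct fake_exch ex;
    pthread_mutexattr_t attr;
    int rc;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&ex.ex_lock, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&ex.ex_lock);
    *error_out = fake_abort_locked(&ex, fp);
    rc = pthread_mutex_unlock(&ex.ex_lock);  /* EPERM if not owned */
    pthread_mutex_destroy(&ex.ex_lock);
    return rc;
}
```

Keeping the unlock/lock pair tightly around the send, rather than unlocking earlier, means the rest of the function still runs under the same locking assumptions the caller made.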

_______________________________________________
devel mailing list
[email protected]
https://lists.open-fcoe.org/mailman/listinfo/devel
