On Wed, 2009-07-15 at 14:20 -0700, Vasu Dev wrote:
> On Fri, 2009-07-10 at 16:21 -0700, Vasu Dev wrote:
> > On Fri, 2009-07-10 at 15:21 -0700, Joe Eykholt wrote:
> > > This is with the latest fcoe-next.git tree.
> > >
> > > I see this circular dependency message when running my tests. I haven't
> > > completely figured it out, but I think there must be cases where ex lock
> > > is held while em lock is acquired,
> >
> > That is the case of fc_exch_em_alloc() for new exch allocation, the exch
> > lock is held along with em lock while adding new exch to exches array.
> >
> > > and that's a problem.
> >
> > This shouldn't be the issue here since fc_exch_em_alloc() is the only
> > case when these two locks held together, so no possibility of circular
> > locking due to any messed up order of taking these two locks together
> > somewhere else.
> >
> > > It may also be caused by flushing fc_exch_timeout() work while holding the
> > > em lock it sometimes needs.
> > >
> > > I also noticed that fc_exch_timeout is calling fc_exch_rrq while holding
> > > an ex_lock and that does a new fc_exch_alloc() ... which gets the em_lock,
> > > and a different ex_lock so that maybe what it's complaining about.
> > >
> >
> > Yeah this what this log trace is suggesting and I also got same trace
> > with my exch pool patches ( pasted below, that also suggested exact
> > above call flow could lead to circular locking on exch_lock). However I
> > don't see how since fc_exch_rrq will be called with exch lock held but
> > that same exch's lock won't be held again when sending out fc_exch_rrq
> > in that call flow. The exch lock held before fc_exch_rrq and during
> > fc_exch_seq_send() in that flow are on two different exch instances but
> > perhaps checking engine is deducing as if same exch lock is held twice
> > on same exch instance. So perhaps false alarm from lock checking engine
> > or I'm missing something here, but just to avoid this nasty warning,
> > perhaps exch lock should be dropped before calling fc_exch_rrq() but I
> > need to think more about that since that might have some other issues.
> >
>
> Dropping exch lock before calling fc_exch_rrq() seems to be the best
> fix but with one caveat to handle fc_exch_reset case before RRQ retry
> attempted, I handled fc_exch_reset case by checking additional flags in
> case of grabbing exch lock again to schedule RRQ retry.
>
> The fix is:-
>
> diff --git a/drivers/scsi/libfc/fc_exch.c b/drivers/scsi/libfc/fc_exch.c
> index a074562..c261893 100644
> --- a/drivers/scsi/libfc/fc_exch.c
> +++ b/drivers/scsi/libfc/fc_exch.c
> @@ -445,9 +445,9 @@ static void fc_exch_timeout(struct work_struct *work)
> e_stat = ep->esb_stat;
> if (e_stat & ESB_ST_COMPLETE) {
> ep->esb_stat = e_stat & ~ESB_ST_REC_QUAL;
> + spin_unlock_bh(&ep->ex_lock);
> if (e_stat & ESB_ST_REC_QUAL)
> fc_exch_rrq(ep);
> - spin_unlock_bh(&ep->ex_lock);
> goto done;
> } else {
> resp = ep->resp;
> @@ -1672,14 +1672,14 @@ static void fc_exch_rrq(struct fc_exch *ep)
> struct fc_lport *lp;
> struct fc_els_rrq *rrq;
> struct fc_frame *fp;
> - struct fc_seq *rrq_sp;
> u32 did;
>
> lp = ep->lp;
>
> fp = fc_frame_alloc(lp, sizeof(*rrq));
> if (!fp)
> - return;
> + goto retry;
> +
> rrq = fc_frame_payload_get(fp, sizeof(*rrq));
> memset(rrq, 0, sizeof(*rrq));
> rrq->rrq_cmd = ELS_RRQ;
> @@ -1695,13 +1695,20 @@ static void fc_exch_rrq(struct fc_exch *ep)
> fc_host_port_id(lp->host), FC_TYPE_ELS,
> FC_FC_FIRST_SEQ | FC_FC_END_SEQ | FC_FC_SEQ_INIT, 0);
>
> - rrq_sp = fc_exch_seq_send(lp, fp, fc_exch_rrq_resp, NULL, ep,
> - lp->e_d_tov);
> - if (!rrq_sp) {
> - ep->esb_stat |= ESB_ST_REC_QUAL;
> - fc_exch_timer_set_locked(ep, ep->r_a_tov);
> + if (fc_exch_seq_send(lp, fp, fc_exch_rrq_resp, NULL, ep, lp->e_d_tov))
> + return;
> +
> +retry:
> + spin_lock_bh(&ep->ex_lock);
> + if (ep->state & (FC_EX_RST_CLEANUP | FC_EX_DONE)) {
> + spin_unlock_bh(&ep->ex_lock);
> + /* drop hold for rec qual */
> + fc_exch_release(ep);
> return;
> }
> + ep->esb_stat |= ESB_ST_REC_QUAL;
> + fc_exch_timer_set_locked(ep, ep->r_a_tov);
> + spin_unlock_bh(&ep->ex_lock);
> }
>
>
> I tested this fix by triggering ABTS/RRQ and no more circular lock
> warning but not able to trigger circular locking warning without above
> fix also. I've not changed kernel configuration since I got previously
> sent log of circular locking warning on same kernel, so same kernel
> config with all lock checking enabled. I also tried without the fix
> after reboot but still no more same warning, so I'm not sure how to
> force lock checking code to always check checking circular locking for
> exch loch in fc_exch_rrq call flow, any suggestion ?
>
> Let me know your comments on above fix. I'll provide this fix against
> fcoe-fixes tree once that tree is updated to .31 kernel which is
> currently on 2.6.30-rc4.
>
I've updated fcoe-fixes.git to be based on the current scsi-rc-fixes.git
(2.6.31-rc2).
_______________________________________________
devel mailing list
[email protected]
http://www.open-fcoe.org/mailman/listinfo/devel