> On 8/9/2011 2:20 PM, Vasu Dev wrote:
> > Current fc_eh_host_reset leaves lport offline
> > permanently due to FLOGI response getting
> > handled by LOGO response from last reset as both
> > had same exchange id.
> >
> > So fix this by having end to end exches clean-up
> > using exchange abort along exches reset
> > done from fc_eh_host_reset. This would avoid
> > exchanges collision between the sessions across
> > the reset. In this case implicit login should have
> > done that but no aborting support for FIP
> > frames, so just wait till lport->r_a_tov before
> > restarting next flogi to ensure all exchanges
> > are good to use again for next session.
> >
> > Below is the trace of LOGO from older session
> > coming ahead of FLOGI response with same exche id
> > 0x203:-
> >
> > 617 86.435165 4e.00.0b -> ff.ff.fc FC ELS LOGO 0x203
> > 618 86.435195 4e.00.0b -> b6.02.00 FC ELS LOGO 0x213
> > 619 86.435220 4e.00.0b -> 18.03.00 FC ELS LOGO 0x223
> > 620 86.435244 4e.00.0b -> 18.02.00 FC ELS LOGO 0x233
> > 621 86.435267 4e.00.0b -> 18.01.00 FC ELS LOGO 0x243
> > 622 86.435349 00.00.00 -> ff.ff.fe FC ELS FLOGI 0x203
> > 623 86.435549 ff.ff.fc -> 4e.00.0b FC ELS ACC (LOGO) 0x203
> > 624 86.438721 ff.ff.fe -> 4e.00.0b FC ELS ACC (FLOGI) 0x203
> > 625 86.442059 18.03.00 -> 4e.00.0b FC ELS ACC (LOGO) 0x223
> > 626 86.443683 b6.02.00 -> 4e.00.0b FC ELS ACC (LOGO) 0x213
> > 627 86.447693 18.01.00 -> 4e.00.0b FC ELS ACC (LOGO) 0x243
> > 628 86.453499 18.02.00 -> 4e.00.0b FC ELS ACC (LOGO) 0x233
> >
> > Signed-off-by: Vasu Dev<[email protected]>
> > Tested-by: Ross Brattain<[email protected]>
> > ---
> I'm seeing a couple side effects with this change
>
> 1. r_a_tov delay is present not only when fc_eh_host_reset is called,
> but every time the lport is reset, as fc_lport_enter_reset() has the
> delay. fc_lport_enter_reset is called from multiple contexts, and some
> of them have rtnl_lock() held. Because of this change we now have a
> delay of 10 secs whenever an FCoE interface is created.
>
> 2. When the FCoE interface is destroyed, we now send a LOGO followed
> immediately by ABTS. Although there is no functional problem, it doesn't
> look right from protocol point-of-view, as the exchange didn't really
> timeout to issue an ABTS.
>
> Can you please elaborate in what circumstances do we see this problem
> (apart from what is described)? Is the delay of 10 secs required in all
> the scenarios? I'm afraid it may lead to system slowdown as we sleep
> with multiple locks held. One such issue is observed when creating
> multiple NPIV ports. We also see the system temporarily hangs for 10
> secs during the msleep() when the FCoE interface is created.
>
> If this is meant only for fc_eh_host_reset(), can we somehow localize
> this fix only to that context?
>
> Thanks,
> Bhanu
I think the problem the patch is trying to fix is already illustraced
from the included trace. Basically it's a conflict of exch id due to the
fact that initiator side thinks the exch resources are free since the
fc_eh_host_reset() would result exch_mgr_reset, so I think any path that
ends fc_lport_reset() will likely to have this issue, fc_eh_host_reset()
or the other one I see is disabling fc_lport.
I can see the side effects you described here, particularly that it is
not nice to msleep() w/ multiple locks held. Since currently there is
no exch timeout value (not sure why?) for LOGO on rports, i.e., LOGOs
shown in this trace, and as Vasu mentioned no ABTS on FIP, sending about
along the exch reset path is a solution to guarantee the exch on both
ends are in synced up state.
Bottom line is that we have to know the exch is reusable to the other side
before reissuing the flogi, I guess we really need timeout on rport LOGOs,
not just fabric one, and we have to wait for completion of LOGO ACC/RJT
or the timeout before continuing.
yi
>
>
> >
> > drivers/scsi/libfc/fc_exch.c | 51 ++++++++++++++++++++++++++++----
> ---------
> > drivers/scsi/libfc/fc_lport.c | 11 ++++++++-
> > 2 files changed, 45 insertions(+), 17 deletions(-)
> >
> > diff --git a/drivers/scsi/libfc/fc_exch.c
> b/drivers/scsi/libfc/fc_exch.c
> > index 01ff082..744fefe 100644
> > --- a/drivers/scsi/libfc/fc_exch.c
> > +++ b/drivers/scsi/libfc/fc_exch.c
> > @@ -494,6 +494,9 @@ static int fc_seq_send(struct fc_lport *lport,
> struct fc_seq *sp,
> > */
> > error = lport->tt.frame_send(lport, fp);
> >
> > + if (fh->fh_type == FC_TYPE_BLS)
> > + return error;
> > +
> > /*
> > * Update the exchange and sequence flags,
> > * assuming all frames for the sequence have been sent.
> > @@ -575,42 +578,35 @@ static void fc_seq_set_resp(struct fc_seq *sp,
> > }
> >
> > /**
> > - * fc_seq_exch_abort() - Abort an exchange and sequence
> > - * @req_sp: The sequence to be aborted
> > + * fc_exch_abort_locked() - Abort an exchange
> > + * @ep: The exchange to be aborted
> > * @timer_msec: The period of time to wait before aborting
> > *
> > - * Generally called because of a timeout or an abort from the upper
> layer.
> > + * Locking notes: Called with exch lock held
> > + *
> > + * Return value: 0 on success else error code
> > */
> > -static int fc_seq_exch_abort(const struct fc_seq *req_sp,
> > - unsigned int timer_msec)
> > +static int fc_exch_abort_locked(struct fc_exch *ep,
> > + unsigned int timer_msec)
> > {
> > struct fc_seq *sp;
> > - struct fc_exch *ep;
> > struct fc_frame *fp;
> > int error;
> >
> > - ep = fc_seq_exch(req_sp);
> > -
> > - spin_lock_bh(&ep->ex_lock);
> > if (ep->esb_stat& (ESB_ST_COMPLETE | ESB_ST_ABNORMAL) ||
> > - ep->state& (FC_EX_DONE | FC_EX_RST_CLEANUP)) {
> > - spin_unlock_bh(&ep->ex_lock);
> > + ep->state& (FC_EX_DONE | FC_EX_RST_CLEANUP))
> > return -ENXIO;
> > - }
> >
> > /*
> > * Send the abort on a new sequence if possible.
> > */
> > sp = fc_seq_start_next_locked(&ep->seq);
> > - if (!sp) {
> > - spin_unlock_bh(&ep->ex_lock);
> > + if (!sp)
> > return -ENOMEM;
> > - }
> >
> > ep->esb_stat |= ESB_ST_SEQ_INIT | ESB_ST_ABNORMAL;
> > if (timer_msec)
> > fc_exch_timer_set_locked(ep, timer_msec);
> > - spin_unlock_bh(&ep->ex_lock);
> >
> > /*
> > * If not logged into the fabric, don't send ABTS but leave
> > @@ -633,6 +629,28 @@ static int fc_seq_exch_abort(const struct fc_seq
> *req_sp,
> > }
> >
> > /**
> > + * fc_seq_exch_abort() - Abort an exchange and sequence
> > + * @req_sp: The sequence to be aborted
> > + * @timer_msec: The period of time to wait before aborting
> > + *
> > + * Generally called because of a timeout or an abort from the upper
> layer.
> > + *
> > + * Return value: 0 on success else error code
> > + */
> > +static int fc_seq_exch_abort(const struct fc_seq *req_sp,
> > + unsigned int timer_msec)
> > +{
> > + struct fc_exch *ep;
> > + int error;
> > +
> > + ep = fc_seq_exch(req_sp);
> > + spin_lock_bh(&ep->ex_lock);
> > + error = fc_exch_abort_locked(ep, timer_msec);
> > + spin_unlock_bh(&ep->ex_lock);
> > + return error;
> > +}
> > +
> > +/**
> > * fc_exch_timeout() - Handle exchange timer expiration
> > * @work: The work_struct identifying the exchange that timed out
> > */
> > @@ -1715,6 +1733,7 @@ static void fc_exch_reset(struct fc_exch *ep)
> > int rc = 1;
> >
> > spin_lock_bh(&ep->ex_lock);
> > + fc_exch_abort_locked(ep, 0);
> > ep->state |= FC_EX_RST_CLEANUP;
> > if (cancel_delayed_work(&ep->timeout_work))
> > atomic_dec(&ep->ex_refcnt); /* drop hold for timer */
> > diff --git a/drivers/scsi/libfc/fc_lport.c
> b/drivers/scsi/libfc/fc_lport.c
> > index e55ed9c..628f347 100644
> > --- a/drivers/scsi/libfc/fc_lport.c
> > +++ b/drivers/scsi/libfc/fc_lport.c
> > @@ -88,6 +88,7 @@
> > */
> >
> > #include<linux/timer.h>
> > +#include<linux/delay.h>
> > #include<linux/slab.h>
> > #include<asm/unaligned.h>
> >
> > @@ -1029,8 +1030,16 @@ static void fc_lport_enter_reset(struct fc_lport
> *lport)
> > FCH_EVT_LIPRESET, 0);
> > fc_vports_linkchange(lport);
> > fc_lport_reset_locked(lport);
> > - if (lport->link_up)
> > + if (lport->link_up) {
> > + /*
> > + * Wait upto resource allocation time out before
> > + * doing re-login since incomplete FIP exchanged
> > + * from last session may collide with exchanges
> > + * in new session.
> > + */
> > + msleep(lport->r_a_tov);
> > fc_lport_enter_flogi(lport);
> > + }
> > }
> >
> > /**
> >
> > _______________________________________________
> > devel mailing list
> > [email protected]
> > https://lists.open-fcoe.org/mailman/listinfo/devel
> >
>
>
> _______________________________________________
> devel mailing list
> [email protected]
> https://lists.open-fcoe.org/mailman/listinfo/devel
_______________________________________________
devel mailing list
[email protected]
https://lists.open-fcoe.org/mailman/listinfo/devel