On Tue, 2010-05-25 at 09:22 -0700, Joe Eykholt wrote:
> Joe Eykholt wrote:
> > This patch somewhat combines two fixes to remote port handing in libfc.
> > 
> > The first problem was that rport work could be queued on a deleted
> > and freed rport.  This is handled by not resetting rdata->event
> > ton NONE if the rdata is about to be deleted.
> > 
> > However, that fix led to the second problem, described by
> > Bhanu Gollapudi, as follows:
> >> Here is the sequence of events. T1 is first LOGO receive thread, T2 is
> >> fc_rport_work() scheduled by T1 and T3 is second LOGO receive thread and
> >> T4 is fc_rport_work scheduled by T3.
> >>
> >> 1. (T1)Received 1st LOGO in state Ready
> >> 2. (T1)Delete port & enter to RESTART state.
> >> 3. (T1)schdule event_work, since event is RPORT_EV_NONE.
> >> 4. (T1)set event = RPORT_EV_LOGO
> >> 5. (T1)Enter RESTART state as disc_id is set.
> >> 6. (T2)remember to PLOGI, and set event = RPORT_EV_NONE
> >> 6. (T3)Received 2nd LOGO
> >> 7. (T3)Delete Port & enter to RESTART state.
> >> 8. (T3)schedule event_work, since event is RPORT_EV_NONE.
> >> 9. (T3)Enter RESTART state as disc_id is set.
> >> 9. (T3)set event = RPORT_EV_LOGO
> >> 10.(T2)work restart, enter PLOGI state and issues PLOGI
> >> 11.(T4)Since state is not RESTART anymore, restart is not set, and the
> >> event is not reset to RPORT_EV_NONE. (current event is RPORT_EV_LOGO).
> >> 12. Now, PLOGI succeeds and fc_rport_enter_ready() will not schedule
> >> event_work, and hence the rport will never be created, eventually losing
> >> the target after dev_loss_tmo.
> > 
> > So, the problem here is that we were tracking the desire for
> > the rport be restarted by state RESTART, which was otherwise
> > equivalent to DELETE.  A contributing factor is that we dropped
> > the lock between steps 6 and 10 in thread T2, which allows the
> > state to change, and we didn't completely re-evaluate then.
> > 
> > This is hopefully corrected by the following minor redesign:
> > 
> > Simplify the rport restart logic by making the decision to
> > restart after deleting the transport rport.  That decision
> > is based on a new STARTED flag that indicates fc_rport_login()
> > has been called and fc_rport_logoff() has not been called
> > since then.  This replaces the need for the RESTART state.
> > 
> > Only restart if the rdata is still in DELETED state
> > and only if it still has the STARTED flag set.
> > 
> > Also now, since we clear the event code much later in the
> > work thread, allow for the possibility that the rport may
> > have become READY again via incoming PLOGI, and if so,
> > queue another event to handle that.
> > 
> > In the problem scenario, the second LOGO received will
> > cause the LOGO event to occur again.
> 
> We've discovered a problem with this patch.  If a PRLI or
> other request in the state machine fails we could restart
> indefinitely.
> 
> I'm not sure how to handle this yet.  Perhaps instead of
> going thru the work item and delete at that point we should
> just enter an error state until login is called again,
> either by discovery or a link flap.   Suggestions welcome.

Joe, correct me if I'm wrong, but shouldn't we handle it 
similar to PLOGI failure where we clear the FC_RP_STARTED
flag and then LOGO from that rport.

> 
> The other three patches in this series are unrelated and
> could go ahead.
> 
>       Joe
> 
> > Reported-by: Bhanu Gollapudi <[email protected]>
> > Signed-off-by: Joe Eykholt <[email protected]>
> > ---
> >  drivers/scsi/libfc/fc_rport.c |   75 
> > +++++++++++++++++------------------------
> >  include/scsi/libfc.h          |    5 +--
> >  2 files changed, 33 insertions(+), 47 deletions(-)
> > 
> > 
> > diff --git a/drivers/scsi/libfc/fc_rport.c b/drivers/scsi/libfc/fc_rport.c
> > index dae67ed..96b8df5 100644
> > --- a/drivers/scsi/libfc/fc_rport.c
> > +++ b/drivers/scsi/libfc/fc_rport.c
> > @@ -89,7 +89,6 @@ static const char *fc_rport_state_names[] = {
> >     [RPORT_ST_LOGO] = "LOGO",
> >     [RPORT_ST_ADISC] = "ADISC",
> >     [RPORT_ST_DELETE] = "Delete",
> > -   [RPORT_ST_RESTART] = "Restart",
> >  };
> >  
> >  /**
> > @@ -246,7 +245,6 @@ static void fc_rport_work(struct work_struct *work)
> >     struct fc_rport_operations *rport_ops;
> >     struct fc_rport_identifiers ids;
> >     struct fc_rport *rport;
> > -   int restart = 0;
> >  
> >     mutex_lock(&rdata->rp_mutex);
> >     event = rdata->event;
> > @@ -298,24 +296,6 @@ static void fc_rport_work(struct work_struct *work)
> >             port_id = rdata->ids.port_id;
> >             mutex_unlock(&rdata->rp_mutex);
> >  
> > -           if (port_id != FC_FID_DIR_SERV) {
> > -                   /*
> > -                    * We must drop rp_mutex before taking disc_mutex.
> > -                    * Re-evaluate state to allow for restart.
> > -                    * A transition to RESTART state must only happen
> > -                    * while disc_mutex is held and rdata is on the list.
> > -                    */
> > -                   mutex_lock(&lport->disc.disc_mutex);
> > -                   mutex_lock(&rdata->rp_mutex);
> > -                   if (rdata->rp_state == RPORT_ST_RESTART)
> > -                           restart = 1;
> > -                   else
> > -                           list_del(&rdata->peers);
> > -                   rdata->event = RPORT_EV_NONE;
> > -                   mutex_unlock(&rdata->rp_mutex);
> > -                   mutex_unlock(&lport->disc.disc_mutex);
> > -           }
> > -
> >             if (rport_ops && rport_ops->event_callback) {
> >                     FC_RPORT_DBG(rdata, "callback ev %d\n", event);
> >                     rport_ops->event_callback(lport, rdata, event);
> > @@ -336,13 +316,34 @@ static void fc_rport_work(struct work_struct *work)
> >                     mutex_unlock(&rdata->rp_mutex);
> >                     fc_remote_port_delete(rport);
> >             }
> > -           if (restart) {
> > -                   mutex_lock(&rdata->rp_mutex);
> > -                   FC_RPORT_DBG(rdata, "work restart\n");
> > -                   fc_rport_enter_plogi(rdata);
> > +
> > +           mutex_lock(&lport->disc.disc_mutex);
> > +           mutex_lock(&rdata->rp_mutex);
> > +           if (rdata->rp_state == RPORT_ST_DELETE) {
> > +                   if (port_id == FC_FID_DIR_SERV) {
> > +                           rdata->event = RPORT_EV_NONE;
> > +                           mutex_unlock(&rdata->rp_mutex);
> > +                   } else if (rdata->flags & FC_RP_STARTED) {
> > +                           rdata->event = RPORT_EV_NONE;
> > +                           FC_RPORT_DBG(rdata, "work restart\n");
> > +                           fc_rport_enter_plogi(rdata);
> > +                           mutex_unlock(&rdata->rp_mutex);
> > +                   } else {
> > +                           FC_RPORT_DBG(rdata, "work delete\n");
> > +                           list_del(&rdata->peers);
> > +                           mutex_unlock(&rdata->rp_mutex);
> > +                           kref_put(&rdata->kref, lport->tt.rport_destroy);
> > +                   }
> > +           } else {
> > +                   /*
> > +                    * Re-open for events.  Reissue READY event if ready.
> > +                    */
> > +                   rdata->event = RPORT_EV_NONE;
> > +                   if (rdata->rp_state == RPORT_ST_READY)
> > +                           fc_rport_enter_ready(rdata);
> >                     mutex_unlock(&rdata->rp_mutex);
> > -           } else
> > -                   kref_put(&rdata->kref, lport->tt.rport_destroy);
> > +           }
> > +           mutex_unlock(&lport->disc.disc_mutex);
> >             break;
> >  
> >     default:
> > @@ -367,16 +368,14 @@ int fc_rport_login(struct fc_rport_priv *rdata)
> >  {
> >     mutex_lock(&rdata->rp_mutex);
> >  
> > +   rdata->flags |= FC_RP_STARTED;
> >     switch (rdata->rp_state) {
> >     case RPORT_ST_READY:
> >             FC_RPORT_DBG(rdata, "ADISC port\n");
> >             fc_rport_enter_adisc(rdata);
> >             break;
> > -   case RPORT_ST_RESTART:
> > -           break;
> >     case RPORT_ST_DELETE:
> >             FC_RPORT_DBG(rdata, "Restart deleted port\n");
> > -           fc_rport_state_enter(rdata, RPORT_ST_RESTART);
> >             break;
> >     default:
> >             FC_RPORT_DBG(rdata, "Login to port\n");
> > @@ -431,15 +430,12 @@ int fc_rport_logoff(struct fc_rport_priv *rdata)
> >  
> >     FC_RPORT_DBG(rdata, "Remove port\n");
> >  
> > +   rdata->flags &= ~FC_RP_STARTED;
> >     if (rdata->rp_state == RPORT_ST_DELETE) {
> >             FC_RPORT_DBG(rdata, "Port in Delete state, not removing\n");
> >             goto out;
> >     }
> > -
> > -   if (rdata->rp_state == RPORT_ST_RESTART)
> > -           FC_RPORT_DBG(rdata, "Port in Restart state, deleting\n");
> > -   else
> > -           fc_rport_enter_logo(rdata);
> > +   fc_rport_enter_logo(rdata);
> >  
> >     /*
> >      * Change the state to Delete so that we discard
> > @@ -503,7 +499,6 @@ static void fc_rport_timeout(struct work_struct *work)
> >     case RPORT_ST_READY:
> >     case RPORT_ST_INIT:
> >     case RPORT_ST_DELETE:
> > -   case RPORT_ST_RESTART:
> >             break;
> >     }
> >  
> > @@ -527,6 +522,7 @@ static void fc_rport_error(struct fc_rport_priv *rdata, 
> > struct fc_frame *fp)
> >     switch (rdata->rp_state) {
> >     case RPORT_ST_PLOGI:
> >     case RPORT_ST_LOGO:
> > +           rdata->flags &= ~FC_RP_STARTED;
> >             fc_rport_enter_delete(rdata, RPORT_EV_FAILED);
> >             break;
> >     case RPORT_ST_RTV:
> > @@ -537,7 +533,6 @@ static void fc_rport_error(struct fc_rport_priv *rdata, 
> > struct fc_frame *fp)
> >             fc_rport_enter_logo(rdata);
> >             break;
> >     case RPORT_ST_DELETE:
> > -   case RPORT_ST_RESTART:
> >     case RPORT_ST_READY:
> >     case RPORT_ST_INIT:
> >             break;
> > @@ -1392,7 +1387,6 @@ static void fc_rport_recv_plogi_req(struct fc_lport 
> > *lport,
> >             break;
> >     case RPORT_ST_DELETE:
> >     case RPORT_ST_LOGO:
> > -   case RPORT_ST_RESTART:
> >             FC_RPORT_DBG(rdata, "Received PLOGI in state %s - send busy\n",
> >                          fc_rport_state(rdata));
> >             mutex_unlock(&rdata->rp_mutex);
> > @@ -1629,13 +1623,6 @@ static void fc_rport_recv_logo_req(struct fc_lport 
> > *lport,
> >                          fc_rport_state(rdata));
> >  
> >             fc_rport_enter_delete(rdata, RPORT_EV_LOGO);
> > -
> > -           /*
> > -            * If the remote port was created due to discovery, set state
> > -            * to log back in.  It may have seen a stale RSCN about us.
> > -            */
> > -           if (rdata->disc_id)
> > -                   fc_rport_state_enter(rdata, RPORT_ST_RESTART);
> >             mutex_unlock(&rdata->rp_mutex);
> >     } else
> >             FC_RPORT_ID_DBG(lport, sid,
> > diff --git a/include/scsi/libfc.h b/include/scsi/libfc.h
> > index 7495c0b..db54c4a 100644
> > --- a/include/scsi/libfc.h
> > +++ b/include/scsi/libfc.h
> > @@ -104,7 +104,6 @@ enum fc_disc_event {
> >   * @RPORT_ST_LOGO:    Remote port logout (LOGO) sent
> >   * @RPORT_ST_ADISC:   Discover Address sent
> >   * @RPORT_ST_DELETE:  Remote port being deleted
> > - * @RPORT_ST_RESTART: Remote port being deleted and will restart
> >  */
> >  enum fc_rport_state {
> >     RPORT_ST_INIT,
> > @@ -115,7 +114,6 @@ enum fc_rport_state {
> >     RPORT_ST_LOGO,
> >     RPORT_ST_ADISC,
> >     RPORT_ST_DELETE,
> > -   RPORT_ST_RESTART,
> >  };
> >  
> >  /**
> > @@ -173,6 +171,7 @@ struct fc_rport_libfc_priv {
> >     u16                        flags;
> >     #define FC_RP_FLAGS_REC_SUPPORTED       (1 << 0)
> >     #define FC_RP_FLAGS_RETRY               (1 << 1)
> > +   #define FC_RP_STARTED                   (1 << 2)
> >     unsigned int               e_d_tov;
> >     unsigned int               r_a_tov;
> >  };
> > @@ -185,7 +184,7 @@ struct fc_rport_libfc_priv {
> >   * @rp_state:       Enumeration that tracks progress of PLOGI, PRLI,
> >   *                  and RTV exchanges
> >   * @ids:            The remote port identifiers and roles
> > - * @flags:          REC and RETRY supported flags
> > + * @flags:          STARTED, REC and RETRY_SUPPORTED flags
> >   * @max_seq:        Maximum number of concurrent sequences
> >   * @disc_id:        The discovery identifier
> >   * @maxframe_size:  The maximum frame size
> > 
> > 
> > _______________________________________________
> > devel mailing list
> > [email protected]
> > http://www.open-fcoe.org/mailman/listinfo/devel
> 
> _______________________________________________
> devel mailing list
> [email protected]
> http://www.open-fcoe.org/mailman/listinfo/devel
> 



_______________________________________________
devel mailing list
[email protected]
http://www.open-fcoe.org/mailman/listinfo/devel

Reply via email to