Bhanu Gollapudi wrote:
> On Tue, 2010-05-25 at 09:22 -0700, Joe Eykholt wrote:
>> Joe Eykholt wrote:
>>> This patch somewhat combines two fixes to remote port handing in libfc.
>>>
>>> The first problem was that rport work could be queued on a deleted
>>> and freed rport.  This is handled by not resetting rdata->event
>>> ton NONE if the rdata is about to be deleted.
>>>
>>> However, that fix led to the second problem, described by
>>> Bhanu Gollapudi, as follows:
>>>> Here is the sequence of events. T1 is first LOGO receive thread, T2 is
>>>> fc_rport_work() scheduled by T1 and T3 is second LOGO receive thread and
>>>> T4 is fc_rport_work scheduled by T3.
>>>>
>>>> 1. (T1)Received 1st LOGO in state Ready
>>>> 2. (T1)Delete port & enter to RESTART state.
>>>> 3. (T1)schdule event_work, since event is RPORT_EV_NONE.
>>>> 4. (T1)set event = RPORT_EV_LOGO
>>>> 5. (T1)Enter RESTART state as disc_id is set.
>>>> 6. (T2)remember to PLOGI, and set event = RPORT_EV_NONE
>>>> 6. (T3)Received 2nd LOGO
>>>> 7. (T3)Delete Port & enter to RESTART state.
>>>> 8. (T3)schedule event_work, since event is RPORT_EV_NONE.
>>>> 9. (T3)Enter RESTART state as disc_id is set.
>>>> 9. (T3)set event = RPORT_EV_LOGO
>>>> 10.(T2)work restart, enter PLOGI state and issues PLOGI
>>>> 11.(T4)Since state is not RESTART anymore, restart is not set, and the
>>>> event is not reset to RPORT_EV_NONE. (current event is RPORT_EV_LOGO).
>>>> 12. Now, PLOGI succeeds and fc_rport_enter_ready() will not schedule
>>>> event_work, and hence the rport will never be created, eventually losing
>>>> the target after dev_loss_tmo.
>>> So, the problem here is that we were tracking the desire for
>>> the rport be restarted by state RESTART, which was otherwise
>>> equivalent to DELETE.  A contributing factor is that we dropped
>>> the lock between steps 6 and 10 in thread T2, which allows the
>>> state to change, and we didn't completely re-evaluate then.
>>>
>>> This is hopefully corrected by the following minor redesign:
>>>
>>> Simplify the rport restart logic by making the decision to
>>> restart after deleting the transport rport.  That decision
>>> is based on a new STARTED flag that indicates fc_rport_login()
>>> has been called and fc_rport_logoff() has not been called
>>> since then.  This replaces the need for the RESTART state.
>>>
>>> Only restart if the rdata is still in DELETED state
>>> and only if it still has the STARTED flag set.
>>>
>>> Also now, since we clear the event code much later in the
>>> work thread, allow for the possibility that the rport may
>>> have become READY again via incoming PLOGI, and if so,
>>> queue another event to handle that.
>>>
>>> In the problem scenario, the second LOGO received will
>>> cause the LOGO event to occur again.
>> We've discovered a problem with this patch.  If a PRLI or
>> other request in the state machine fails we could restart
>> indefinitely.
>>
>> I'm not sure how to handle this yet.  Perhaps instead of
>> going thru the work item and delete at that point we should
>> just enter an error state until login is called again,
>> either by discovery or a link flap.   Suggestions welcome.
> 
> Joe, correct me if I'm wrong, but shouldn't we handle it 
> similar to PLOGI failure where we clear the FC_RP_STARTED
> flag and then LOGO from that rport.

I thought of that, but I was worried that a PRLI failure could
be due to not being logged on, so I didn't want to LOGO and
not try to restart in that case.   We had an issue where
we lost the root disk in a boot-from-SAN setup that way.
On the other hand, we don't want to go through that restart
loop indefinitely either.

I have a followup patch that addresses this.  It simply
adds a new retry count for the restarts.

So, I would keep this patch and put the one I'm about to
send on top of it.

        Cheers,
        Joe


> 
>> The other three patches in this series are unrelated and
>> could go ahead.
>>
>>      Joe
>>
>>> Reported-by: Bhanu Gollapudi <[email protected]>
>>> Signed-off-by: Joe Eykholt <[email protected]>
>>> ---
>>>  drivers/scsi/libfc/fc_rport.c |   75 
>>> +++++++++++++++++------------------------
>>>  include/scsi/libfc.h          |    5 +--
>>>  2 files changed, 33 insertions(+), 47 deletions(-)
>>>
>>>
>>> diff --git a/drivers/scsi/libfc/fc_rport.c b/drivers/scsi/libfc/fc_rport.c
>>> index dae67ed..96b8df5 100644
>>> --- a/drivers/scsi/libfc/fc_rport.c
>>> +++ b/drivers/scsi/libfc/fc_rport.c
>>> @@ -89,7 +89,6 @@ static const char *fc_rport_state_names[] = {
>>>     [RPORT_ST_LOGO] = "LOGO",
>>>     [RPORT_ST_ADISC] = "ADISC",
>>>     [RPORT_ST_DELETE] = "Delete",
>>> -   [RPORT_ST_RESTART] = "Restart",
>>>  };
>>>  
>>>  /**
>>> @@ -246,7 +245,6 @@ static void fc_rport_work(struct work_struct *work)
>>>     struct fc_rport_operations *rport_ops;
>>>     struct fc_rport_identifiers ids;
>>>     struct fc_rport *rport;
>>> -   int restart = 0;
>>>  
>>>     mutex_lock(&rdata->rp_mutex);
>>>     event = rdata->event;
>>> @@ -298,24 +296,6 @@ static void fc_rport_work(struct work_struct *work)
>>>             port_id = rdata->ids.port_id;
>>>             mutex_unlock(&rdata->rp_mutex);
>>>  
>>> -           if (port_id != FC_FID_DIR_SERV) {
>>> -                   /*
>>> -                    * We must drop rp_mutex before taking disc_mutex.
>>> -                    * Re-evaluate state to allow for restart.
>>> -                    * A transition to RESTART state must only happen
>>> -                    * while disc_mutex is held and rdata is on the list.
>>> -                    */
>>> -                   mutex_lock(&lport->disc.disc_mutex);
>>> -                   mutex_lock(&rdata->rp_mutex);
>>> -                   if (rdata->rp_state == RPORT_ST_RESTART)
>>> -                           restart = 1;
>>> -                   else
>>> -                           list_del(&rdata->peers);
>>> -                   rdata->event = RPORT_EV_NONE;
>>> -                   mutex_unlock(&rdata->rp_mutex);
>>> -                   mutex_unlock(&lport->disc.disc_mutex);
>>> -           }
>>> -
>>>             if (rport_ops && rport_ops->event_callback) {
>>>                     FC_RPORT_DBG(rdata, "callback ev %d\n", event);
>>>                     rport_ops->event_callback(lport, rdata, event);
>>> @@ -336,13 +316,34 @@ static void fc_rport_work(struct work_struct *work)
>>>                     mutex_unlock(&rdata->rp_mutex);
>>>                     fc_remote_port_delete(rport);
>>>             }
>>> -           if (restart) {
>>> -                   mutex_lock(&rdata->rp_mutex);
>>> -                   FC_RPORT_DBG(rdata, "work restart\n");
>>> -                   fc_rport_enter_plogi(rdata);
>>> +
>>> +           mutex_lock(&lport->disc.disc_mutex);
>>> +           mutex_lock(&rdata->rp_mutex);
>>> +           if (rdata->rp_state == RPORT_ST_DELETE) {
>>> +                   if (port_id == FC_FID_DIR_SERV) {
>>> +                           rdata->event = RPORT_EV_NONE;
>>> +                           mutex_unlock(&rdata->rp_mutex);
>>> +                   } else if (rdata->flags & FC_RP_STARTED) {
>>> +                           rdata->event = RPORT_EV_NONE;
>>> +                           FC_RPORT_DBG(rdata, "work restart\n");
>>> +                           fc_rport_enter_plogi(rdata);
>>> +                           mutex_unlock(&rdata->rp_mutex);
>>> +                   } else {
>>> +                           FC_RPORT_DBG(rdata, "work delete\n");
>>> +                           list_del(&rdata->peers);
>>> +                           mutex_unlock(&rdata->rp_mutex);
>>> +                           kref_put(&rdata->kref, lport->tt.rport_destroy);
>>> +                   }
>>> +           } else {
>>> +                   /*
>>> +                    * Re-open for events.  Reissue READY event if ready.
>>> +                    */
>>> +                   rdata->event = RPORT_EV_NONE;
>>> +                   if (rdata->rp_state == RPORT_ST_READY)
>>> +                           fc_rport_enter_ready(rdata);
>>>                     mutex_unlock(&rdata->rp_mutex);
>>> -           } else
>>> -                   kref_put(&rdata->kref, lport->tt.rport_destroy);
>>> +           }
>>> +           mutex_unlock(&lport->disc.disc_mutex);
>>>             break;
>>>  
>>>     default:
>>> @@ -367,16 +368,14 @@ int fc_rport_login(struct fc_rport_priv *rdata)
>>>  {
>>>     mutex_lock(&rdata->rp_mutex);
>>>  
>>> +   rdata->flags |= FC_RP_STARTED;
>>>     switch (rdata->rp_state) {
>>>     case RPORT_ST_READY:
>>>             FC_RPORT_DBG(rdata, "ADISC port\n");
>>>             fc_rport_enter_adisc(rdata);
>>>             break;
>>> -   case RPORT_ST_RESTART:
>>> -           break;
>>>     case RPORT_ST_DELETE:
>>>             FC_RPORT_DBG(rdata, "Restart deleted port\n");
>>> -           fc_rport_state_enter(rdata, RPORT_ST_RESTART);
>>>             break;
>>>     default:
>>>             FC_RPORT_DBG(rdata, "Login to port\n");
>>> @@ -431,15 +430,12 @@ int fc_rport_logoff(struct fc_rport_priv *rdata)
>>>  
>>>     FC_RPORT_DBG(rdata, "Remove port\n");
>>>  
>>> +   rdata->flags &= ~FC_RP_STARTED;
>>>     if (rdata->rp_state == RPORT_ST_DELETE) {
>>>             FC_RPORT_DBG(rdata, "Port in Delete state, not removing\n");
>>>             goto out;
>>>     }
>>> -
>>> -   if (rdata->rp_state == RPORT_ST_RESTART)
>>> -           FC_RPORT_DBG(rdata, "Port in Restart state, deleting\n");
>>> -   else
>>> -           fc_rport_enter_logo(rdata);
>>> +   fc_rport_enter_logo(rdata);
>>>  
>>>     /*
>>>      * Change the state to Delete so that we discard
>>> @@ -503,7 +499,6 @@ static void fc_rport_timeout(struct work_struct *work)
>>>     case RPORT_ST_READY:
>>>     case RPORT_ST_INIT:
>>>     case RPORT_ST_DELETE:
>>> -   case RPORT_ST_RESTART:
>>>             break;
>>>     }
>>>  
>>> @@ -527,6 +522,7 @@ static void fc_rport_error(struct fc_rport_priv *rdata, 
>>> struct fc_frame *fp)
>>>     switch (rdata->rp_state) {
>>>     case RPORT_ST_PLOGI:
>>>     case RPORT_ST_LOGO:
>>> +           rdata->flags &= ~FC_RP_STARTED;
>>>             fc_rport_enter_delete(rdata, RPORT_EV_FAILED);
>>>             break;
>>>     case RPORT_ST_RTV:
>>> @@ -537,7 +533,6 @@ static void fc_rport_error(struct fc_rport_priv *rdata, 
>>> struct fc_frame *fp)
>>>             fc_rport_enter_logo(rdata);
>>>             break;
>>>     case RPORT_ST_DELETE:
>>> -   case RPORT_ST_RESTART:
>>>     case RPORT_ST_READY:
>>>     case RPORT_ST_INIT:
>>>             break;
>>> @@ -1392,7 +1387,6 @@ static void fc_rport_recv_plogi_req(struct fc_lport 
>>> *lport,
>>>             break;
>>>     case RPORT_ST_DELETE:
>>>     case RPORT_ST_LOGO:
>>> -   case RPORT_ST_RESTART:
>>>             FC_RPORT_DBG(rdata, "Received PLOGI in state %s - send busy\n",
>>>                          fc_rport_state(rdata));
>>>             mutex_unlock(&rdata->rp_mutex);
>>> @@ -1629,13 +1623,6 @@ static void fc_rport_recv_logo_req(struct fc_lport 
>>> *lport,
>>>                          fc_rport_state(rdata));
>>>  
>>>             fc_rport_enter_delete(rdata, RPORT_EV_LOGO);
>>> -
>>> -           /*
>>> -            * If the remote port was created due to discovery, set state
>>> -            * to log back in.  It may have seen a stale RSCN about us.
>>> -            */
>>> -           if (rdata->disc_id)
>>> -                   fc_rport_state_enter(rdata, RPORT_ST_RESTART);
>>>             mutex_unlock(&rdata->rp_mutex);
>>>     } else
>>>             FC_RPORT_ID_DBG(lport, sid,
>>> diff --git a/include/scsi/libfc.h b/include/scsi/libfc.h
>>> index 7495c0b..db54c4a 100644
>>> --- a/include/scsi/libfc.h
>>> +++ b/include/scsi/libfc.h
>>> @@ -104,7 +104,6 @@ enum fc_disc_event {
>>>   * @RPORT_ST_LOGO:    Remote port logout (LOGO) sent
>>>   * @RPORT_ST_ADISC:   Discover Address sent
>>>   * @RPORT_ST_DELETE:  Remote port being deleted
>>> - * @RPORT_ST_RESTART: Remote port being deleted and will restart
>>>  */
>>>  enum fc_rport_state {
>>>     RPORT_ST_INIT,
>>> @@ -115,7 +114,6 @@ enum fc_rport_state {
>>>     RPORT_ST_LOGO,
>>>     RPORT_ST_ADISC,
>>>     RPORT_ST_DELETE,
>>> -   RPORT_ST_RESTART,
>>>  };
>>>  
>>>  /**
>>> @@ -173,6 +171,7 @@ struct fc_rport_libfc_priv {
>>>     u16                        flags;
>>>     #define FC_RP_FLAGS_REC_SUPPORTED       (1 << 0)
>>>     #define FC_RP_FLAGS_RETRY               (1 << 1)
>>> +   #define FC_RP_STARTED                   (1 << 2)
>>>     unsigned int               e_d_tov;
>>>     unsigned int               r_a_tov;
>>>  };
>>> @@ -185,7 +184,7 @@ struct fc_rport_libfc_priv {
>>>   * @rp_state:       Enumeration that tracks progress of PLOGI, PRLI,
>>>   *                  and RTV exchanges
>>>   * @ids:            The remote port identifiers and roles
>>> - * @flags:          REC and RETRY supported flags
>>> + * @flags:          STARTED, REC and RETRY_SUPPORTED flags
>>>   * @max_seq:        Maximum number of concurrent sequences
>>>   * @disc_id:        The discovery identifier
>>>   * @maxframe_size:  The maximum frame size
>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> [email protected]
>>> http://www.open-fcoe.org/mailman/listinfo/devel
>> _______________________________________________
>> devel mailing list
>> [email protected]
>> http://www.open-fcoe.org/mailman/listinfo/devel
>>
> 
> 
> 

_______________________________________________
devel mailing list
[email protected]
http://www.open-fcoe.org/mailman/listinfo/devel

Reply via email to