Bhanu Gollapudi wrote:
> On Thu, 2010-05-13 at 10:29 -0700, Joe Eykholt wrote:
>> Bhanu Gollapudi wrote:
>>> When a target sends multiple back-to-back LOGO's, we try to process
>>> all of them irrespective of the status of processing of prior LOGO
>>> which results in oops in fc_rport_work :
>>>  general protection fault: 0000 [#1] SMP
>>>  last sysfs file: /sys/block/sdao/dev
>>>  CPU 0
>>>  Pid: 6870, comm: fc_rport_eq Not tainted 
>>> 2.6.33-rc4-fcoe-next-00286-g07cca55
>>>  RIP: 0010:[<ffffffffa031aaf4>]  [<ffffffffa031aaf4>] 
>>> fc_rport_work+0x288/0x3d4 [libfc]
>>>  RSP: 0018:ffff88004695fe00  EFLAGS: 00010202
>>>  RAX: dead000000200200 RBX: ffff880046958000 RCX: ffff8800469580e0
>>>  RDX: dead000000100100 RSI: dead000000200200 RDI: dead000000100100
>>>  RBP: ffff88004695fe70 R08: ffff88004695fc40 R09: 0000000000000000
>>>  R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000292cef
>>>  R13: 0000000000000004 R14: 0000000000000000 R15: ffff880046958048
>>>  FS:  0000000000000000(0000) GS:ffff880001e00000(0000) 
>>> knlGS:0000000000000000
>>>  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>  CR2: 0000000041bb7a08 CR3: 0000000078d8f000 CR4: 00000000000006f0
>>>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>  Process fc_rport_eq (pid: 6870, threadinfo ffff88004695e000, task 
>>> ffff88005d78afa0)
>>>  Stack:
>>>   0000000000000000 ffff880046958458 ffff880046958400 000000007f7e2fa0
>>>  <0> ffff88005d78b328 00000001000c55d4 ffff88004695fe40 ffffffff810a12e1
>>>  <0> ffff88004695fe60 ffff880046e99d30 ffff8800469580f0 ffffe8ffffc12a80
>>>  Call Trace:
>>>   [<ffffffff810a12e1>] ? trace_hardirqs_off+0x9/0x20
>>>   [<ffffffffa031a86c>] ? fc_rport_work+0x0/0x3d4 [libfc]
>>>   [<ffffffff81056d14>] worker_thread+0x149/0x1e5
>>>   [<ffffffff81059c27>] ? autoremove_wake_function+0x0/0x3d
>>>   [<ffffffff81056bcb>] ? worker_thread+0x0/0x1e5
>>>   [<ffffffff810598da>] kthread+0x6e/0x76
>>>   [<ffffffff81003a14>] kernel_thread_helper+0x4/0x10
>>>   [<ffffffff814a4369>] ? restore_args+0x0/0x30
>>>   [<ffffffff8105986c>] ? kthread+0x0/0x76
>>>   [<ffffffff81003a10>] ? kernel_thread_helper+0x0/0x10
>>>
>>> To avoid this drop all LOGO's received while a prior LOGO is being processed
>>>
>>> Signed-off-by: Bhanu Prakash Gollapudi <[email protected]>
>>> ---
>>>  drivers/scsi/libfc/fc_rport.c |   18 ++++++++++++------
>>>  1 files changed, 12 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/scsi/libfc/fc_rport.c b/drivers/scsi/libfc/fc_rport.c
>>> index f179ffc..fba5218 100644
>>> --- a/drivers/scsi/libfc/fc_rport.c
>>> +++ b/drivers/scsi/libfc/fc_rport.c
>>> @@ -1621,14 +1621,20 @@ static void fc_rport_recv_logo_req(struct fc_lport 
>>> *lport,
>>>             FC_RPORT_DBG(rdata, "Received LOGO request while in state %s\n",
>>>                          fc_rport_state(rdata));
>>>  
>>> -           fc_rport_enter_delete(rdata, RPORT_EV_LOGO);
>>> -
>>>             /*
>>> -            * If the remote port was created due to discovery, set state
>>> -            * to log back in.  It may have seen a stale RSCN about us.
>>> +            * Drop all further LOGO's while a prior LOGO is being
>>> +            * processed
>>>              */
>>> -           if (rdata->disc_id)
>>> -                   fc_rport_state_enter(rdata, RPORT_ST_RESTART);
>>> +           if (rdata->rp_state != RPORT_ST_RESTART) {
>>> +                   fc_rport_enter_delete(rdata, RPORT_EV_LOGO);
>>> +
>>> +                   /*
>>> +                    * If the remote port was created due to discovery,
>>> +                    * set state to log back in.  It may have seen a
>>> +                    * stale RSCN about us.
>>> +                    */
>>> +                   if (rdata->disc_id)
>>> +                           fc_rport_state_enter(rdata, RPORT_ST_RESTART);
>>> +           }
>>>             mutex_unlock(&rdata->rp_mutex);
>>>     } else
>>>             FC_RPORT_ID_DBG(lport, sid,
>> I've seen this symptom, too, and I agree this is one way its caused.
>> A more complete fix is to not set the rdata->event to RPORT_EV_NONE in
>> fc_rport_work() if the rdata is removed from the list.  That prevents
>> fc_rport_enter_delete() from rescheduling work on the rdata that's about
>> to be freed.
>>
>> I think that makes the above patch unnecessary.  Do you agree that that 
>> fixes it?
> 
> This can lead to a different problem. 
> 
> Here is the sequence of events. T1 is first LOGO receive thread, T2 is
> fc_rport_work() scheduled by T1 and T3 is second LOGO receive thread and
> T4 is fc_rport_work scheduled by T3.
> 
> 1. (T1)Received 1st LOGO in state Ready
> 2. (T1)Delete port & enter to RESTART state.
> 3. (T1)schdule event_work, since event is RPORT_EV_NONE.
> 4. (T1)set event = RPORT_EV_LOGO
> 5. (T1)Enter RESTART state as disc_id is set.
> 6. (T2)remember to PLOGI, and set event = RPORT_EV_NONE
> 6. (T3)Received 2nd LOGO
> 7. (T3)Delete Port & enter to RESTART state.
> 8. (T3)schedule event_work, since event is RPORT_EV_NONE.
> 9. (T3)Enter RESTART state as disc_id is set.
> 9. (T3)set event = RPORT_EV_LOGO
> 10.(T2)work restart, enter PLOGI state and issues PLOGI
> 11.(T4)Since state is not RESTART anymore, restart is not set, and the
> event is not reset to RPORT_EV_NONE. (current event is RPORT_EV_LOGO).
> 12. Now, PLOGI succeeds and fc_rport_enter_ready() will not schedule
> event_work, and hence the rport will never be created, eventually losing
> the target after dev_loss_tmo.

I agree.  That's a problem.

Either rport_work should check for more states, or we should indicate
the desire to restart with a flag instead of a state.  I think the flag
approach works better.

>> Here's my patch, cut & pasted so it may not apply, just for illustration, 
>> I'll
>> send to the alias separately:
>>
>> --- a/drivers/scsi/libfc/fc_rport.c
>> +++ b/drivers/scsi/libfc/fc_rport.c
>> @@ -307,11 +307,11 @@ static void fc_rport_work(struct work_struct *work)
>>                       */
>>                      mutex_lock(&lport->disc.disc_mutex);
>>                      mutex_lock(&rdata->rp_mutex);
>> -                    if (rdata->rp_state == RPORT_ST_RESTART)
>> +                    if (rdata->rp_state == RPORT_ST_RESTART) {
>>                              restart = 1;
>> -                    else
>> +                            rdata->event = RPORT_EV_NONE;
>> +                    } else
>>                              list_del(&rdata->peers);
>> -                    rdata->event = RPORT_EV_NONE;
>>                      mutex_unlock(&rdata->rp_mutex);
>>                      mutex_unlock(&lport->disc.disc_mutex);
>>
>>
>> ---
>>
>>      Thanks,
>>      Joe
>>
>>
>>
> 
> 
> 

_______________________________________________
devel mailing list
[email protected]
http://www.open-fcoe.org/mailman/listinfo/devel

Reply via email to