>-----Original Message----- >From: Mike Christie [mailto:[EMAIL PROTECTED] >Sent: Thursday, July 31, 2008 10:37 AM >To: Dev, Vasu >Cc: [email protected] >Subject: Re: [Open-FCoE] [PATCH 1/1] libfc: remove shared tag map and fix >timer usage in fc_fcp.c (take 2) > >Dev, Vasu wrote: >>> -----Original Message----- >>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] >On >>> Behalf Of [EMAIL PROTECTED] >>> Sent: Wednesday, July 30, 2008 1:44 AM >>> To: [email protected] >>> Subject: [Open-FCoE] [PATCH 1/1] libfc: remove shared tag map and fix >timer >>> usage in fc_fcp.c (take 2) >>> >>> From: Mike Christie <[EMAIL PROTECTED]> >> So one possible solution can be adding exch lock/unlock functions >> similar to recently added fc pkt lock/unlock functions by you and >> then call exch resp handler with exch lock held. This would also >> require ensuring other EM calls (e.g. fc_exch_done()) are not in >> call path to any circular exch locking. > >Yeah, I was thinking that might get difficult if it is a spin lock with >the code how it is now. Another problem besides the circular locking is >that he exch resp handler may sleep. For example when we fc_io_compl >calls del_timer_sync. The sleeping problem could be fixed though. >
Yeah this will also cause sleep with lock held issues in resp function and this sleeping problem would need fix similar to handling host_lock in queuecommand path. >I was tinkering with something like del_timer_sync where the fc_fcp.c >resp function would hold the pkt lock, set some state bit indicating it >is cleaning up the exchange, drop the pkt lock then call a fc_exch.c >function to clean up the exachnge. The fc_exch.c function would then not >return until the resp function was done running. For the fc_fcp resp >functions it would get run, grab the pkt lock then see the new state bit >indicating that it should just exit and not process the fsp. All this >seems maybe more complicated than what you propose though :( > > This sounds better approach and will also eliminate need to call resp function with exch lock held. I'll try to fix this race along this approach. However if I can fix fc_exch.c to better co-ordinate between exch_done, exch timeout, exch abort and resp calling then that would eliminate upper layer from any tracking as suggested above by a state bit in fsp pkt. >> >>> This will be fixed in a different patch. >>> >> >> I can work on above suggested solution unless you are in the middle >> of this different patch. > >Go ahead. I have just been timkering with different approaches and >learning the fc_exch.c code. > OK. >> >> As you mentioned that there were issues like this prior to shared >> tag map fix in libfc, therefore not sure on reverting shared tag map >> fix from libfc with added over head of maintaining link list of > >I think we agree we need a way to loop over running commands for error >handling operations. True. > Are you saying you want to keep the shared tag map >way of looking up commands? Thought so, I wanted libfc fast path free from keeping track of running/outstanding commands for only error handling and instead use any scsi-ml maintained command tracking. Since now you have better solutions for this in pipeline as discussed in other mails (using scsi_error.c list or exchange manager reset function), therefore shared tag map in libfc not required any more and this patch is good to go. <snip> >>> + spin_lock(&fsp->scsi_pkt_lock); >>> + if (fsp->state & FC_SRB_COMPL) >>> + goto unlock; >>> + >> >> The fsp packet might be freed by this time though FC_SCSI_REC_TOV for > >How will it get freed? I thought this fsp is only freed when we do this >in fc_device_reset: > > sp->state = FC_SRB_FREE; > fc_fcp_pkt_free(sp); > > >If fc_lun_reset calls fc_lun_reset_send and the frame allocation always >fails so that timer is getting rearmed over and over, then the call to > >wait_for_completion_timeout(&fsp->tm_done, FC_SCSI_TM_TOV) >would timeout and return but we do this: > > spin_lock(&fsp->scsi_pkt_lock); > fsp->state |= FC_SRB_COMPL; > spin_unlock(&fsp->scsi_pkt_lock); > > del_timer_sync(&fsp->timer); > >the del_timer_sync call will wait for fc_lun_reset_send to complete >before it returns or kill the pending timer then return. > Thanks for explanation, yeap this is not an issue. With this patch, the entire fc_lun_reset_send() has pkt lock held while sending frame down via fc_exch.c and network layer. The fc_exch.c can sleep in exch allocation, I need to fix that and that is a separate issue but in general what do you think on calling into fc_exch.c with or without any lock held ? _______________________________________________ devel mailing list [email protected] http://www.open-fcoe.org/mailman/listinfo/devel
