>-----Original Message-----
>From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
>Behalf Of [EMAIL PROTECTED]
>Sent: Wednesday, July 30, 2008 1:44 AM
>To: [email protected]
>Subject: [Open-FCoE] [PATCH 1/1] libfc: remove shared tag map and fix timer
>usage in fc_fcp.c (take 2)
>
>From: Mike Christie <[EMAIL PROTECTED]>
>
>This patch removes the shared tag map. It is only being used
>to loop over running commands which is better suited with
>a list of commands. This patch does not fix the issue with
>the original outstanding array use and with the shared tag patch,
>where the resp function can be running while the fc_io_compl
>is freeing pkt.
Nice catch, it has been like this for quite some time of calling
exch resp handler while exch can be done/freed by FCP error handling.
So there is race, however window of race is very small since FCP error handling
works with fcp pkt lock held.
So one possible solution can be adding exch lock/unlock functions
similar to recently added fc pkt lock/unlock functions by you and
then call exch resp handler with exch lock held. This would also
require ensuring other EM calls (e.g. fc_exch_done()) are not in
call path to any circular exch locking.
> This will be fixed in a different patch.
>
I can work on above suggested solution unless you are in the middle
of this different patch.
As you mentioned that there were issues like this prior to shared
tag map fix in libfc, therefore not sure on reverting shared tag map
fix from libfc with added over head of maintaining link list of
outstanding commands in libfc. The BUG_ON in shared tag map is
likely side effect of other pre-existing bugs like this.
>This patch does fix several issues with the timer usage
>in fc_fcp.c in the same code path I changed including:
>- Use setup_timer and mod_timer.
>- Don't call del_timer_sync with a spin_lock held.
>- del_timer_sync does not handle timer functions that
>rearm themselves. The user must add some synchrinization for this.
>- The lun reset code could set a timer if the frame allocation fails, but
>it was not deleting it.
>
The lun reset code timer on frame allocation fails is not fully fixed,
See my additional comments below.
<snip>
>-static void fc_lun_reset_send(struct fc_fcp_pkt *fsp)
>+static void fc_lun_reset_send(unsigned long data)
> {
>+ struct fc_fcp_pkt *fsp = (struct fc_fcp_pkt *)data;
> const size_t len = sizeof(fsp->cdb_cmd);
> struct fc_lport *lp = fsp->lp;
> struct fc_frame *fp;
>@@ -1061,6 +1072,10 @@ static void fc_lun_reset_send(struct fc_fcp_pkt
>*fsp)
> struct fc_rport *rport;
> struct fc_rport_libfc_priv *rp;
>
>+ spin_lock(&fsp->scsi_pkt_lock);
>+ if (fsp->state & FC_SRB_COMPL)
>+ goto unlock;
>+
The fsp packet might be freed by this time though FC_SCSI_REC_TOV for
fc_lun_reset_send is smaller than FC_SCSI_TM_TOV used by caller
fc_lun_reset(), but fc_lun_reset_send() may get called multiple times on
each frame alloc failure. So may be fc_lun_reset_send() should keep retry
count not exceeding FC_SCSI_TM_TOV.
> fp = fc_frame_alloc(lp, len);
> if (!fp)
> goto retry;
>@@ -1078,15 +1093,17 @@ static void fc_lun_reset_send(struct fc_fcp_pkt
>*fsp)
>
> if (sp) {
> fsp->seq_ptr = sp;
>- return;
>+ goto unlock;
> }
> /*
> * Exchange or frame allocation failed. Set timer and retry.
> */
> fc_frame_free(fp);
> retry:
>- fsp->timer.function = (void (*)(unsigned long))fc_lun_reset_send;
>+ setup_timer(&fsp->timer, fc_lun_reset_send, (unsigned long)fsp);
> fc_fcp_timer_set(fsp, FC_SCSI_REC_TOV);
>+unlock:
>+ spin_unlock(&fsp->scsi_pkt_lock);
> }
_______________________________________________
devel mailing list
[email protected]
http://www.open-fcoe.org/mailman/listinfo/devel