>-----Original Message-----
>From: Mike Christie [mailto:[EMAIL PROTECTED]
>Sent: Friday, August 08, 2008 10:31 AM
>To: Dev, Vasu
>Cc: [email protected]
>Subject: Re: [Open-FCoE] [RFC PATCH 1/2] libfc: removed checks for
>timer_pending and then updating exch ref count.
>
>Mike Christie wrote:
>> Vasu Dev wrote:
>>> Updating exch ref count based on timer_pending check can mess
>>> up exch ref count if timer fires just after this check. Instead count
>>> on del/mod_timer() return values to update exch ref count.
>>>
>>
>> The timer parts look good.
>>
>>> @@ -288,11 +288,21 @@ static void fc_exch_release(struct fc_exch *ep)
>>>
>>>      if (atomic_dec_and_test(&ep->ex_refcnt)) {
>>>              WARN_ON(!ep->esb_stat & ESB_ST_COMPLETE);
>>> -            del_timer(&ep->ex_timer);
>>> +            WARN_ON(timer_pending(&ep->ex_timer));
>>>              mp = ep->em;
>>> +            spin_lock_bh(&mp->em_lock);
>>> +            /*
>>> +             * ensure ep->ex_refcnt is still zero
>>> +             * if not then skip freeing exchange
>>> +             * since some new incoming frame took
>>> +             * hold of this, which is unlikely.
>>> +             */
>>> +            if (atomic_read(&ep->ex_refcnt)) {
>>> +                    spin_unlock_bh(&mp->em_lock);
>>> +                    return;
>>
>>
>> This looks weird, because it extended the locking around a access to a
>> new ex->ref_count read. What was racing? Is it supposed to handle if we
>> got a response for an exchange, but then a response for a ABTS is coming
>> in right after it so the two are being processed on mutliple recv
>threads?
>>
>> If so it seems like thread0 could still do what looks like the final
>> release so refcount is 0. Then on thread1 the abts resp could have
>> already passed the fc_exch_find call in fc_exch_recv_bls. Then thread0
>> could grab the lock above and see refcount is zero. But then thread1
>> could just call fc_exch_hold and increase the count and think everything
>> is ok but fc_exch_release is calling the kemem cache free function.
>>
>
>Actually I am wrong. I was looking at the fc_exch_abts_resp call to
>fc_exch_hold which does not look like it is needed if the caller grabs a
>ref right?

Correct, can be optimized but extra hold & release won't cause any issue. I'll 
fix that.

>It looks the abts resp path is ok. fc_exch_recv_bls call to
>fc_exch_find will have grabbed a ref to the ep and your code would avoid
>the race

Exactly, this is the race I fixed.

>, and then we do not need the fc_exch_hold call in
>fc_exch_abts_resp. Is the abort race type of problem you were trying to
>prevent?

I fixed this very unlikely race for any frame for the exch in RX path, the
exch ref bumped up by fc_exch_find() while fc_exch_release() passed
atomic zero exch ref check but grabbed mp->em_lock after fc_exch_find()
released mp->em_lock.

>
>The locking and refcount use is really weird now though. Would it be
>cleaner or more standard to just say that you need to always grab the
>lock when accessing the ex_refcount instead of sometimes doing it and
>sometimes not (too confusing).

Currently exch ref count can be changed(inc/dec) without having need to hold
exch lock, but this can be revised as suggested above to always have exch
lock for changing exch ref. However as you mentioned in other email that
exch ref and locking is still nicer than fsp pkt locking and doesn't need
expensive del_timer_sync() as used in FCP.
_______________________________________________
devel mailing list
[email protected]
http://www.open-fcoe.org/mailman/listinfo/devel

Reply via email to