Mike Christie wrote:
> Vasu Dev wrote:
>> Updating exch ref count based on timer_pending check can mess
>> up exch ref count if timer fires just after this check. Instead count
>> on del/mod_timer() return values to update exch ref count.
>>
> 
> The timer parts look good.
> 
>> @@ -288,11 +288,21 @@ static void fc_exch_release(struct fc_exch *ep)
>>  
>>      if (atomic_dec_and_test(&ep->ex_refcnt)) {
>>              WARN_ON(!ep->esb_stat & ESB_ST_COMPLETE);
>> -            del_timer(&ep->ex_timer);
>> +            WARN_ON(timer_pending(&ep->ex_timer));
>>              mp = ep->em;
>> +            spin_lock_bh(&mp->em_lock);
>> +            /*
>> +             * ensure ep->ex_refcnt is still zero
>> +             * if not then skip freeing exchange
>> +             * since some new incoming frame took
>> +             * hold of this, which is unlikely.
>> +             */
>> +            if (atomic_read(&ep->ex_refcnt)) {
>> +                    spin_unlock_bh(&mp->em_lock);
>> +                    return;
> 
> 
> This looks weird, because it extended the locking around a access to a 
> new ex->ref_count read. What was racing? Is it supposed to handle if we 
> got a response for an exchange, but then a response for a ABTS is coming 
> in right after it so the two are being processed on mutliple recv threads?
> 
> If so it seems like thread0 could still do what looks like the final 
> release so refcount is 0. Then on thread1 the abts resp could have 
> already passed the fc_exch_find call in fc_exch_recv_bls. Then thread0 
> could grab the lock above and see refcount is zero. But then thread1 
> could just call fc_exch_hold and increase the count and think everything 
> is ok but fc_exch_release is calling the kemem cache free function.
> 

Actually I am wrong. I was looking at the fc_exch_abts_resp call to 
fc_exch_hold which does not look like it is needed if the caller grabs a 
ref right? It looks the abts resp path is ok. fc_exch_recv_bls call to 
fc_exch_find will have grabbed a ref to the ep and your code would avoid 
the race, and then we do not need the fc_exch_hold call in 
fc_exch_abts_resp. Is the abort race type of problem you were trying to 
prevent?

The locking and refcount use is really weird now though. Would it be 
cleaner or more standard to just say that you need to always grab the 
lock when accessing the ex_refcount instead of sometimes doing it and 
sometimes not (too confusing).
_______________________________________________
devel mailing list
[email protected]
http://www.open-fcoe.org/mailman/listinfo/devel

Reply via email to