On Tue, Sep 09, 2008 at 02:32:44PM -0700, Roland Dreier wrote:
> By the way, looking at this stuff again, it seems we have (a possibly
> quite unlikely) race where a send can complete before the xmit method
> finishes, and we end up running skb_orphan on an skb that another
> context has already freed. I'll have to think about how we can fix
> that -- but any good ideas are appreciated...

Actually it looks like Arthur's patch introduces this race. The current
code is OK because skb_orphan is called under tx_lock, which is also
held when we poll the send CQ. But of course the status quo is no good
exactly because of the locking issue Arthur found.

> We can check if there are outstanding WRs after poll_tx is called. If
> there are no outstanding WRs, it means that the SKB has been freed. If
> there are outstanding WRs, it means that the last post has not been
> freed so we can call skb_orphan(). Like the following patch (on top of
> Arthur's):

I don't think this closes the race completely: at the point skb_orphan
is called (after Arthur's patch, by design), we have no locks held. And
so the timer-driven send completion handling could already have run and
freed the skb between when we drop tx_lock and when we call skb_orphan.

 - R.
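
To make the window described above concrete, here is a minimal
single-threaded sketch of the two orderings. It is not the IPoIB code:
struct fake_skb and every function name below are hypothetical, the
lock is represented only by comments, and the "freed" flag stands in
for what would be a use-after-free in a real driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an sk_buff: only tracks its lifetime state. */
struct fake_skb {
    bool freed;
    bool orphaned;
};

/* Hypothetical model of send completion handling, which frees the skb. */
static void completion_frees(struct fake_skb *skb)
{
    skb->freed = true;
}

/*
 * Model of the orphan step. Returns true if it touched an skb that was
 * already freed (a use-after-free in real code), false otherwise.
 */
static bool orphan(struct fake_skb *skb)
{
    if (skb->freed)
        return true;
    skb->orphaned = true;
    return false;
}

/*
 * Orphan after the lock is dropped: completion handling may run in the
 * gap between the unlock and the orphan call.
 */
static bool xmit_orphan_after_unlock(bool completion_in_window)
{
    struct fake_skb skb = { false, false };

    /* lock; post send; unlock */
    if (completion_in_window)
        completion_frees(&skb);   /* runs with no lock held by xmit */
    return orphan(&skb);
}

/*
 * Orphan while the lock is still held: completion handling needs the
 * same lock, so in this model it can only run after the unlock.
 */
static bool xmit_orphan_under_lock(void)
{
    struct fake_skb skb = { false, false };
    bool bad;

    /* lock; post send */
    bad = orphan(&skb);
    /* unlock */
    completion_frees(&skb);
    return bad;
}
```

In this model xmit_orphan_after_unlock(true) reports a touch of a freed
skb, while xmit_orphan_after_unlock(false) and xmit_orphan_under_lock()
do not. Checking for outstanding WRs before the orphan call narrows the
first case but does not remove it, since the completion can still land
between the check and the call.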
