On Thu, 2010-02-25 at 12:15 -0800, Roland Dreier wrote:
> > When using connected mode, ipoib_cm_create_tx() kmallocs a
> > struct ipoib_cm_tx which contains pointers to ipoib_neigh and
> > ipoib_path. If the paths are flushed or the struct neighbour is
> > destroyed, the pointers held by struct ipoib_cm_tx can reference
> > freed memory. The fix is to add reference counts to struct
> > ipoib_neigh and ipoib_path and to add locking when getting
> > new references.
>
> Good debugging.
>
> First look at this patch is that it ends up being rather invasive. I
> wonder if we could fix this in the other direction by keeping a list of
> the ipoib_cm_tx structures affected in the neigh and path structures,
> and clean the cm_tx stuff up when flushing?
>
> Also I don't see any issues from a first read, but can you confirm that
> you're not adding more locking/atomic ops (via kref) to the main data path?
>
> - R.

I agree it is invasive. I thought it would be easier to discuss an
actual patch than to hand-wave about a solution. Plus, now that I
understand the problems better, I'm thinking of new ways to fix them.

There is most definitely a new lock/unlock in the normal send path,
because ipoib_start_xmit() now calls neighbour_priv(), which acquires
priv->lock and does a kref_get(). I'm not really sure what can change
while ipoib_start_xmit() is active, so I was being cautious. I guess
at a minimum, ipoib_neigh_cleanup() won't be called by the network
stack while ipoib_start_xmit() is active, so to_ipoib_neigh(neighbour)
should be valid without my added locking.

We could avoid adding a struct kref to struct ipoib_path by replacing
the pointer to ipoib_path in struct ipoib_cm_tx with a struct
ib_sa_path_rec. Otherwise, I think ipoib_flush_paths() could call into
ipoib_cm.c to make sure no ipoib_cm_tx queued on priv->cm.start_list
points to the given struct ipoib_path (and remove it from the list if
found).

I will try these ideas out and send an updated patch based on the
results.
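
For anyone following the thread, here is a rough userspace sketch of the
reference-counting idea under discussion. It is not the patch itself and
not kernel code: it uses C11 atomics in place of struct kref, and every
demo_* name is hypothetical, standing in for ipoib_neigh and ipoib_cm_tx.
The point is only that the object is freed when the last holder drops its
reference, so a holder such as the cm_tx never sees freed memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ipoib_neigh (not the kernel struct). */
struct demo_neigh {
	atomic_int refcount;	/* plays the role of a struct kref */
	int id;
};

/* Hypothetical stand-in for struct ipoib_cm_tx holding a counted pointer. */
struct demo_cm_tx {
	struct demo_neigh *neigh;
};

/* Number of demo_neigh objects allocated and not yet freed. */
static int demo_live_objects;

/* Allocate an object with one reference, owned by the caller. */
static struct demo_neigh *demo_neigh_alloc(int id)
{
	struct demo_neigh *n = malloc(sizeof(*n));

	if (!n)
		return NULL;
	atomic_init(&n->refcount, 1);
	n->id = id;
	demo_live_objects++;
	return n;
}

/* Take an extra reference; the caller must already hold one. */
static struct demo_neigh *demo_neigh_get(struct demo_neigh *n)
{
	atomic_fetch_add(&n->refcount, 1);
	return n;
}

/* Drop a reference and free the object when the last one goes away. */
static void demo_neigh_put(struct demo_neigh *n)
{
	int old = atomic_fetch_sub(&n->refcount, 1);

	assert(old > 0);
	if (old == 1) {
		demo_live_objects--;
		free(n);
	}
}
```

In this sketch the "flush" side calls demo_neigh_put() on its own
reference, and the object stays valid until the cm_tx side drops the
reference it took with demo_neigh_get(). The sketch does not cover the
lookup race (getting a new reference while another thread does the final
put), which is what the extra locking in the patch is about.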
