> > Can you explain this change a little more?  It seems quite likely that
> > we would get last WQE reached events for other states, such as
> > IPOIB_CM_RX_ERROR coming from ipoib_cm_dev_stop(), and I don't see how
> > things work if we make this change.
> >
> >  - R.
>
> Hello Roland,
>
> If it's already in ERROR status, it will be processed through
> rx_error_list. In the case of ipoib_cm_dev_stop(), it will wait for 5 * HZ
> to be drained and then put into reap_list. In the case of IPoIB running
> status, I put a 60 * HZ timer for drain in the stale connection release
> patch.

But the 5 second timeout in ipoib_cm_dev_stop() is supposed to be an
exception for when something gets wedged, just to avoid waiting forever.
We want to handle the last WQE reached events normally in most cases.

Would a better fix be to add locking around the "assume HW is wedged"
code in ipoib_cm_dev_stop() to avoid problems if the 5 second timeout is
too short?

 - R.
