On Tue, 2008-04-29 at 14:49 -0700, Roland Dreier wrote:
> By the way, this isn't just theoretical -- I'm not smart enough to
> realize this except that I just saw:
>
> ib1: TX ring full, stopping kernel net queue
> NETDEV WATCHDOG: ib1: transmit timed out
> ib1: transmit timeout: latency 1240 msecs
> ib1: queue stopped 1, tx_head 5291313, tx_tail 5291255
>
> and of course it never recovers.
I started working on a fix for this: arm the send CQ when the QP reaches 63 outstanding requests, and drain the CQ in the completion handler while holding priv->tx_lock. But I ran into another strange problem that I don't understand. If I just load and unload ib_ipoib, the system crashes with messages that look like memory corruption. If I comment out the destruction of the send CQ in ipoib_transport_dev_cleanup(), the crashes disappear. Do you see this as well?

_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
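The proposed fix (arm the send CQ once 63 requests are outstanding, then drain it in the completion handler under priv->tx_lock) can be modelled in user space roughly as below. This is a minimal sketch under stated assumptions, not ipoib code: struct tx_model, tx_post(), hw_complete(), tx_completion_handler(), the 64-entry ring and the "wake when half empty" rule are all hypothetical, a pthread mutex stands in for priv->tx_lock, and the "hardware" is just a completion counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sizes: a 64-entry send ring, with the CQ armed once
 * 63 requests are outstanding (the threshold mentioned in the mail). */
#define TX_RING_SIZE  64
#define ARM_THRESHOLD 63

struct tx_model {
    pthread_mutex_t tx_lock;   /* stands in for priv->tx_lock */
    unsigned long   tx_head;   /* sends posted */
    unsigned long   tx_tail;   /* completions reaped by the driver */
    unsigned long   hw_done;   /* completions sitting in the (fake) CQ */
    bool            queue_stopped;
    bool            cq_armed;
};

static void tx_init(struct tx_model *m)
{
    pthread_mutex_init(&m->tx_lock, NULL);
    m->tx_head = m->tx_tail = m->hw_done = 0;
    m->queue_stopped = false;
    m->cq_armed = false;
}

static unsigned long tx_outstanding(const struct tx_model *m)
{
    return m->tx_head - m->tx_tail;
}

/* Post one send; returns false if the queue is stopped. */
static bool tx_post(struct tx_model *m)
{
    bool ok = false;

    pthread_mutex_lock(&m->tx_lock);
    if (!m->queue_stopped) {
        m->tx_head++;
        assert(tx_outstanding(m) <= TX_RING_SIZE);
        if (tx_outstanding(m) >= ARM_THRESHOLD)
            m->cq_armed = true;        /* ask for a completion event */
        if (tx_outstanding(m) >= TX_RING_SIZE)
            m->queue_stopped = true;   /* like netif_stop_queue() */
        ok = true;
    }
    pthread_mutex_unlock(&m->tx_lock);
    return ok;
}

/* Completion handler: drain every queued completion under tx_lock,
 * wake the queue if enough room was freed, and re-arm while stopped. */
static void tx_completion_handler(struct tx_model *m)
{
    pthread_mutex_lock(&m->tx_lock);
    m->tx_tail = m->hw_done;                    /* drain the CQ */
    if (m->queue_stopped && tx_outstanding(m) <= TX_RING_SIZE / 2)
        m->queue_stopped = false;               /* like netif_wake_queue() */
    if (m->queue_stopped || tx_outstanding(m) >= ARM_THRESHOLD)
        m->cq_armed = true;                     /* keep an event pending */
    pthread_mutex_unlock(&m->tx_lock);
}

/* Fake hardware finishing n sends. An armed CQ fires a single event
 * and disarms, mirroring one-shot CQ notification. */
static void hw_complete(struct tx_model *m, unsigned long n)
{
    bool fire;

    pthread_mutex_lock(&m->tx_lock);
    m->hw_done += n;
    if (m->hw_done > m->tx_head)
        m->hw_done = m->tx_head;
    fire = m->cq_armed;
    m->cq_armed = false;
    pthread_mutex_unlock(&m->tx_lock);
    if (fire)
        tx_completion_handler(m);
}
```

The point the model tries to capture is that the handler keeps re-arming the CQ for as long as the queue stays stopped, so a stopped queue always has a pending event to wake it. Without that re-arm, a partial batch of completions can leave the queue stopped forever in this model, which resembles the log above (queue stopped with tx_head - tx_tail = 58). A real driver would also have to catch completions that arrive between the final poll and re-arming, for example by passing IB_CQ_REPORT_MISSED_EVENTS to ib_req_notify_cq() and polling again when it returns a positive value.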
