Artur, when it happens, please:

1. Check the link error counters.
2. Disconnect and reconnect the cable and see if it recovers.
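For the counter check, here is a rough sketch that reads a few port error counters from sysfs so that a snapshot taken before the hang can be compared with one taken after it. The device name `mthca0`, port `1`, and the particular counters chosen are assumptions for illustration, and the helper names are hypothetical; adjust them to whatever the affected host actually exposes.

```python
from pathlib import Path

# Assumed sysfs location of the InfiniBand port counters.
SYSFS_ROOT = Path("/sys/class/infiniband")

# A small, assumed selection of error-related counters to watch.
ERROR_COUNTERS = (
    "symbol_error",
    "link_error_recovery",
    "link_downed",
    "port_rcv_errors",
    "port_xmit_discards",
)


def read_link_counters(device="mthca0", port=1, root=SYSFS_ROOT):
    """Return {counter_name: int or None} for one port.

    A counter that is missing or unreadable is reported as None.
    """
    counters_dir = Path(root) / device / "ports" / str(port) / "counters"
    values = {}
    for name in ERROR_COUNTERS:
        try:
            values[name] = int((counters_dir / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


def changed(before, after):
    """Return {name: (old, new)} for counters that differ between snapshots."""
    return {
        name: (before.get(name), after[name])
        for name in after
        if before.get(name) != after[name]
    }


if __name__ == "__main__":
    for name, value in read_link_counters().items():
        print(f"{name}: {value}")
```

Taking one snapshot while traffic is healthy and another once the watchdog messages start, then passing both to `changed`, shows which counters moved in between.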
On 4/30/08, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
> At a customer site running OFED 1.2 we are seeing the
> following - after ~10s of hours of stressing IPoIB,
> the card apparently stops generating TX completions.
> (These are MT25204 cards in x86_64 boxes, and we've seen
> this with a couple f/w versions, including the latest.)
>
> We get something like:
>
> kernel: NETDEV WATCHDOG: ib0: transmit timed out
> kernel: ib0: transmit timeout: latency 1972 msecs
> kernel: ib0: queue stopped 1, tx_head 3271, tx_tail 3207
>
> and that repeats "forever".
>
> And to simplify things, we can produce this behavior in
> datagram mode.
>
> As long as only datagram mode is in use, the TX code in the
> IPoIB driver seems quite straightforward. The only reason I
> can imagine that we'd fail to get a timely TX completion
> would be if link-level flow control were to throttle us. And
> I'd expect that to be a transient condition... Am I
> overlooking something? Anyone seen similar? Suggestions for
> debugging?
>
> --
> Arthur
>
> _______________________________________________
> general mailing list
> [email protected]
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
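As a debugging aid for the timeout messages quoted above, a small sketch like the following can pull `tx_head` and `tx_tail` out of each log line and report how many sends are still waiting for a completion. The function names are hypothetical, and treating the counters as 32-bit values that may wrap is an assumption about the driver. For the quoted line the difference is 64; if the send ring on this build holds 64 entries (also an assumption, worth checking against the driver's ring size), that would mean the ring is completely full and nothing is being reaped.

```python
import re

# Matches the "queue stopped" line format shown in the quoted log.
TX_LINE = re.compile(r"queue stopped (\d+), tx_head (\d+), tx_tail (\d+)")


def outstanding_sends(log_line):
    """Return tx_head - tx_tail for one log line, or None if it does not match.

    The subtraction is done modulo 2**32 on the assumption that the
    driver keeps these as wrapping 32-bit counters.
    """
    match = TX_LINE.search(log_line)
    if match is None:
        return None
    head = int(match.group(2))
    tail = int(match.group(3))
    return (head - tail) % 2**32


def tail_is_stuck(log_lines):
    """Return True if every matching line reports the same tx_tail.

    An unchanging tx_tail across repeated timeouts suggests that no TX
    completions were processed in between.
    """
    tails = set()
    for line in log_lines:
        match = TX_LINE.search(line)
        if match is not None:
            tails.add(int(match.group(3)))
    return len(tails) == 1


if __name__ == "__main__":
    line = "kernel: ib0: queue stopped 1, tx_head 3271, tx_tail 3207"
    print(outstanding_sends(line))  # prints 64
```

Feeding successive watchdog reports to `tail_is_stuck` is a quick way to tell a fully stalled queue from one that is merely draining slowly.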
