On Sun, May 11, 2008 at 11:18:19AM +0300, Eli Cohen wrote:
> ....
> The reason why the queue is stopped when there is one entry still left
> is to allow ipoib_ib_tx_timer_func() to post a special send request that
> will ensure a completion is reported for this operation thus freeing
> entries at the tx ring. I don't think the scenario you describe here can
> lead to a deadlock since if that happens, it will be released because of
> either one of the following two reasons:
> 1. If the tx queue contains not yet polled, more than one completion of
> send WRs posted by ipoib_cm_send(), they will soon be polled since they
> are posted to a signaled QP and sooner or later will generate
> completions and interrupts. In this case, subsequent postings to
> ipoib_send() will work as expected.
>
> 2. If there is only one outstanding ipoib_cm_send() at the tx queue, it
> means that there are 126 outstanding ipoib_send() requests at the tx
> queue and this means that a few of them are signaled and are expected to
> be completed soon.

Thanks for the explanation. The main problem that we're seeing is
that we just stop getting completions for the send queue. (And we
see this with OFED-1.2 and 1.3, which makes me think that it's
unlikely to be due to the IPoIB driver, since that has changed so much.)

> .....
> And last, could you arrange a remote access to a machine in this
> condition so we could check the state of the device/FW?
>

Yes, I think so. Let me see if I can arrange that.

--
Arthur
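
For readers following the thread, the flow control described in the quoted explanation can be sketched as a toy model. This is not the IPoIB driver code. The class name TxRing, the signaling interval, the wake-up threshold, and the rule that one signaled completion frees every earlier entry are all assumptions made for illustration. Only the ideas of stopping the queue with one entry left and of a timer posting a signaled send into that reserved entry come from the thread.

```python
from collections import deque

RING_SIZE = 128     # assumed; the thread implies 126 + 1 sends plus one reserved entry
SIGNAL_EVERY = 16   # hypothetical signaling interval, not taken from the thread


class TxRing:
    """Toy model of a tx ring whose queue stops with one entry left."""

    def __init__(self, size=RING_SIZE, signal_every=SIGNAL_EVERY):
        self.size = size
        self.signal_every = signal_every
        self.outstanding = deque()  # one bool per posted WR: True if signaled
        self.posted = 0
        self.stopped = False

    def _post(self, force_signal=False):
        if len(self.outstanding) >= self.size:
            raise RuntimeError("tx ring overflow")
        self.posted += 1
        signaled = force_signal or self.posted % self.signal_every == 0
        self.outstanding.append(signaled)
        # Stop the queue while one entry is still free, so the timer can use it.
        if len(self.outstanding) >= self.size - 1:
            self.stopped = True

    def xmit(self):
        """Normal send path: refuses to post once the queue is stopped."""
        if self.stopped:
            return False
        self._post()
        return True

    def timer_flush(self):
        """Timer path: post a signaled send into the reserved entry."""
        if self.outstanding and len(self.outstanding) < self.size:
            self._post(force_signal=True)

    def poll_completions(self):
        """Assume sends complete in order and that the completion of a
        signaled WR lets us free it and every WR posted before it."""
        last_signaled = -1
        for i, signaled in enumerate(self.outstanding):
            if signaled:
                last_signaled = i
        freed = last_signaled + 1
        for _ in range(freed):
            self.outstanding.popleft()
        # Assumed wake-up rule: restart as soon as more than one entry is free.
        if len(self.outstanding) < self.size - 1:
            self.stopped = False
        return freed


if __name__ == "__main__":
    ring = TxRing(size=8, signal_every=100)
    while ring.xmit():
        pass
    print("stopped:", ring.stopped, "outstanding:", len(ring.outstanding))
    ring.timer_flush()
    print("freed:", ring.poll_completions(), "stopped:", ring.stopped)
```

In this model the ring only drains when poll_completions() actually sees a signaled entry complete. If the hardware never reports completions, which is the symptom described above, neither the timer's extra send nor the periodic signaled sends can free the ring.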
