* Dotan Barak ([EMAIL PROTECTED]) wrote:
> Hi.
>
> Bharath Ramesh wrote:
>> I have a multi-threaded application. My application has its own message
>> exchange protocol, it uses IB as the communication layer. I send a lot
>> of messages which are normally of the order of few ten thousands. After
>> sometime it seems like one message from one of the node is lost. I am
>> using RC QP type. This causes the thread to deadlock. The other threads
>> are still able to communicate exchanging messages without any problem
>> over the same QP. Both ends are using SRQs and there is sufficient
>> buffers posted so that I dont run out of buffers. I even tried doubling
>> the buffers posted I see the same problem again. One message being lost.
>> The ibv_post_send doesnt report any error. I am trying to get this done
>> for a conference deadline early next week. I would really appreciate any
>> help in suggesting any possibilities which might cause the message to be
>> dropped without any error being returned.
>>
> If you don't have any bugs in your code, the described scenario should
> work.
>
> I need some more info in order to try to help you:
>
> Do you use the same QP from several threads (and post send from all of
> them)?

Yes, I use the same QP from three threads. The application has close to
5 threads. The receives are handled by a single thread. Most of the sends
are posted by a single thread. Occasionally a third thread posts a few
sends to the QP. The same QP is also used for RDMA Writes. The majority of
the RDMA Writes are also performed by the same thread that posts the
majority of the send messages.

> How do you poll the CQ (several threads/one)?

I have two CQs, one for receive and the other for send. The receive CQ is
polled only by the receive thread. The send CQ is polled by the three
threads. The receive thread occasionally polls it to clear out any send
CQEs, because I set IBV_SEND_SIGNALED only once for every 16
IBV_SEND_INLINE sends. Otherwise the send CQ is polled by the single
thread that does the majority of the sends. Occasionally the third thread,
when doing a send, might poll the send CQ as well for the completion CQE
of an RDMA Write.

>
> which HW/SW do you use?

I am using Yellow Dog Linux 5.0 on Apple Xserves.

Thanks,
Bharath

---
Bharath Ramesh <[EMAIL PROTECTED]> http://people.cs.vt.edu/~bramesh
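
For reference, the "one signaled send per 16" scheme described above can
be sketched roughly as below. This is only an illustration, not the
application's code: struct send_signal_state, send_signal_state_init() and
next_send_is_signaled() are hypothetical names, and the sketch assumes one
counter per QP shared by every thread that posts sends. It uses a C11
atomic because a plain shared counter incremented from several threads
would be a data race.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed interval, taken from the description: 1 signaled send per 16. */
#define SIGNAL_INTERVAL 16u

/* Hypothetical per-QP state shared by all threads that post sends. */
struct send_signal_state {
    atomic_uint posted; /* number of sends counted on this QP so far */
};

static void send_signal_state_init(struct send_signal_state *s)
{
    assert(s != NULL);
    atomic_init(&s->posted, 0u);
}

/*
 * Call once per send, just before posting it. Returns true when this
 * send should request a completion, i.e. for every 16th send counted.
 * atomic_fetch_add gives each caller a distinct sequence number, so
 * exactly one send in each group of 16 is selected even when several
 * threads call this concurrently.
 */
static bool next_send_is_signaled(struct send_signal_state *s)
{
    assert(s != NULL);
    unsigned int n = atomic_fetch_add(&s->posted, 1u);
    return (n % SIGNAL_INTERVAL) == SIGNAL_INTERVAL - 1u;
}
```

A caller would OR IBV_SEND_SIGNALED into the work request's send_flags
when next_send_is_signaled() returns true, and leave it unset otherwise.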
