I have a multi-threaded application with its own message exchange protocol, which uses IB as the communication layer. I send a lot of messages, normally on the order of a few tens of thousands. I am using the RC QP type. After some time, it seems that one message from one of the nodes is lost, and this causes the thread to deadlock. The other threads are still able to exchange messages over the same QP without any problem. Both ends use SRQs, and there are sufficient buffers posted so that I don't run out of buffers. I even tried doubling the number of buffers posted, and I see the same problem again: one message is lost. ibv_post_send doesn't report any error.

I am trying to get this done for a conference deadline early next week. I would really appreciate any suggestions as to what might cause a message to be dropped without any error being returned.
Thanks,
Bharath

---
Bharath Ramesh <[EMAIL PROTECTED]> http://people.cs.vt.edu/~bramesh
