added [email protected] to the thread, sorry for the double post.
On 3/5/07, Or Gerlitz <[EMAIL PROTECTED]> wrote:
On 3/2/07, Tang, Changqing <[EMAIL PROTECTED]> wrote:
> What is the default size of the async event queue? Suppose I
> create 1024 QPs from one process to another process and
> somehow the remote process crashes. Can I get all 1024 QP error
> async events? How do I make sure I don't lose an event?

CQ, I want to understand exactly which feature you need. For example, with TCP the equivalent is that, following a remote process crash, the remote node's TCP stack closes the TCP connections, and whenever the local process attempts to use the socket it gets an errno telling it that the connection was closed. Is that the behavior you are after?

Since you use RC QPs, if (and only if) you attempt a post_send (or RDMA) on a QP whose connected peer QP is not responding, you will get a CQ completion with a "retry exceeded" error.

If the above (notification following a post send) is not enough, the IB CM, which you can use through libibcm or librdmacm, provides the same functionality (it sends a DREQ if the process crashes). The distinction is that over TCP the same primitive (the socket) is used for both connection management and data transfer, whereas over IB the QP is used for data and the IB CM ID (or the RDMA CM ID) is used for connection management.

Combining the possibilities: if you want a notification on every peer process crash, you need to either poll/select the libibcm/librdmacm event queue once in a while, or implement a keep-alive protocol of your own. For instance, I think the IB spec mentions doing a zero-length RDMA write once in a while as a means of implementing such a protocol.

Or.
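
To make the keep-alive option a bit more concrete, here is a minimal sketch in plain C of the bookkeeping side only. It is not taken from any library: the names (keepalive_table, ka_init, ka_probe_ok, ka_probe_failed, ka_sweep), the 1024-peer limit and the millisecond clock are all assumptions made for illustration. The idea is that the application calls ka_probe_ok() when a periodic probe (for example a zero-length RDMA write) completes successfully, calls ka_probe_failed() when a completion for that QP carries an error status such as "retry exceeded" or when the CM reports a disconnect, and calls ka_sweep() from a timer to catch peers that have been silent for too long. The actual verbs and CM calls are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PEERS 1024  /* assumption: one entry per QP, as in the question */

/* Hypothetical per-peer record, one per RC QP / connection. */
struct peer_state {
    uint64_t last_ok_ms;  /* time of the last successful probe completion */
    bool     alive;       /* false once the peer is considered dead */
};

struct keepalive_table {
    struct peer_state peers[MAX_PEERS];
    size_t   count;       /* number of peers actually in use */
    uint64_t timeout_ms;  /* silence longer than this means "dead" */
};

/* Mark all peers alive, as if each had been heard from at now_ms. */
void ka_init(struct keepalive_table *t, size_t count,
             uint64_t timeout_ms, uint64_t now_ms)
{
    assert(t != NULL && count <= MAX_PEERS);
    t->count = count;
    t->timeout_ms = timeout_ms;
    for (size_t i = 0; i < count; i++) {
        t->peers[i].last_ok_ms = now_ms;
        t->peers[i].alive = true;
    }
}

/* Call when a probe to this peer completed successfully. */
void ka_probe_ok(struct keepalive_table *t, size_t peer, uint64_t now_ms)
{
    assert(t != NULL && peer < t->count);
    /* A peer already marked dead stays dead; reconnecting is out of scope. */
    if (t->peers[peer].alive)
        t->peers[peer].last_ok_ms = now_ms;
}

/* Call when a completion for this peer failed or the CM reported a disconnect. */
void ka_probe_failed(struct keepalive_table *t, size_t peer)
{
    assert(t != NULL && peer < t->count);
    t->peers[peer].alive = false;
}

/* Mark peers that have been silent for longer than the timeout as dead.
 * Returns how many peers were newly marked dead by this call. */
size_t ka_sweep(struct keepalive_table *t, uint64_t now_ms)
{
    assert(t != NULL);
    size_t newly_dead = 0;
    for (size_t i = 0; i < t->count; i++) {
        struct peer_state *p = &t->peers[i];
        if (p->alive && now_ms >= p->last_ok_ms &&
            now_ms - p->last_ok_ms > t->timeout_ms) {
            p->alive = false;
            newly_dead++;
        }
    }
    return newly_dead;
}
```

With this layout the failure path and the timeout path end up in the same "alive" flag, so the application has a single place to look regardless of whether the crash was noticed through a failed completion, a CM event, or plain silence.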
