Can you call rdma_disconnect() immediately after posting sends on the QP? I don't see any CQEs come back with errors, but the sends appear to "disappear" and never get signaled on the peer. Are there any potential race issues to avoid here (it only happens in about one out of every 100 connections)?
rdma_disconnect() will immediately transition the QP into the error state, so send operations that are still queued at that point can be flushed instead of being transmitted.
I think the situation that you're describing could happen if the side that received the send transitioned the QP into the error state, but the ACK sent back to the sender was lost. Can anyone confirm this?
You could try having only one side initiate rdma_disconnect(), with the other side waiting for the disconnect event (RDMA_CM_EVENT_DISCONNECTED) before calling rdma_disconnect() itself.
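To make the ordering concrete, here is a rough sketch in C. It is not real librdmacm code: struct conn_ops and its three callbacks are hypothetical names that stand in for whatever your application uses to poll the send CQ (ibv_poll_cq()), wait for the disconnect event (rdma_get_cm_event()), and call rdma_disconnect(). It also assumes an RC QP with signaled sends, so that a send completion means the peer's ACK has arrived. Draining the send completions before disconnecting is my addition on top of the suggestion above. The fake_* helpers are only an in-memory stand-in so the ordering can be exercised without hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callbacks (not part of librdmacm). A real program would
 * wrap ibv_poll_cq(), rdma_get_cm_event() and rdma_disconnect() here. */
struct conn_ops {
    int (*reap_sends)(void *ctx);           /* completions reaped, < 0 on error */
    int (*wait_peer_disconnect)(void *ctx); /* 0 once the disconnect event is seen */
    int (*disconnect)(void *ctx);           /* 0 on success */
};

/* Initiating side: reap every outstanding signaled send before
 * disconnecting, so no send is still queued when the QP goes to error. */
int close_initiator(const struct conn_ops *ops, void *ctx, int outstanding)
{
    assert(ops != NULL);
    while (outstanding > 0) {
        int n = ops->reap_sends(ctx);
        if (n < 0)
            return -1;
        outstanding -= n;
    }
    return ops->disconnect(ctx);
}

/* Other side: do not disconnect until the peer's disconnect event arrives. */
int close_responder(const struct conn_ops *ops, void *ctx)
{
    assert(ops != NULL);
    if (ops->wait_peer_disconnect(ctx) != 0)
        return -1;
    return ops->disconnect(ctx);
}

/* In-memory fake connection, for illustration only. */
struct fake_conn {
    int pending;      /* send completions not yet reaped */
    int peer_gone;    /* nonzero once the peer has disconnected */
    int disconnected; /* set when disconnect() was called */
};

static int fake_reap(void *ctx)
{
    struct fake_conn *c = ctx;
    if (c->pending <= 0)
        return -1; /* report an error so the demo cannot spin forever */
    c->pending--;
    return 1;
}

static int fake_wait(void *ctx)
{
    struct fake_conn *c = ctx;
    return c->peer_gone ? 0 : -1;
}

static int fake_disconnect(void *ctx)
{
    struct fake_conn *c = ctx;
    c->disconnected = 1;
    return 0;
}

static const struct conn_ops fake_ops = { fake_reap, fake_wait, fake_disconnect };
```

With the fake, close_initiator(&fake_ops, &conn, 3) only reaches disconnect() after all three completions have been reaped, and close_responder() refuses to disconnect until peer_gone is set. A real reap_sends would normally return 0 when the CQ is empty and should block on the completion channel rather than spin.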
- Sean
