Krishnamoorthy, Sriram wrote:
Can someone please explain what can cause IBV_WC_RETRY_EXC_ERR? I am
using a combination of send-receive and RDMA. I have the reliable
connection queue pairs initialized as:
IBV_WC_RETRY_EXC_ERR means that the receiver did not send any ACK
within 4.096 usec * 2^18 * 7 (about 7.5 seconds with your timeout of 18
and retry_cnt of 7).
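As a rough sanity check of the arithmetic above, here is a minimal sketch in C. The helper names ack_timeout_usec and retry_budget_usec are made up for illustration and are not part of libibverbs; the sketch only assumes the formula quoted above (4.096 usec * 2^timeout per attempt, multiplied by retry_cnt), and real hardware may round the timeout differently.

```c
#include <assert.h>
#include <stdint.h>

/* Per-attempt local ACK timeout in microseconds: 4.096 usec * 2^timeout.
 * "timeout" is the 5-bit qp_attr.timeout value. A value of 0 means the
 * timer is disabled (wait forever), so it is rejected here. */
double ack_timeout_usec(uint8_t timeout)
{
    assert(timeout >= 1 && timeout <= 31);
    return 4.096 * (double)(1ULL << timeout);
}

/* Approximate time before IBV_WC_RETRY_EXC_ERR is reported, using the
 * formula from the reply: per-attempt timeout multiplied by retry_cnt.
 * (Hypothetical helper; the exact accounting is an assumption.) */
double retry_budget_usec(uint8_t timeout, uint8_t retry_cnt)
{
    assert(retry_cnt <= 7);
    return ack_timeout_usec(timeout) * (double)retry_cnt;
}
```

With timeout = 18 and retry_cnt = 7, ack_timeout_usec(18) comes to roughly 1.07 seconds per attempt and retry_budget_usec(18, 7) to roughly 7.5 seconds in total.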
It can happen for several reasons:
1) bad QP attributes
2) the remote side does not exist or is in a bad state
3) rarely, congestion in the network can cause this too
qp_attr.timeout = 18;
qp_attr.retry_cnt = 7;
qp_attr.rnr_retry = 7;
From the documentation, I assumed a value of 7 meant infinite retry.
Can lack of receive buffers cause this error? I understand
IBV_WC_RNR_RETRY_EXC_ERR to be the error caused by lack of receive
buffers. Could it be congestion in the network?
7 means infinite retry only for the RNR flow (rnr_retry). For the retry
flow (retry_cnt), 7 is the number of retransmissions.
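The distinction between the two fields can be written down as a small sketch in C. The function names below are hypothetical and exist only to illustrate the rule stated above; they are not libibverbs calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* rnr_retry is a 3-bit field: 0..6 is a finite retry count,
 * 7 means "retry indefinitely" when the receiver reports RNR. */
bool rnr_retry_is_infinite(uint8_t rnr_retry)
{
    assert(rnr_retry <= 7);
    return rnr_retry == 7;
}

/* retry_cnt is also a 3-bit field, but it is always a finite count:
 * 7 simply means up to 7 retransmissions before the work request
 * completes with IBV_WC_RETRY_EXC_ERR. */
unsigned max_retransmissions(uint8_t retry_cnt)
{
    assert(retry_cnt <= 7);
    return retry_cnt;
}
```

So with rnr_retry = 7 a missing receive buffer is retried forever, while with retry_cnt = 7 a missing ACK is retried at most 7 times.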
I could not find much from earlier queries related to this error. It
often occurs in the middle of the computation on large (>=1024)
processor counts, when I try to have multiple outstanding send-recvs
between a pair of processes (each pair of processes has a RC queue
pair initialized). I do not have a small test case yet that can repeat
this error.
How do you connect the two sides?
Maybe the sender sends messages to a remote QP that has not yet been
moved to (at least) the RTR state?
Dotan