Krishnamoorthy, Sriram wrote:
IBV_WC_RETRY_EXC_ERR means that there wasn't any ack from the receiver
after 4.096 * (2 to the power 18) * 7 usec.
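For reference, the arithmetic in the formula above can be sketched in plain C. This is only an illustration of the numbers quoted in this thread: the function names are made up for the example, it assumes a timeout value in the range 1..31, and real HCAs may round the per-attempt timeout differently.

```c
#include <stdint.h>

/* Hypothetical helper: per-attempt local ACK timeout in microseconds,
 * 4.096 usec * 2^timeout. Assumes 1 <= timeout <= 31. */
double ack_timeout_usec(uint8_t timeout)
{
    return 4.096 * (double)(1ULL << (timeout & 0x1f));
}

/* Hypothetical helper: total wait before IBV_WC_RETRY_EXC_ERR, following
 * the formula quoted in this thread (per-attempt timeout * retry_cnt). */
double retry_exceeded_usec(uint8_t timeout, uint8_t retry_cnt)
{
    return ack_timeout_usec(timeout) * (double)retry_cnt;
}
```

With timeout = 18 and retry_cnt = 7 this gives roughly 7.5 seconds (about 1.07 seconds per attempt).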
Does an ack from the receiver require the process/thread to be awake? I
have been trying to get a small test case, and sleeping without posting
enough recv-s seems to occasionally result in IBV_WC_RETRY_EXC_ERR
(instead of IBV_WC_RNR_RETRY_EXC_ERR which occurs a lot more often, and
of course with much smaller timeout, retry_count, and rnr_retry_count).
No, the ack is handled at the HCA level (unless the IB transport is
implemented in SW...).
If putting a sleep in your code causes IBV_WC_RETRY_EXC_ERR, I would
suspect SW bugs ...
It can happen for several reasons:
1) bad QP attributes
2) the remote side doesn't exist or is in a bad state
3) rare, but congestion in the network can cause this too
7 means infinite retry only for the RNR flow; for the retry flow, 7 is
the number of retransmissions.
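To make the difference between the two 3-bit fields concrete, here is a small plain C sketch. The helper names are hypothetical and it does not call libibverbs at all; it only encodes the rule stated above (7 is "infinite" for rnr_retry, but an ordinary count for retry_cnt).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: rnr_retry is a 3-bit field and the value 7 is
 * special, meaning "retry forever" on receiver-not-ready NAKs. */
bool rnr_retry_is_infinite(uint8_t rnr_retry)
{
    return (rnr_retry & 0x7) == 7;
}

/* Hypothetical helper: retry_cnt is also 3 bits, but 7 is just a count,
 * so the requester gives up after that many retransmissions. */
unsigned transport_retry_limit(uint8_t retry_cnt)
{
    return retry_cnt & 0x7u;
}
```

So a QP configured with retry_cnt = 7 and rnr_retry = 7 can still complete with IBV_WC_RETRY_EXC_ERR, while it never reports IBV_WC_RNR_RETRY_EXC_ERR.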
How do you connect the two sides?
Maybe the sender sends messages to a QP that wasn't transitioned to
(at least) the RTR state?
All queue pairs are transitioned into RTS state before any
communication. All queue pairs are transitioned to RTR state, then there
is an MPI barrier (which could be using its own queue pairs or sockets),
and then all queue pairs are transitioned into RTS state.
Good, I was afraid of a race where one side starts to send messages
while the other side isn't in the RTR state yet.
All error messages out of verbs API are checked. Is it possible for a
queue pair to transition into an error state and it is identified first
as an IBV_WC_RETRY_EXC_ERR and not as a local error?
Theoretically: no.
Is this the first message that is being passed on those QPs?
Can you check the QP state of the remote side when you get such an error?
Thanks,
Sriram.K
You are welcome
Dotan