> IBV_WC_RETRY_EXC_ERR means that there wasn't any ack by the receiver
> after 4.096*(2 power 18) * 7 usec.

Does an ack from the receiver require the process/thread to be awake? I have been trying to get a small test case, and sleeping without posting enough recvs seems to occasionally result in IBV_WC_RETRY_EXC_ERR (instead of IBV_WC_RNR_RETRY_EXC_ERR, which occurs a lot more often, and of course with much smaller timeout, retry_count, and rnr_retry_count values).

> It can happen for several reasons:
> 1) bad QP attributes
> 2) the remote side doesn't exist or is in a bad state
> 3) rare, but congestion in the network can cause this too
> 7 means infinite retry only for the RNR flow; for the retry flow, 7 is
> the number of retransmissions.
> How do you connect the two sides?
> Maybe the sender sent messages to a QP that wasn't transitioned to
> (at least) the RTR state?

All queue pairs are transitioned into RTS state before any communication. All queue pairs are first transitioned to RTR state, then there is an MPI barrier (which could be using its own queue pairs or sockets), and then all queue pairs are transitioned into RTS state. All errors returned by the verbs API are checked.

Is it possible for a queue pair to transition into an error state and have it identified first as an IBV_WC_RETRY_EXC_ERR and not as a local error?

Thanks,
Sriram.K
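
For reference, the retry window quoted above (4.096 usec * 2^timeout, multiplied by the retry count) can be worked out with a small helper. This is only a rough sketch: the function names are hypothetical and not part of libibverbs, the values timeout = 18 and retry_cnt = 7 are taken from the quoted formula, and real hardware may round or extend the actual timeout.

```c
#include <stdio.h>

/* Hypothetical helper (not a libibverbs call): local ACK timeout for a
 * single attempt, in microseconds, computed as 4.096 us * 2^timeout.
 * Assumes 1 <= timeout <= 31; the special value 0 (wait forever) is
 * not handled here. */
static double ack_timeout_usec(unsigned timeout)
{
    return 4.096 * (double)(1ULL << timeout);
}

/* Hypothetical helper: total retry window following the formula quoted
 * in the thread, i.e. the per-attempt timeout multiplied by retry_cnt. */
static double retry_window_usec(unsigned timeout, unsigned retry_cnt)
{
    return ack_timeout_usec(timeout) * (double)retry_cnt;
}

/* Convenience wrapper that prints both values for a given setting. */
static void print_retry_window(unsigned timeout, unsigned retry_cnt)
{
    printf("per-attempt timeout: %.3f usec\n", ack_timeout_usec(timeout));
    printf("total retry window:  %.3f usec\n",
           retry_window_usec(timeout, retry_cnt));
}
```

Calling print_retry_window(18, 7) should report roughly 1.07 seconds per attempt and about 7.5 seconds in total before IBV_WC_RETRY_EXC_ERR would be expected under this formula.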
