I have an application that has just posted a few sends on a connected RC queue pair when either the application itself moves the QP to the error state or the remote side goes into error. The *first* of the posted send WQEs generates a CQE with status IBV_WC_WR_FLUSH_ERR [or something like IBV_WC_REM_OP_ERR in the remote case], as I would expect. But the remaining pending WQEs never seem to generate CQEs. [ibv_post_send did not return a local error for any of them, BTW.] As a result, my app hangs waiting for the pending operations to drain.
Sections 9.9.2.3 and 9.9.2.4 of the IB architecture spec seem to suggest that all pending WQEs behind the failed request (error class B, I think) should generate CQEs with the FLUSH error.

Questions:
(1) Do I understand the spec correctly? Should WQEs posted after the one that is going to fail generate FLUSH errors?
(2) Has anyone seen this behavior before? Is it common? [I haven't tried switching hardware; the card I'm using *may* not be production level.]

If it *is* common behavior, I may need to recode my app to be defensive about it: mark all outstanding requests as failed upon receiving the first error, and then ignore any subsequent errors. This seems kludgy, though, and I'd rather not do that if I don't have to.

Thanks,
Scott
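For concreteness, the defensive workaround described above (fail every outstanding request on the first error completion, then ignore any later errors) could be sketched roughly as below. This is only an illustration under assumptions, not code from the application in question. The `completion` struct and `wc_status` enum are hypothetical stand-ins for the `wr_id` and `status` fields of `struct ibv_wc`, and `send_tracker` and its functions are invented names. Real code would feed this from `ibv_poll_cq()` results.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the parts of struct ibv_wc used here.
 * Real code would include <infiniband/verbs.h> and test wc.status
 * against IBV_WC_SUCCESS, IBV_WC_WR_FLUSH_ERR, and so on. */
enum wc_status { WC_SUCCESS = 0, WC_FLUSH_ERR, WC_OTHER_ERR };

struct completion {
    uint64_t wr_id;
    enum wc_status status;
};

#define MAX_OUTSTANDING 64

/* Tracks send work requests that were posted but have not completed. */
struct send_tracker {
    uint64_t wr_ids[MAX_OUTSTANDING];
    size_t count;
    bool failed; /* set once the first error completion has been seen */
};

/* Record a work request right after a successful post. */
static void tracker_posted(struct send_tracker *t, uint64_t wr_id)
{
    assert(t->count < MAX_OUTSTANDING);
    t->wr_ids[t->count++] = wr_id;
}

/* Remove one wr_id if it is outstanding; returns true if it was found. */
static bool tracker_remove(struct send_tracker *t, uint64_t wr_id)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->wr_ids[i] == wr_id) {
            t->wr_ids[i] = t->wr_ids[--t->count]; /* order is not kept */
            return true;
        }
    }
    return false;
}

/* Handle one completion. Returns how many outstanding requests were
 * resolved by it (successfully or by being marked failed). */
static size_t tracker_on_completion(struct send_tracker *t,
                                    const struct completion *wc)
{
    if (t->failed)
        return 0; /* everything was already failed: ignore late CQEs */

    if (wc->status == WC_SUCCESS)
        return tracker_remove(t, wc->wr_id) ? 1 : 0;

    /* First error: do not wait for flush CQEs that may never arrive. */
    size_t resolved = t->count;
    t->count = 0;
    t->failed = true;
    return resolved;
}
```

With this bookkeeping, the drain loop can stop as soon as `count` reaches zero, whether or not the remaining WQEs ever produce flush completions. The trade-off is the one noted above: the application no longer distinguishes the per-request error status of the flushed sends.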
