Or:
        Thank you for the description. I have read the spec carefully
and have a better idea now. But here is a case I am not sure about.

        I have 1024 QPs on a single port/cable. There are NO receives
posted because I use pure RDMA writes, and there are no pending sends
either. At this point I pull the cable out.

        I will get the port error event (right?). Do I also get 1024 QP
error events, given that there is no way to report the error through a
completion status? Or do the QPs stay in a good state even though I
pulled the cable out?


--CQ

 

> -----Original Message-----
> From: Or Gerlitz [mailto:[EMAIL PROTECTED] 
> Sent: Monday, March 05, 2007 3:37 AM
> To: Tang, Changqing
> Cc: Roland Dreier; [email protected]
> Subject: Re: [ofa-general] What is the size of async event queue ?
> 
> On 3/2/07, Tang, Changqing <[EMAIL PROTECTED]> wrote:
> 
> >         What is the default size of the async event queue? Suppose I
> > create 1024 QPs from one process to another process. Somehow the
> > remote process crashes. Can I get all 1024 QP error async events,
> > and how do I make sure I don't lose an event?
> 
> CQ,
> 
> I want to understand exactly what feature you need.
> 
> For example, if TCP is used, is the equivalent that, following a
> remote process crash, the remote node's TCP stack closes the TCP
> connections, and whenever the local process attempts to use the
> socket it gets an errno telling it that the connection was closed?
> 
> Since you use RC QPs, --if-- you attempt a post_send (or RDMA)
> on a QP whose connected peer QP is not responding, you
> will get a CQ completion with a "retry exceeded" error.
> 
> If the above case (notification following post send) is not
> enough, the IB CM, which you can use through libibcm or
> librdmacm, provides the same functionality (it sends a DREQ if the
> process crashes), with the distinction that over TCP the same
> primitive (the socket) is used for both connection management and
> data transfer, whereas over IB the QP is used for data and the
> IB CM Id (or the RDMA CM Id) is used for connection management.
> 
> Combining possibilities: if you want to get a notification on
> every peer process crash, you would need to either
> poll/select the libibcm/librdmacm event queue once in a while or
> implement some keep-alive protocol of your own. For
> instance, I think the IB spec mentions doing a zero-length RDMA
> write once in a while as a means of implementing such a protocol.
> 
> Or.
> 
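
For what it is worth, the application-side bookkeeping for the keep-alive
idea quoted above might look roughly like the sketch below. It is plain C
and does not call libibverbs at all: the struct and the ka_* helper names
are hypothetical, and I am assuming the caller posts the periodic probe
(for example a zero-length RDMA write) itself and reports each probe's
completion status to these helpers together with a monotonic millisecond
timestamp.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-connection keep-alive state; not a libibverbs type. */
struct ka_peer {
    uint64_t last_ok_ms;  /* time of the last successful probe completion */
    int      dead;        /* non-zero once the peer is considered gone */
};

/* Call when a probe (e.g. a zero-length RDMA write) completed successfully. */
static void ka_probe_ok(struct ka_peer *p, uint64_t now_ms)
{
    p->last_ok_ms = now_ms;
    p->dead = 0;
}

/* Call when a probe completed in error (e.g. "retry exceeded"). */
static void ka_probe_failed(struct ka_peer *p)
{
    p->dead = 1;
}

/*
 * Mark every peer without a successful probe in the last timeout_ms as
 * dead, and return the total number of dead peers. Assumes now_ms comes
 * from a monotonic clock, so now_ms >= last_ok_ms always holds.
 */
static size_t ka_sweep(struct ka_peer *peers, size_t n,
                       uint64_t now_ms, uint64_t timeout_ms)
{
    size_t dead = 0;

    for (size_t i = 0; i < n; i++) {
        if (!peers[i].dead && now_ms - peers[i].last_ok_ms > timeout_ms)
            peers[i].dead = 1;
        if (peers[i].dead)
            dead++;
    }
    return dead;
}
```

With 1024 QPs this would be an array of 1024 ka_peer entries swept from
the same loop that polls the CQ. It only detects peers that stop
answering probes; it says nothing about which async events are delivered.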

_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
