Yes. When the client rebooted and failed to connect to the server, the
event.status was 'IB_CM_REJ_STALE_CONN'. On the client, I tried to retry the
connection by creating a QP on the old rdma_cm_id (the one whose first
connection attempt had failed), but rdma_create_qp() returned -22 (-EINVAL)
every time.
So I released all the resources, created a new rdma_cm_id with
rdma_create_id(), and used it for another connection to the server.
That connection was successful. But the server then showed two rdma_cm_ids in
use for connections: the first was the one used before the client rebooted,
and the second was the rdma_cm_id for the current connection. I then rebooted
the client several more times and found that the number of rdma_cm_ids on the
server equaled the number of times I had rebooted the client. So the server
system accumulated many rdma_cm_ids that were never used again.
My question: Is there a way for the server to be notified when it loses a
connection, for example because the client rebooted? The server could then
destroy the stale rdma_cm_id, the new request would pass the stale connection
check, and the client would not need to retry with a new rdma_cm_id after a
reboot.
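
For reference, here is a rough sketch in C of the client-side workaround
described above: retry with a fresh id only when the rejection status is the
stale-connection value. It is only an illustration under assumptions, not my
actual code. The names connect_with_retry, attempt_fn, struct attempt and
fake_attempt are hypothetical, and REJ_STALE_CONN is a local stand-in for
IB_CM_REJ_STALE_CONN (= 10, as quoted below). In a real client, each attempt
would call rdma_create_id(), create the QP with rdma_create_qp(), call
rdma_connect(), and on failure release everything with rdma_destroy_qp() and
rdma_destroy_id() before returning.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for IB_CM_REJ_STALE_CONN; the value 10 is the one quoted
 * in this thread. Real code should use the constant from ib_cm.h. */
#define REJ_STALE_CONN 10

/* Hypothetical summary of one connection attempt. */
enum attempt_result {
    ATTEMPT_ESTABLISHED,  /* e.g. RDMA_CM_EVENT_ESTABLISHED was received */
    ATTEMPT_REJECTED,     /* e.g. RDMA_CM_EVENT_REJECTED was received */
    ATTEMPT_ERROR         /* any other failure */
};

struct attempt {
    enum attempt_result result;
    int status;           /* event.status, meaningful when rejected */
};

/* Hypothetical callback: performs ONE attempt with a fresh id and QP, and
 * releases them itself if the attempt does not end up established. */
typedef struct attempt (*attempt_fn)(void *ctx);

/* Returns the number of attempts used on success, or -1 on failure.
 * Only a stale-connection rejection triggers another attempt. */
int connect_with_retry(attempt_fn try_once, void *ctx, int max_attempts)
{
    assert(try_once != NULL);

    for (int i = 1; i <= max_attempts; i++) {
        struct attempt a = try_once(ctx);

        if (a.result == ATTEMPT_ESTABLISHED)
            return i;
        if (a.result != ATTEMPT_REJECTED || a.status != REJ_STALE_CONN)
            return -1;    /* not a stale-connection reject: give up */
        /* Stale connection: loop and try again with a new id. */
    }
    return -1;            /* still rejected after max_attempts */
}

/* Stand-in attempt used only to exercise the loop: *ctx holds how many
 * stale-connection rejections are left before the attempt succeeds. */
struct attempt fake_attempt(void *ctx)
{
    int *stale_left = ctx;
    struct attempt a = { ATTEMPT_ESTABLISHED, 0 };

    if (*stale_left > 0) {
        (*stale_left)--;
        a.result = ATTEMPT_REJECTED;
        a.status = REJ_STALE_CONN;
    }
    return a;
}
```

This only covers the client side. It does nothing about the old rdma_cm_ids
that stay on the server, which is what my question is about.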

J.G Yang        

======= 2010-03-13 00:47:15=======

>>Server: rdma_create_id, rdma_bind_addr, rdma_listen(cb->cm_id, 3);
>>Then the client connects to the server, and the connection is successful.
>>Nothing more is done on the server, but the client is rebooted. After the
>>client starts, it connects to the server again. This is where the errors
>>occur: sometimes this connection succeeds, and other times it fails. When
>>it fails, the client receives an RDMA_CM_EVENT_REJECTED event, and the
>>server does not receive an RDMA_CM_EVENT_CONNECT_REQUEST event as it does
>>for a successful connection.
>>I don't know how this happens. Can someone help me? Thanks!
>
>The server side may be rejecting the connection request as a duplicate.  The
>event.status may provide some additional insight.  It should contain one of the
>enum ib_cm_rej_reason values given in ib_cm.h (assuming that you're using IB
>and not iWarp).  If the status is 'IB_CM_REJ_STALE_CONN' (= 10), then retrying
>the connection with a new QP may succeed.
>
>- Sean


