Yes. After the client rebooted and failed to connect to the server, the event.status was 'IB_CM_REJ_STALE_CONN'. On the client, when I tried to create a QP on the old rdma_cm_id (the one whose first connection attempt had failed) in order to retry the connection, rdma_create_qp() returned -22 (-EINVAL) every time. So I released all the resources, created a new rdma_cm_id with rdma_create_id(), and used it for another connection to the server. That connection succeeded.

However, the server then showed two rdma_cm_ids in use: the first was the one used before the client rebooted, and the second was the one for the current connection. I then rebooted the client several more times and found that the number of rdma_cm_ids on the server equaled the number of times I had rebooted the client. So the server accumulates many rdma_cm_ids that are never used again.

My question: is there a way for the server to be notified when it loses a connection, for example because the client rebooted? Then the server could destroy the stale rdma_cm_id, a new connection would pass the stale connection check, and the client would not need to retry with a new rdma_cm_id after a reboot.
J.G Yang

======= 2010-03-13 00:47:15 =======
>> server: rdma_create_id, rdma_bind_addr, rdma_listen(cb->cm_id, 3);
>> Then the client connects to the server, and the connection is successful.
>> Then nothing is done on the server, but the client is rebooted. After the
>> client starts, it connects to the server again. This is where the errors
>> occur. Sometimes this connection succeeds, and at other times it fails.
>> When it fails, the client receives an RDMA_CM_EVENT_REJECTED event, and
>> the server does not receive an RDMA_CM_EVENT_CONNECT_REQUEST event as it
>> does for a successful connection.
>> I don't know how this happens. Can someone help me? Thanks!
>
> The server side may be rejecting the connection request as a duplicate. The
> event.status may provide some additional insight. It should contain one of
> the enum ib_cm_rej_reason values given in ib_cm.h (assuming that you're
> using IB and not iWarp). If the status is 'IB_CM_REJ_STALE_CONN' (= 10),
> then retrying the connection with a new QP may succeed.
>
> - Sean
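
The client-side retry decision discussed in this thread could be sketched in C roughly as follows. This is only an illustration under assumptions: the names REJ_STALE_CONN, retry_action and next_action are hypothetical, the constant is a local copy of the value 10 quoted above for IB_CM_REJ_STALE_CONN, and the retry limit is an invented parameter. The sketch does not call librdmacm itself; the comments mark where the teardown and rdma_create_id() calls would go.

```c
#include <assert.h>

/* Local copy of the reject reason quoted in this thread
 * (IB_CM_REJ_STALE_CONN = 10 in ib_cm.h). Assumption: the status
 * field of an RDMA_CM_EVENT_REJECTED event carries this value on IB. */
enum { REJ_STALE_CONN = 10 };

/* Hypothetical names for what the client should do next. */
enum retry_action {
    RETRY_GIVE_UP = 0,     /* report the failure to the caller */
    RETRY_WITH_NEW_ID = 1  /* tear down and reconnect on a fresh rdma_cm_id */
};

/*
 * Decide how to react to a rejected connection attempt.
 *
 * rej_status       status field of the RDMA_CM_EVENT_REJECTED event
 * attempts_so_far  number of connection attempts already made
 * max_attempts     hypothetical upper bound on attempts
 */
enum retry_action next_action(int rej_status, int attempts_so_far,
                              int max_attempts)
{
    assert(max_attempts > 0);

    /* Only the stale-connection reject is treated as retryable here. */
    if (rej_status != REJ_STALE_CONN)
        return RETRY_GIVE_UP;

    /* Stop once the attempt budget is used up. */
    if (attempts_so_far >= max_attempts)
        return RETRY_GIVE_UP;

    /*
     * On RETRY_WITH_NEW_ID the caller would, as described above:
     *   1. release the resources tied to the old id (rdma_destroy_id()),
     *   2. obtain a new id with rdma_create_id(),
     *   3. resolve the address and route again, create the QP on the
     *      new id, and connect.
     * Reusing the old id is avoided because rdma_create_qp() on it
     * was reported to fail with -EINVAL.
     */
    return RETRY_WITH_NEW_ID;
}
```

For example, next_action(event_status, 0, 3) would ask for one more attempt on a new rdma_cm_id when the first connect was rejected as stale, and would give up for any other reject reason. This covers only the client side and does not address the leftover rdma_cm_ids on the server.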
