On 5/23/2013 1:31 PM, Alex Rosenbaum wrote:
On 5/21/2013 6:24 PM, Hefty, Sean wrote:
My first guess is that the server isn't responding to new requests. - Sean

This is where we're looking now.
We are now testing on 17 servers with 8 clients per server.

When we disable all RDMA traffic in the test, 100% of the RDMA connections are established. So at least we know this is not some fundamental issue with our setup.

After modifying our code to give RDMA connection handling a higher priority than the RDMA traffic (CQ completion handling), we still see many UNREACHABLE events, but only after quite a few clients have connected and started pushing traffic (1GB RDMA WRITEs from server to client).

We are now adding code (via the conn_attr private data) to compare timestamps between rdma_connect, RDMA_CM_EVENT_CONNECT_REQUEST, rdma_accept and, on the client, the UNREACHABLE or CONNECTED events.
We'll have a better understanding once we see these results.
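
For illustration, the timestamp comparison described above could be sketched roughly as follows. This is only a minimal sketch, not our actual code: the helper names (now_ns, conn_ts_pack, conn_ts_unpack, elapsed_ms) and the 8-byte big-endian layout are assumptions made for the example. In a real program the packed buffer would be passed through the private_data and private_data_len fields of struct rdma_conn_param given to rdma_connect() or rdma_accept(), and read back on the peer from event->param.conn.private_data. Comparing timestamps taken on different hosts also assumes the clocks are synchronized closely enough (for example with NTP).

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical wire layout for the private-data payload:
 * one 64-bit nanosecond timestamp, stored big-endian. */
#define CONN_TS_LEN 8

/* Current wall-clock time in nanoseconds since the epoch. */
uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_REALTIME, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Serialize a timestamp into buf in big-endian byte order.
 * Returns the number of bytes written (always CONN_TS_LEN). */
size_t conn_ts_pack(uint8_t *buf, size_t buf_len, uint64_t t_ns)
{
    assert(buf_len >= CONN_TS_LEN);
    for (int i = 0; i < CONN_TS_LEN; i++)
        buf[i] = (uint8_t)(t_ns >> (56 - 8 * i));
    return CONN_TS_LEN;
}

/* Read back a timestamp written by conn_ts_pack(). */
uint64_t conn_ts_unpack(const uint8_t *buf, size_t buf_len)
{
    uint64_t t = 0;

    assert(buf_len >= CONN_TS_LEN);
    for (int i = 0; i < CONN_TS_LEN; i++)
        t = (t << 8) | buf[i];
    return t;
}

/* Signed elapsed time in milliseconds between two nanosecond
 * timestamps; negative if end_ns is earlier than start_ns. */
int64_t elapsed_ms(uint64_t start_ns, uint64_t end_ns)
{
    return ((int64_t)end_ns - (int64_t)start_ns) / 1000000;
}
```

The client would pack now_ns() just before calling rdma_connect(); the server would unpack it when the connect request event arrives and log elapsed_ms() against its own clock, and could do the same again around rdma_accept(). A large gap between those points would indicate where the time is being lost.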

thanks,

Alex
We found the piece of code that caused the server to hang long enough for rdma_connect() to fail on the client side, after retries, with RDMA_CM_EVENT_UNREACHABLE (-ETIMEDOUT).
OK, case closed.
