On 5/23/2013 1:31 PM, Alex Rosenbaum wrote:
On 5/21/2013 6:24 PM, Hefty, Sean wrote:
My first guess is that the server isn't responding to new requests.
- Sean
This is where we're looking now.
We are now testing on 17 servers with 8 clients per server.
With all RDMA traffic disabled in the test, 100% of the RDMA
connections are established. So at least we know this is not some
fundamental issue with our setup.
After modifying our code to raise the priority of RDMA connection
handling above that of the RDMA traffic (CQ completion handling),
we still see many UNREACHABLE events, but only after quite a few
clients have connected and started pushing traffic (1GB RDMA WRITEs
from server to client).
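For readers following along, here is a rough sketch of what "connection
handling before CQ handling" could look like in a single-threaded loop.
It is not our actual code. It assumes the loop polls two file
descriptors (in a real application these would be the fd of the
rdma_event_channel and the fd of the ibv_comp_channel) and uses plain
one-byte tokens as stand-ins for events. The names service_once,
drain_tokens, loop_stats and cq_budget are made up for illustration.

```c
#include <limits.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical bookkeeping, for illustration only. */
struct loop_stats {
    int cm_events;    /* connection-management events handled */
    int completions;  /* completion notifications handled */
};

/*
 * Stand-in for real event processing: consume up to `budget` one-byte
 * tokens from fd without blocking. A real loop would instead call
 * rdma_get_cm_event() or ibv_get_cq_event()/ibv_poll_cq() here.
 */
static int drain_tokens(int fd, int budget)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    char token;
    int n = 0;

    while (n < budget && poll(&p, 1, 0) == 1 && (p.revents & POLLIN)) {
        if (read(fd, &token, 1) != 1)
            break;
        n++;
    }
    return n;
}

/*
 * One loop iteration. Connection events are drained first and without a
 * limit; completions are capped at cq_budget so that heavy traffic
 * cannot delay the next look at the connection channel for long.
 * Returns the poll() result (-1 on error).
 */
int service_once(int cm_fd, int cq_fd, int timeout_ms, int cq_budget,
                 struct loop_stats *st)
{
    struct pollfd fds[2] = {
        { .fd = cm_fd, .events = POLLIN },
        { .fd = cq_fd, .events = POLLIN },
    };
    int rc = poll(fds, 2, timeout_ms);

    if (rc < 0)
        return -1;
    if (fds[0].revents & POLLIN)
        st->cm_events += drain_tokens(cm_fd, INT_MAX);
    if (fds[1].revents & POLLIN)
        st->completions += drain_tokens(cq_fd, cq_budget);
    return rc;
}
```

The budget is the important part of the sketch: ordering alone does not
help if a single pass over the completions can run for an unbounded
time. Neither helps if some other code path blocks the thread.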
We are now adding code (via the conn_attr private data) to compare
timestamps between rdma_connect, RDMA_CM_EVENT_CONNECT_REQUEST,
rdma_accept and, on the client, the UNREACHABLE or CONNECTED events.
We'll have a better understanding once we see these results.
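A minimal sketch of how such timestamps might be carried in the private
data follows. The layout and all names in it (conn_ts, TS_PRIV_LEN,
conn_ts_pack, conn_ts_unpack, conn_ts_outside_server_ns) are
assumptions for illustration, not the format we actually use. Since
client and server clocks are not synchronized, the sketch assumes the
server echoes the client's timestamp back on accept and adds only a
duration measured on its own clock (accept time minus the time the
connect request was seen).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire layout: two 64-bit big-endian fields. */
#define TS_PRIV_LEN 16

struct conn_ts {
    uint64_t client_connect_ns; /* client clock, just before rdma_connect() */
    uint64_t server_hold_ns;    /* server clock: accept minus connect request */
};

static void put_be64(uint8_t *dst, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        dst[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

static uint64_t get_be64(const uint8_t *src)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | src[i];
    return v;
}

/* Serialize into a buffer suitable for use as private data. */
void conn_ts_pack(const struct conn_ts *ts, uint8_t buf[TS_PRIV_LEN])
{
    put_be64(buf, ts->client_connect_ns);
    put_be64(buf + 8, ts->server_hold_ns);
}

/*
 * Parse received private data. The received length may be larger than
 * what the peer sent (it can be padded), so only a minimum is checked.
 * Returns 0 on success, -1 if the buffer is too short.
 */
int conn_ts_unpack(const uint8_t *buf, size_t len, struct conn_ts *ts)
{
    if (len < TS_PRIV_LEN)
        return -1;
    ts->client_connect_ns = get_be64(buf);
    ts->server_hold_ns = get_be64(buf + 8);
    return 0;
}

/*
 * Client side, on a successful connection: the part of the total
 * connect latency that was NOT spent between connect request and
 * accept on the server (network plus CM processing).
 */
uint64_t conn_ts_outside_server_ns(const struct conn_ts *ts,
                                   uint64_t client_now_ns)
{
    uint64_t total = client_now_ns - ts->client_connect_ns;

    return total > ts->server_hold_ns ? total - ts->server_hold_ns : 0;
}
```

On an UNREACHABLE event there is no reply from the server to unpack, so
in that case the client can only log its own elapsed time since
rdma_connect and the server-side numbers have to come from the server's
log.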
thanks,
Alex
We found the piece of code that caused the server to hang long
enough for rdma_connect() to fail on the client side, after
retries, with RDMA_CM_EVENT_UNREACHABLE (-ETIMEDOUT).
OK, case closed.