On 20/06/2010 07:51, Ding Dinghua wrote:
Hello,
2010/6/19 Dotan Barak:
I call rdma_create_id to create an IB id, then resolve the remote address
and the route, then set up the QP and call rdma_connect to establish the
connection. Until an ACK or an error reply arrives, the thread waits on a
wait queue. The listening IB id on the remote node catches the connect
request, sets up its QP, allocates and maps pages to construct the
RDMA-WRITE space, and calls rdma_accept to reply to the request.
Some other information that may be useful:
1. All the "RETRY EXCEEDED" problems happened when there were two
connections using RDMA-WRITE to transfer data, and the second
connection had a high probability of hitting this problem.
2. All the "RETRY EXCEEDED" problems happened when the RDMA-WRITE
space was 256MB per connection (that is, 512MB of memory for two
connections). With a 64MB RDMA-WRITE space, the problem never
happened in our tests. The remote node has 2GB of memory in total.
Thanks a lot.
Some more questions:
* Is the WR that "produces" the RETRY EXCEEDED error the first one, the
last one, or one in the middle?
It's the first one.
* Which values are you using in the QP context for the retry count and
the retry timeout?
* Did you try to increase those values?
I haven't set these values (actually, I don't know where to set them);
I only set the max_send_wr and max_send_sge fields of struct ib_qp_cap
when creating the QP.
Can you query the QP after the connection between the QPs is
established, and check those values?
* How many more QPs do you have between those nodes, and which
operations do they use (only RDMA-WRITEs?)
4096 QPs for each connection; they only do RDMA-WRITEs.
So you send a total of 4K (QPs) * 64M (bytes) = 256 GB in parallel
(am I missing something, or is this the amount of data that will be
sent between the two nodes?)
Dotan