Sorry for the late reply.

2010/6/12 Dotan Barak <[email protected]>:
> On 12/06/2010 03:22, Ding Dinghua wrote:
>>
>> 2010/6/11 Dotan Barak<[email protected]>:
>>
>>>
>>> Hi.
>>>
>>> On 11/06/2010 10:51, Ding Dinghua wrote:
>>>
>>>>
>>>> Hi all:
>>>>          I'm using RDMA to mirror fs-metadata between nodes. I
>>>> encountered a strange problem while the program was running:
>>>> the completion queue handler reported that an RDMA-Write operation failed,
>>>> and the status of the corresponding "struct ib_wc" was "IB_WC_RETRY_EXC_ERR".
>>>> The problem occurs randomly. I don't know what this error code means
>>>> or what to do next. Could anyone give me some tips?
>>>> Thanks a lot.
>>>>
>>>>
>>>
>>> Do you sync between the sides before closing the QPs?
>>>
>>
>> Can you explain that in more detail? Thanks.
>>
>
> If you try to send a message from local QP to a remote QP before the remote
> QP is in RTR state (or after it was closed/transferred to the ERROR state),
> you may get RETRY EXCEEDED, because there isn't any QP in the remote side
> that can accept your message (and send a response).
>
> How do you connect the QPs? (And how do you close the connection between
> them)
>
I call rdma_create_id to create an ib id, then resolve the remote
address and the route, then set up the qp and call rdma_connect to
establish the connection. Until the ack or an error reply arrives,
the thread waits on a wait queue. The listening ib id on the remote
node catches the connect request, sets up its qp, allocates and maps
pages to construct the RDMA-WRITE space, and calls rdma_accept to
reply to the request.
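
To make the ordering above easier to follow, here is a rough sketch of
the active side as a small state machine. It is only a model: the enum
and the function names (conn_step, conn_advance, conn_should_wake,
conn_may_post_write) are hypothetical and are not the kernel rdma_cm
API or its event constants. It just shows that the sleeping thread is
woken on either outcome and that RDMA-WRITEs are posted only after the
connection is established.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the active side's connection steps.
 * These names are illustrative, not rdma_cm event constants. */
enum conn_step {
    STEP_ID_CREATED,     /* after rdma_create_id */
    STEP_ADDR_RESOLVED,  /* remote address resolved */
    STEP_ROUTE_RESOLVED, /* route resolved */
    STEP_CONNECT_SENT,   /* qp set up, rdma_connect called, thread waiting */
    STEP_ESTABLISHED,    /* remote side replied with rdma_accept */
    STEP_FAILED          /* error reply at any step */
};

/* Advance by one step; a failure at any point moves to STEP_FAILED,
 * and the two terminal states stay where they are. */
static enum conn_step conn_advance(enum conn_step s, bool ok)
{
    if (!ok || s == STEP_FAILED)
        return STEP_FAILED;
    if (s == STEP_ESTABLISHED)
        return STEP_ESTABLISHED;
    return (enum conn_step)(s + 1);
}

/* The thread on the wait queue is woken once the outcome is known. */
static bool conn_should_wake(enum conn_step s)
{
    return s == STEP_ESTABLISHED || s == STEP_FAILED;
}

/* RDMA-WRITEs are posted only on an established connection. */
static bool conn_may_post_write(enum conn_step s)
{
    return s == STEP_ESTABLISHED;
}
```

Starting from STEP_ID_CREATED, four successful calls to conn_advance
reach STEP_ESTABLISHED; a failed call at any step goes to STEP_FAILED.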

Some other information which may be useful:
1. All the "RETRY EXCEEDED" problems happened when there were two
connections using RDMA-WRITE to transfer data, and the later
connection was much more likely to hit the problem.
2. All the "RETRY EXCEEDED" problems happened when the RDMA-WRITE
space was 256MB per connection (that is, 512MB of memory for the two
connections). When the RDMA-WRITE space was 64MB, the problem never
happened in our tests. The remote node's total memory is 2GB.
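
To put the sizes in point 2 in perspective, a small sketch of the
arithmetic follows. The helper names are hypothetical, "MB" is taken to
mean 2^20 bytes, and the 4096-byte page size is an assumption about the
remote node, not something I have checked. This only shows how much
memory is involved; it does not by itself explain the error.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Total bytes used as RDMA-WRITE space across all connections. */
static uint64_t rdma_space_total(uint64_t per_conn_bytes, unsigned conns)
{
    return per_conn_bytes * conns;
}

/* Share of the node's memory in hundredths of a percent
 * (2500 means 25.00%). */
static unsigned rdma_space_share(uint64_t total, uint64_t node_mem)
{
    return (unsigned)(total * 10000ULL / node_mem);
}

/* Pages backing one RDMA-WRITE space, rounded up.
 * page_size is an assumption (4096 on many systems). */
static uint64_t rdma_space_pages(uint64_t per_conn_bytes, uint64_t page_size)
{
    return (per_conn_bytes + page_size - 1) / page_size;
}
```

With two 256MB spaces on a 2GB node this gives 512MB, or 25% of memory,
and 65536 pages per connection under the 4096-byte assumption. With two
64MB spaces it gives 128MB, or 6.25%, and 16384 pages per connection.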

Thanks a lot.


> Dotan
>



-- 
Ding Dinghua
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
