Thank you, Sean,

I have access to the OpenSM configuration on the switch in my test
environment. I tried reducing the timeout there (subnet_timeout,
packet_life_time), but without success.
Btw, I found a detail that I cannot explain. I query the QP after
connecting and get a timeout value of 16, which should mean
4.096 us * 2^16 = ~268 ms, but I observe about 800 ms.

Thanks,
Vlad

On Mon, Oct 31, 2011 at 8:51 AM, Hefty, Sean <[email protected]> wrote:
>> I am trying to use ibv_poll_cq to identify connectivity problems. The
>> scenario is as follows, based on a modified rping example:
>>
>> 1) preliminary steps are done and an rdma connection is established
>> between Client and Server; retry_count in rdma_conn_param is set to 1;
>> 2) Server loses its link (corresponding switch port disabled), Client
>> is still connected to the switch;
>> 3) Client calls ibv_post_send;
>> 4) Client polls the cq with ibv_poll_cq and gets the expected
>> IBV_WC_RETRY_EXC_ERR after about 1 second.
>>
>> Can this timeout be decreased? If that is impossible, can you suggest
>> something else?
>
> Not easily, I believe.  The timeout is based on the path record returned by
> the SM, which is really what an app should use.  If you can adjust the
> timeout at the SM, that would be best.
>
> If you can use a newer kernel, another alternative is to use rdma_set_option 
> to provide your own path record as input in place of calling 
> rdma_resolve_route.
>
> Btw, with a small timeout and few retries, if you're not using QoS, you may 
> want to enable that to prevent false timeouts.
>
> - Sean
>
