Thank you Sean,

I have access to the OpenSM configuration on the switch in my test environment. I tried to reduce the timeout there (subnet_timeout, packet_life_time), but without success. Btw, I found a detail that I cannot explain: I query the QP after connect and get a timeout value of 16, which should be 4.096 us * 2^16 = about 268 ms, but I measure about 800 ms.
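
For reference, here is a minimal sketch of the arithmetic behind that expectation, in plain C with no verbs calls. The function names are made up for illustration and are not part of libibverbs. The 4.096 us base and the "0 means no timeout" rule are the usual meaning of the QP timeout attribute. The second function assumes that each of the retry_cnt + 1 attempts waits exactly one nominal timeout, which is only an assumption; real HCAs may round the interval up or wait longer.

```c
#include <assert.h>
#include <stdint.h>

/* Nominal local ACK timeout in microseconds for the 5-bit QP "timeout"
 * attribute: 4.096 us * 2^timeout. Returns -1.0 when timeout == 0,
 * which means the timer is disabled (wait forever). */
double ack_timeout_us(uint8_t timeout)
{
    assert(timeout < 32);              /* the attribute is a 5-bit field */
    if (timeout == 0)
        return -1.0;
    return 4.096 * (double)(1ULL << timeout);
}

/* ASSUMPTION (not from the spec text): one initial attempt plus retry_cnt
 * retries, each waiting exactly one nominal timeout before giving up. */
double nominal_detect_us(uint8_t timeout, uint8_t retry_cnt)
{
    double t = ack_timeout_us(timeout);
    if (t < 0.0)
        return -1.0;                   /* timer disabled, never expires */
    return t * (double)(retry_cnt + 1);
}
```

With timeout 16 this gives roughly 268 ms per attempt, and with retry_count 1 roughly 537 ms in total under the assumption above, so the simple model by itself does not account for the 800 ms.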

Thanks,
Vlad

On Mon, Oct 31, 2011 at 8:51 AM, Hefty, Sean <[email protected]> wrote:
>> I try to use ibv_poll_cq to identify connectivity problems. The
>> scenario is following, based on modified rping example:
>>
>> 1) preliminary steps done and rdma connection established between
>> Client and Server, retry_count in rdma_conn_param is set 1;
>> 2) Server lost its link (corresponding switch port disabled), Client
>> is still connected to the switch;
>> 3) Client calls ibv_post_send
>> 4) Client polls cq with ibv_poll_cq and gets expected
>> IBV_WC_RETRY_EXC_ERR after about 1 second.
>>
>> Can this timeout be decreased? If it is impossible, can you suggest
>> something else?
>
> I don't believe easily. The timeout is based on the path record returned by
> the SM, which is really what an app should use. If you can adjust the
> timeout at the SM, that would be best.
>
> If you can use a newer kernel, another alternative is to use rdma_set_option
> to provide your own path record as input in place of calling
> rdma_resolve_route.
>
> Btw, with a small timeout and few retries, if you're not using QoS, you may
> want to enable that to prevent false timeouts.
>
> - Sean
