> RDMA_CM_EVENT_UNREACHABLE is indicated when there are timeouts in
> underlying CM protocol exchange. I suspect that the server is really
> busy and doesn't respond to the low level CM MADs in a timely manner.
> RDMA CM (and other kernel ULPs like IPoIB and SRP) use hard-coded local
> and remote response timeouts of 20, which is ~4.3 sec. This was discussed
> back in 2006 in
> http://comments.gmane.org/gmane.linux.drivers.openib/27664. In this
> scenario, the response took more than 30 seconds.  More recently, there
> was a proposal to base the RDMA CM response timeout on the subnet timeout
> (http://permalink.gmane.org/gmane.linux.drivers.rdma/19969).
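
For anyone wondering how a timeout value of 20 becomes ~4.3 sec: IB encodes
these timeouts as an exponent n, and the actual time is 4.096 usec * 2^n.
A minimal sketch of that arithmetic follows. The helper name
ib_timeout_exp_to_ns is made up for illustration and is not a kernel or
librdmacm function.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of any real API): convert an IB-encoded
 * timeout exponent into nanoseconds.
 *
 * IB timeout fields hold an exponent n; the time is 4.096 us * 2^n.
 * Since 4.096 us = 4096 ns = 2^12 ns, the result is 2^(12 + n) ns.
 * The field is 5 bits wide (0..31), so the shift cannot overflow 64 bits.
 */
static uint64_t ib_timeout_exp_to_ns(unsigned int exp)
{
    return (uint64_t)4096 << exp;
}
```

With exp = 20 this gives 4294967296 ns, i.e. about 4.29 sec, which matches
the ~4.3 sec figure quoted above.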

Hal's assessment seems likely.  Error code -110 is ETIMEDOUT.  However, the IB 
CM timeout when used through the RDMA CM should be much larger, as it makes use 
of the CM MRA protocol.  Unless a lot of MADs are being lost, or I'm not 
remembering the RDMA CM code correctly, there's still an issue here that I'm 
not understanding.

- Sean
