I have seen sporadic errors while running the HCAs in connected mode.
These errors appear to be related to the differing speeds of the HCAs.
Increasing the retry counts solves the problem.

I looked at the RFC with regard to its warnings about retries. The
warnings are there to make sure that the IB timeouts do not interfere
with TCP timeouts. The TCP timeouts are so much larger than the IB
timeouts (even with non-zero retry counts) that we are nowhere close to
interfering with them.

Signed-off-by: Pradeep Satyanarayana <[EMAIL PROTECTED]>
---
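To put a rough number on the timeout argument above, here is a minimal
sketch of the nominal worst-case arithmetic. It assumes the standard IB
timeout encoding of 4.096 us * 2^value. The helper names and the ACK
timeout exponent of 17 are illustrative assumptions, not values taken
from this patch or from the driver.

```c
#include <stdint.h>

/*
 * IB encodes timeouts as 4.096 us * 2^exp.
 * Returns the nominal timeout in nanoseconds (4.096 us == 4096 ns).
 * Hypothetical helper, for illustration only.
 */
static uint64_t ib_timeout_ns(unsigned int exp)
{
	return 4096ULL << exp;
}

/*
 * Nominal worst case before the transport gives up: the initial
 * attempt plus retry_count retries, each waiting one full timeout.
 * Hypothetical helper, for illustration only.
 */
static uint64_t ib_worst_case_ns(unsigned int exp, unsigned int retry_count)
{
	return (uint64_t)(retry_count + 1) * ib_timeout_ns(exp);
}
```

With an assumed exponent of 17, ib_timeout_ns(17) is 536870912 ns
(about 0.54 s), and ib_worst_case_ns(17, 3) is 2147483648 ns (about
2.15 s). That is a nominal figure only, but it is small next to TCP
retransmission limits, which typically run to minutes before a
connection is abandoned.
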
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-12-21 16:06:49.000000000 -0500
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-12-21 16:07:28.000000000 -0500
@@ -990,8 +990,8 @@ static int ipoib_cm_send_req(struct net_
 	req.responder_resources = 4;
 	req.remote_cm_response_timeout = 20;
 	req.local_cm_response_timeout = 20;
-	req.retry_count = 0; /* RFC draft warns against retries */
-	req.rnr_retry_count = 0; /* RFC draft warns against retries */
+	req.retry_count = 3;
+	req.rnr_retry_count = 3;
 	req.max_cm_retries = 15;
 	req.srq = ipoib_cm_has_srq(dev);
 	return ib_send_cm_req(id, &req);