> Hal asked the same thing - and I'm confused because I thought
> that if receiving an RMPP response times out, the entire transaction is
> aborted.

RMPP still uses retries.  If the user specifies a timeout of 1 second, with 3 
retries, _each_ RMPP window will be retried up to 3 times, waiting for an ACK.  
Once an ACK is received, the next window can be retried up to 3 times, with a 1 
second timeout per ACK, etc.  It looks like your patch increments the timeout, 
and the increment is maintained across windows.
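
To make the per-window behavior concrete, here is a rough sketch (a toy model, 
not the ib_mad code).  The names send_windows, ack_arrives, and 
timeout_increment_s are made up for illustration, and I am assuming "3 retries" 
means one initial send plus up to 3 resends.  The increment parameter models my 
reading of the patch, where the grown timeout carries over to later windows.

```python
def send_windows(num_windows, timeout_s, retries, ack_arrives,
                 timeout_increment_s=0.0):
    """Toy model of per-window RMPP retries.

    ack_arrives(window, attempt) -> bool is a hypothetical callback that
    says whether the ACK for this send shows up before the timeout.
    Returns (completed, total_wait_s).
    """
    total_wait = 0.0
    timeout = timeout_s  # assumption: never reset between windows
    for window in range(num_windows):
        # The retry budget starts over for every window.
        for attempt in range(retries + 1):  # initial send + retries
            if ack_arrives(window, attempt):
                break  # ACK received, move on to the next window
            total_wait += timeout
            timeout += timeout_increment_s  # increment is kept across windows
        else:
            # Retries exhausted for this window: the transfer fails.
            return False, total_wait
    return True, total_wait
```

With a fixed 1 second timeout and every window ACKed on its first resend, 3 
windows cost 3 seconds of waiting.  With a 1 second increment that is never 
reset, the same run costs 1 + 2 + 3 = 6 seconds.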

> I'm open to this, but do we really need TCP/IP level congestion
> control? How many nodes are likely to have more than a few SA
> queries outstanding at a time?

With large MPI job startup, we could have hundreds or thousands of SA queries 
issued from a single node.  Even if the number of requests per node is small, 
the intent is to have all nodes back off from flooding the SA.  So, I would 
say, yes, we want something like TCP congestion control.  A delay in a response 
seems more likely to be a result of the SA being flooded with requests than of 
an actual packet being dropped.

This would also allow a node to delay sending any SA query after receiving a 
busy response to one.  Caching data can help here, but we still have to get the 
data from the SA first, and still be able to handle errors, topology changes, 
QoS, etc.
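
As a rough sketch of the kind of back off I have in mind (plain exponential 
backoff, a much simpler stand-in for real TCP congestion control), something 
like the following.  SaBackoff and its methods are hypothetical names, not an 
existing API, and the delay values are arbitrary.

```python
class SaBackoff:
    """Toy per-node gate that holds off SA queries after busy/timeout."""

    def __init__(self, base_delay_s=1.0, max_delay_s=8.0):
        self.base_delay_s = base_delay_s
        self.max_delay_s = max_delay_s
        self.delay_s = 0.0      # current back off, 0 means no back off
        self.next_send_s = 0.0  # earliest time any query may be sent

    def may_send(self, now_s):
        # All queries from this node share one gate.
        return now_s >= self.next_send_s

    def on_busy_or_timeout(self, now_s):
        # Double the delay (starting at base), capped at max.
        doubled = max(self.base_delay_s, self.delay_s * 2)
        self.delay_s = min(self.max_delay_s, doubled)
        self.next_send_s = now_s + self.delay_s

    def on_response(self):
        # Assumption: a normal response clears the back off entirely.
        self.delay_s = 0.0
```

The point is that one busy response or timeout delays every outstanding query 
from the node, not just the one that failed.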

- Sean