> Hal asked the same thing - and I'm confused because I thought
> that if receiving an RMPP response times out, the entire transaction is
> aborted.

RMPP still uses retries. If the user specifies a timeout of 1 second with 3 retries, _each_ RMPP window will be retried up to 3 times while waiting for an ACK. Once an ACK is received, the next window can again be retried up to 3 times, with a 1 second timeout per ACK, and so on. It looks like your patch increments the timeout, and the increment is maintained across windows.

> I'm open to this, but do we really need TCP/IP level congestion
> control? How many nodes are likely to have more than a few SA
> queries outstanding at a time?

With large MPI job startup, we could have hundreds or thousands of SA queries issued from a single node. Even if the number of requests per node is small, the intent is to have all nodes back off from flooding the SA. So, I would say yes, we want something like TCP congestion control. A delay in a response seems more likely to be the result of the SA being flooded with requests than of an actual packet being dropped. This would also allow a node to delay sending any SA query after receiving a busy response to one.

Caching data can help here, but we still have to get the data from the SA first, and still be able to handle errors, topology changes, QoS, etc.

- Sean
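
P.S. To make the retry arithmetic and the backoff idea above concrete, here is a rough sketch in Python. It is only an illustration, not the MAD layer code or the patch under discussion. The names `worst_case_wait_ms` and `SaBackoff` are hypothetical, the 32 second cap is an arbitrary choice, and I am assuming that "3 retries" means one initial send plus up to 3 resends per window. The backoff shown is plain exponential timeout doubling shared across all SA queries on a node, which is simpler than real TCP congestion control.

```python
def worst_case_wait_ms(windows, timeout_ms, retries):
    """Upper bound on time spent waiting for ACKs when every RMPP window
    uses its whole retry budget (assumed: 1 initial send + `retries` resends)."""
    return windows * (retries + 1) * timeout_ms


class SaBackoff:
    """Hypothetical per-node timeout state shared by all outstanding SA queries."""

    def __init__(self, base_ms=1000, max_ms=32000):
        self.base_ms = base_ms      # timeout requested by the user
        self.max_ms = max_ms        # arbitrary cap, an assumption for this sketch
        self.timeout_ms = base_ms   # current timeout applied to new sends

    def on_timeout_or_busy(self):
        # A timeout or a busy response is treated as a sign that the SA is
        # overloaded, so every query on this node waits longer next time.
        self.timeout_ms = min(self.timeout_ms * 2, self.max_ms)

    def on_response(self):
        # A good response lets the timeout decay back toward the base value.
        self.timeout_ms = max(self.timeout_ms // 2, self.base_ms)
```

With a 1 second timeout and 3 retries, a 4-window transfer could wait up to `worst_case_wait_ms(4, 1000, 3)` milliseconds under that assumption, since the retry budget is renewed for each window. Keeping the state in one shared object is what makes a busy response to one query delay the others as well.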
