Sean said:
> When you consider RMPP, the timeout/retry values specified by
> the user are not straightforward in their meaning. I haven't
> looked at this patch in detail yet, but how do the timeout
> changes work with RMPP MADs? Is the timeout reset to the
> minimum after an ACK is received?

Hal asked the same thing, and I'm confused, because I thought that if receiving an RMPP response times out, the entire transaction is aborted.

First, the existing code (before I patched it) doesn't distinguish between RMPP and regular MADs when dealing with timeouts. Second, the spec says (on p. 788):

| If the Receiver does not receive all the packets in this transaction within
| its transaction timer, it ABORTs the transaction and terminates.

As far as I can tell, that's what the current ib_mad module implements: if the entire transaction doesn't complete within the receiver-specified timeout, the entire thing is retried.

> My personal preference at this time is to push more intelligence
> into the timeout/retry algorithm used by the MAD layer, but
> restricted to SA clients. I'd like to see even more randomization
> in the retry time, coupled with a TCP-like congestion windowing
> implementation when issuing SA queries.
>
> For example: Never allow more than, say, 8 SA queries outstanding
> at a time. If an SA query times out, reduce the number of
> outstanding queries to 1 until we get a response, then double the
> number of queries allowed to be outstanding until we reach the max.
> Have the mad layer calculate the SA query timeout based on the
> actual SA response time, with randomization based on that. The
> user specified timeout value can basically be ignored.
>
> The only reason I'm suggesting we restrict the algorithm to SA
> queries is to avoid storing per endpoint information. That may
> be better handled by the CM (since CM responses are sends).
>
> Given all this, then I think it would be okay to accept the
> patch to drop busy responses from the SA until this framework
> is in place, which wouldn't be until 2.6.38 or 39.

I'm open to this, but do we really need TCP/IP-level congestion control? How many nodes are likely to have more than a few SA queries outstanding at a time?
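For concreteness, here is a minimal sketch in C of the SA query throttling Sean describes: a cap of 8 outstanding queries, a drop to 1 on timeout, doubling on each response back up to the cap, and a timeout derived from measured SA response time with random jitter. Everything here is hypothetical: the `sa_window_*` names, `SA_MIN_TIMEOUT_MS`, and the timeout formula are illustrative assumptions, not part of ib_mad. The smoothing step borrows TCP's 1/8 weight for the smoothed round-trip time; the `2 * srtt` base and the `[0, srtt]` jitter range are my own assumptions, since the proposal gives no formula.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; not part of the ib_mad API. */
#define SA_MAX_OUTSTANDING 8   /* "say, 8" from the proposal */
#define SA_MIN_TIMEOUT_MS  100 /* assumed floor, not taken from the spec */

struct sa_window {
	unsigned int limit;       /* SA queries currently allowed in flight */
	unsigned int outstanding; /* SA queries currently in flight */
	unsigned int srtt_ms;     /* smoothed SA response time; 0 = no sample yet */
};

void sa_window_init(struct sa_window *w)
{
	w->limit = SA_MAX_OUTSTANDING;
	w->outstanding = 0;
	w->srtt_ms = 0;
}

/* May another SA query be issued right now? */
bool sa_window_can_send(const struct sa_window *w)
{
	return w->outstanding < w->limit;
}

void sa_window_on_send(struct sa_window *w)
{
	w->outstanding++;
}

/* A response arrived after rtt_ms: update the estimate and reopen the window. */
void sa_window_on_response(struct sa_window *w, unsigned int rtt_ms)
{
	assert(w->outstanding > 0);
	w->outstanding--;

	/* Exponential moving average, srtt = 7/8 * srtt + 1/8 * rtt (as in TCP). */
	if (w->srtt_ms == 0)
		w->srtt_ms = rtt_ms;
	else
		w->srtt_ms = (7 * w->srtt_ms + rtt_ms) / 8;

	/* Double the number of queries allowed, up to the maximum. */
	w->limit *= 2;
	if (w->limit > SA_MAX_OUTSTANDING)
		w->limit = SA_MAX_OUTSTANDING;
}

/* A query timed out: allow only one outstanding query until a response arrives. */
void sa_window_on_timeout(struct sa_window *w)
{
	assert(w->outstanding > 0);
	w->outstanding--;
	w->limit = 1;
}

/*
 * Timeout for the next query, ignoring the user-specified value:
 * max(2 * srtt, SA_MIN_TIMEOUT_MS) plus jitter in [0, srtt].
 * rand_val comes from the caller (e.g. rand() in user space).
 */
unsigned int sa_window_timeout_ms(const struct sa_window *w, unsigned int rand_val)
{
	unsigned int base = 2 * w->srtt_ms;

	if (base < SA_MIN_TIMEOUT_MS)
		base = SA_MIN_TIMEOUT_MS;
	return base + rand_val % (w->srtt_ms + 1);
}
```

A caller would check `sa_window_can_send()` before posting a query, call `sa_window_on_send()` when it posts, and call `sa_window_on_response()` or `sa_window_on_timeout()` when the query completes. Seen this way, the window only changes behavior for a node that has more than `limit` SA queries queued at once. A node issuing a few queries never reaches the cap, and the only difference it would see is the measured, randomized timeout.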
