"Michael S. Tsirkin" <[EMAIL PROTECTED]> wrote on 04/23/2007 01:50:32 PM:
> > > This would only help if there are short bursts of high-speed activity
> > > on the receiving HCA: if the speed is different in the long run,
> > > the right thing to do is to drop some packets and have TCP adjust
> > > its window accordingly.
> > >
> > > But in that former case (short bursts), just increasing the number
> > > of pre-posted buffers on RQ should be enough, and looks like a much
> > > cleaner solution.
> >
> > This was not an issue with running out of buffers (which was my original
> > suspicion too). This was probably due to missing ACKs; I am guessing
> > this happens because the two HCAs have very different processing speeds.
>
> I don't see how different processing speeds could trigger missing ACKs.
> Do you?

Note: in the netperf tests, errors were seen only when one side was ehca and the other side was mthca. When both sides were ehca, or both were mthca, no errors were seen.

In the netperf tests, I observed that ehca encountered many send completion errors, whether it was the sender or the receiver (in the latter case, presumably while sending ACKs). By contrast, mthca reported no errors, even when I set /sys/module/ib_mthca/parameters/debug_level to 1 (that is the way to turn on debugging in mthca, right?).

With the Local CA Ack Delay set to 0 on ehca, I believe mthca is probably taking more than 16 us to deliver the ACK back to ehca. It might not be exactly 16 us; I simply assumed 4 times the 4 us implied by a Local CA Ack Delay of 0, since the spec allows up to 4 times the nominal value. That late ACK triggers the send completion error on ehca. On the other hand, when two ehca adapters use RC, no errors are encountered, implying that the ACK is consistently delivered within 16 us. Since mthca sets the Local CA Ack Delay to 15, the timeouts between two mthca adapters are much larger (> 128 ms), and hence no problems are encountered. That is why I said that different processing speeds may be triggering the missing ACKs.
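The timeout arithmetic above can be sketched as follows. This is an illustrative sketch, not driver code: it assumes, as this message does, that the timeout follows the InfiniBand encoding of 4.096 us * 2^exponent, that the exponent is taken directly from the Local CA Ack Delay, and that the timer may run up to 4 times the nominal value. The function names are hypothetical.

```python
# Sketch of the Local ACK Timeout arithmetic used in this thread.
# Assumption: the timeout exponent equals the advertised Local CA Ack Delay.

BASE_US = 4.096       # IB timer granularity, in microseconds
SPEC_MULTIPLIER = 4   # the timer may run up to 4x the nominal value


def ack_timeout_us(exponent: int) -> float:
    """Nominal timeout in microseconds: 4.096 us * 2**exponent (5-bit field)."""
    if not 0 <= exponent <= 31:
        raise ValueError("exponent must fit in 5 bits")
    return BASE_US * (1 << exponent)


def worst_case_timeout_us(exponent: int) -> float:
    """Upper bound on when the timer fires, per the 4x allowance."""
    return SPEC_MULTIPLIER * ack_timeout_us(exponent)


if __name__ == "__main__":
    for name, exp in (("ehca", 0), ("mthca/Topspin", 15)):
        print(f"{name}: nominal {ack_timeout_us(exp):.3f} us, "
              f"up to {worst_case_timeout_us(exp):.3f} us")
```

With exponent 0 (ehca) this gives 4.096 us nominal and about 16.4 us at the 4x bound; with exponent 15 (mthca/Topspin) it gives about 134 ms nominal and about 537 ms at the bound.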
> > This is exacerbated by the fact that the retry count (not the RNR retry
> > count) was 0. When I changed the retry count to a small value like 3, it
> > still works. Please see below for additional details.
>
> Looks like a work-around for some breakage elsewhere.
> Maybe it's a good thing we don't retry in such cases: retries are not good
> for network performance, and this way we move the problem to its
> root cause, where it can be debugged and fixed instead of overloading
> the network.

There is no single value all HCAs can pick that provides optimal performance in all situations. The only way is for each HCA to select a value that is optimal for it, and to depend on a retry mechanism when the selected value does not meet the needs of interoperability. Depending on higher levels like TCP, or even the application, to do the retries will kill performance.

> > > > Can someone point me to where this comment is in the RFC? I would like
> > > > to understand the reasoning.
> > >
> > > See "7.1 A Cautionary Note on IPoIB-RC".
> > > See also classics such as http://sites.inka.de/~W1011/devel/tcp-tcp.html
> >
> > If we do this right, the above-mentioned problems should not occur. In the
> > case we are dealing with, the RC timers are expected to be much smaller
> > than the TCP timers and should not interfere with them. The IBM HCA uses
> > a default value of 0 for the Local CA Ack Delay, which is probably too
> > small a value, and with a retry count of 0, ACKs are missed. I agree with
> > Roland's assessment (this was in a separate thread) that this should not
> > be 0.
>
> So, it's an ehca bug then?
> I didn't really get the explanation. Who loses the ACKs? ehca?
> Is it the case that ehca *reports* a Local CA Ack Delay that is
> *below* what it actually provides? If so, it should be easy to fix in the driver.

Yes, there is a problem with the IBM HCA, and we will address it. I said as much when I concurred with Roland's assessment.
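To put the retry-count trade-off into numbers, the sketch below estimates the worst-case time an RC QP could spend retransmitting before reporting a send completion error, and compares it with Linux's minimum TCP retransmission timeout of 200 ms (HZ/5). It is a simplified model with hypothetical helper names: it assumes every one of the retry_cnt + 1 transmissions waits out a full worst-case timeout of 4 * 4.096 us * 2^exponent, and it ignores RNR retries and wire time.

```python
# Simplified model: compare worst-case RC recovery time with TCP's minimum RTO.
# Assumption: each of the (retry_cnt + 1) transmissions waits a full
# worst-case timeout of 4 * 4.096 us * 2**exponent.

TCP_RTO_MIN_MS = 200.0  # HZ/5 jiffies, i.e. 200 ms


def worst_case_rc_ms(exponent: int, retry_cnt: int) -> float:
    """Worst-case milliseconds before the requester gives up on a packet."""
    if not 0 <= exponent <= 31:
        raise ValueError("exponent must fit in 5 bits")
    if not 0 <= retry_cnt <= 7:
        raise ValueError("retry_cnt is a 3-bit field")
    per_attempt_us = 4 * 4.096 * (1 << exponent)
    return (retry_cnt + 1) * per_attempt_us / 1000.0


def margin_vs_tcp(exponent: int, retry_cnt: int) -> float:
    """How many times the worst-case RC recovery fits inside TCP's minimum RTO."""
    return TCP_RTO_MIN_MS / worst_case_rc_ms(exponent, retry_cnt)


if __name__ == "__main__":
    for exp, retries in ((3, 2), (4, 3), (15, 0), (15, 2)):
        total = worst_case_rc_ms(exp, retries)
        print(f"exp={exp:2d} retries={retries}: {total:10.3f} ms "
              f"(margin vs TCP: {margin_vs_tcp(exp, retries):8.2f}x)")
```

Under these assumptions, exponent 4 with 3 retries bounds recovery at roughly 1.05 ms, about 190 times below 200 ms, while exponent 15 already exceeds 200 ms with a retry count of 0 (about 537 ms).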
> > On the other hand, with the Topspin adapter (and mthca) that I have, the
> > Local CA Ack Delay is 0xf, which would imply a Local ACK Timeout of
> > 4.096 us * 2^15, which is about 134 ms. The IB spec says it can be up to
> > 4 times this value, which means up to about 537 ms.
> >
> > The smallest TCP retransmission timer is HZ/5, i.e. 200 ms.
> > Yes, even with a retry count of 1 or 2, there is then a risk of
> > interfering with TCP timers.
> >
> > If my understanding is correct, the way it should be done is to have a
> > small value for the Local CA Ack Delay, say 3 or 4, which would imply a
> > timeout value of 32-64 us, with a small retry count of 2 or 3. This way
> > the maximum timeout would still be only several hundred us, a factor of
> > about 1000 less than the minimum TCP timeout. IB adapters are supposed to
> > have much lower latency than Ethernet adapters, so I am guessing that
> > this would be in the ballpark for most HCAs.
> >
> > Unfortunately, I do not know how much effort it will take to change the
> > Local CA Ack Delay across the various HCAs (if need be).
>
> How about fixing ehca not to trigger ACK loss instead?

As previously stated, we will address these issues in the IBM HCA. However, my understanding is that the mthca/Topspin adapters also have a problem (too high a value for the Local CA Ack Delay). Both HCAs need to be fixed for good interoperability.

> > In the interim, the only parameter we can control is the retry count,
> > and we could make this a module parameter.
>
> Since both 0 and > 0 values might lead to problems, this does not
> look like a real solution.

Please see my previous reasoning as to why we need a retry mechanism.

> > > By the way, as long as you are not using SRQ, why not use UC mode QPs?
> > > This would look like a cleaner solution.
>
> You haven't addressed this, and this might be a better way out.
> Unreliable SRQ > being only supported for RC QPs now is really one of the major > reasons IPoIB CM > uses RC rather than UC. > This is a good point you make. However, this will not address the core issue of missing Acks -the difference in processing speeds. What happens when the next version of IBM HCA (or for that matter HCA from any other vendor) supporting SRQ comes out? Pradeep [EMAIL PROTECTED] _______________________________________________ general mailing list [email protected] http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
