> > > This patch has been tested with linux-2.6.21-rc5 and rc7 with
> > > Topspin and IBM HCAs on ppc64 machines. I have run netperf between
> > > two IBM HCAs and two Topspin HCAs, as well as between an IBM and a
> > > Topspin HCA.
> > >
> > > Note 1: There was an interesting discovery that I made when I ran
> > > netperf between a Topspin and an IBM HCA. I started to see the
> > > IB_WC_RETRY_EXC_ERR error upon send completion. This may have been
> > > due to the differences in the processing speeds of the two HCAs.
> > > It was rectified by setting the retry_count to a non-zero value in
> > > ipoib_cm_send_req(). I had to do this in spite of the comment -->
> > > /* RFC draft warns against retries */
> >
> > This would only help if there are short bursts of high-speed
> > activity on the receiving HCA: if the speed is different in the
> > long run, the right thing to do is to drop some packets and have
> > TCP adjust its window accordingly.
> >
> > But in the former case (short bursts), just increasing the number
> > of pre-posted buffers on the RQ should be enough, and that looks
> > like a much cleaner solution.
>
> This was not an issue with running out of buffers (which was my
> original suspicion too). It was probably due to missing ACKs - I am
> guessing this happens because the two HCAs have very different
> processing speeds.
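For reference, the change being discussed touches the CM REQ parameters
built in ipoib_cm_send_req() in drivers/infiniband/ulp/ipoib/ipoib_cm.c.
A rough sketch of just the retry-related fields - the helper name is
invented and every other REQ field is elided:

	#include <rdma/ib_cm.h>

	/*
	 * Sketch only, not the tree's code: the retry-related CM REQ
	 * fields as ipoib_cm_send_req() sets them up, with the change
	 * tested above (retry_count = 3 instead of 0).
	 */
	static int cm_req_retry_sketch(struct ib_cm_id *id,
				       struct ib_cm_req_param *req)
	{
		req->retry_count     = 3;  /* was 0: "RFC draft warns
					      against retries" */
		req->rnr_retry_count = 0;  /* RNR retries stay disabled */
		req->max_cm_retries  = 15; /* CM MAD retries - a
					      separate knob */
		return ib_send_cm_req(id, req);
	}

Note that retry_count governs transport-level retries by the requester;
max_cm_retries only affects the CM MAD exchange and is unrelated to the
data path.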
I don't see how different processing speeds could trigger missing ACKs.
Do you?

> This is exacerbated by the fact that the retry count (not the RNR
> retry count) was 0. When I changed the retry count to a small value
> like 3, it still works. Please see below for additional details.

Looks like a work-around for some breakage elsewhere. Maybe it's a good
thing we don't retry in such cases - retries are not good for network
performance, and this way we move the problem to its root cause, where
it can be debugged and fixed, instead of overloading the network.

> > > Can someone point me to where this comment is in the RFC? I would
> > > like to understand the reasoning.
> >
> > See "7.1 A Cautionary Note on IPoIB-RC".
> > See also classics such as
> > http://sites.inka.de/~W1011/devel/tcp-tcp.html

> If we do this right, the above-mentioned problems should not occur.
> In the case we are dealing with, the RC timers are expected to be
> much smaller than the TCP timers and should not interfere with them.
> The IBM HCA uses a default value of 0 for the Local CA Ack Delay,
> which is probably too small a value, and with a retry count of 0,
> ACKs are missed. I agree with Roland's assessment (this was in a
> separate thread) that this should not be 0.

So, it's an ehca bug then? I didn't really get the explanation. Who
loses the ACKs? ehca? Is it the case that ehca *reports* a Local CA
Ack Delay that is *below* what it actually provides? If so, it should
be easy to fix in the driver.

> On the other hand, with the Topspin adapter (and mthca) that I have,
> the Local CA Ack Delay is 0xf, which implies a Local ACK Timeout of
> 4.096us * 2^15, or about 134ms. The IB spec says it can be up to 4
> times this value, which means up to roughly 537ms.
>
> The smallest TCP retransmission timeout is HZ/5, i.e. 200ms. Yes,
> even with a retry count of 1 or 2, there is then a risk of
> interfering with the TCP timers.
>
> If my understanding is correct, the way it should be done is to use
> a small value for the Local CA Ack Delay, say 3 or 4, which would
> imply a timeout value of 32-64us, together with a small retry count
> of 2 or 3. This way the maximum timeout would still be only several
> hundred us, a factor of 1000 less than the minimum TCP timeout. IB
> adapters are supposed to have much smaller latency than ethernet
> adapters, so I am guessing this would be in the ballpark for most
> HCAs.
>
> Unfortunately, I do not know how much effort it would take to change
> the Local CA Ack Delay across the various HCAs (if need be).

How about fixing ehca not to trigger ACK loss instead?

> In the interim, the only parameter we can control is the retry
> count, and we could make this a module parameter.

Since both zero and non-zero values might lead to problems, this does
not look like a real solution.

> > By the way, as long as you are not using SRQ, why not use UC mode
> > QPs? This would look like a cleaner solution.

You haven't addressed this, and it might be a better way out. SRQ
being supported only for RC QPs right now is really one of the major
reasons IPoIB CM uses RC rather than UC (Unreliable Connected).
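For anyone who wants to experiment, the verbs-level difference is
essentially the qp_type at QP creation time. A rough sketch - helper
name invented, queue sizes illustrative, and no SRQ attached:

	#include <rdma/ib_verbs.h>

	/* Sketch: per-connection QP created as UC instead of RC. */
	static struct ib_qp *create_cm_qp_sketch(struct ib_pd *pd,
						 struct ib_cq *cq)
	{
		struct ib_qp_init_attr attr = {
			.send_cq     = cq,
			.recv_cq     = cq,
			.cap         = {
				.max_send_wr  = 64,  /* illustrative */
				.max_recv_wr  = 256, /* illustrative */
				.max_send_sge = 1,
				.max_recv_sge = 1,
			},
			.sq_sig_type = IB_SIGNAL_ALL_WR,
			.qp_type     = IB_QPT_UC, /* instead of IB_QPT_RC */
		};

		return ib_create_qp(pd, &attr);
	}

Note also that the RTR/RTS transitions for a UC QP do not take the
timeout/retry attributes at all, which is part of the appeal here.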
--
MST