Or Gerlitz wrote:
> Pradeep Satyanarayana wrote:
>> I brought this issue up on the mailing list sometime in the summer of
>> 2007 is my recollection. I could not locate that with a quick search
>> of the archives. I will probably do that again later.
>
> Its from December 2007
> http://lists.openfabrics.org/pipermail/general/2007-December/044299.html
>
>> However, the crux of the issue is that I was seeing "send completion
>> errors" and that is what prompted me to change the retry counts.
>> Please see Table 78 "Completion Error Handling for RC Send Queues" in
>> the IB Spec for reference.
>> And changing the retry counts did help.
>
> I understand that changing the retry counts eliminated the issue you
> were seeing in your setup, however, its more of an observation than an
> actual problem statement whose solution can be judged. Apart from that,
> I have concerns regarding the approach of adding retries to layer that
> provides unreliable service, see my comments on the other emails, and
> feel free to respond there.

Hello Or,

Thanks for the pointer to the December thread. I actually brought up
this issue well before that. Here is the link:
http://lists.openfabrics.org/pipermail/general/2007-April/035308.html

I was seeing "send completion errors", which means the QP was being torn
down and recreated all the time. It was on account of this that I changed
the retry counts, not the other way round.

In this case the TCP timers are so large (hundreds of ms) compared to the
microseconds for InfiniBand that, before TCP takes action to recover from
errors, the QP is torn down (and recreated). As you can guess, the
performance tanks.

I am not clear why you think that this was an observation rather than an
actual problem.

Pradeep
