Hi Sumit,

On Fri, 2008-06-27 at 19:28 +0530, Sumit Gaur - Sun Microsystem wrote:
> hi Hal,
>
> Hal Rosenstock wrote:
> >
> >>>Can you elaborate on the multiple sends ? Are they outstanding
> >>>concurrently ? Are they to the same destination or different ones ? Are
> >>>they from a single or multiple threads ?
> >>
> >>No they are sending sequentially(mutex enabled) no concurrency but timeout
> >>for umad_recv is 100ms.
> >
> >
> > Can you try increasing that to see if there is some threshold where it
> > works more reliably ? Does it work better at say 200 msec (as you said
> > your rate was 4/sec) ? The default timeout used in the diags is 1 sec.
>
> yes, I tried increasing it upto 3000ms but condition not improved much (it
> *reduced* timeout failures but *no reduction* in recv_fail and this piling up
> number of request per second too at client side).
Right; I'm not sure you can send the subsequent request until the former
one either completes or times out.

> Also I tried same code on two
> separate subnet and similar problem come across.

Is the second subnet any better/cleaner than the first ?

> Could be a network issue but is
> it a fact that GMP request takes much more time then SMP request (for ideal
> subnet) ?

Other than SMPs being VL15 with no flow control and GMPs being data VL
(usually VL0) with flow control, there should be no difference for SMA
v. PMA requests AFAIK.

-- Hal

> > Looks like there are some issues here to debug in your subnet. It might
> > help to clear the counters and see what is actively going on to isolate
> > these issues. This could factor into those other errors you are seeing.
> >
> > -- Hal
> >
> >

_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
