Hi Sumit,

On Wed, 2008-06-25 at 20:21 +0530, Sumit Gaur - Sun Microsystem wrote:
> Hi Hal,
>
> Hal Rosenstock wrote:
> > Hi Sumit,
> >
> > On Wed, 2008-06-25 at 12:11 +0530, Sumit Gaur - Sun Microsystem wrote:
> >
> >>Hi Hal/Sashak,
> >>
> >>Hal Rosenstock wrote:
> >>
> >>>Hi Sumit,
> >>>
> >>>On Tue, 2008-06-24 at 14:21 +0530, Sumit Gaur - Sun Microsystem wrote:
> >>>
> >>>>Hi,
> >>>>I am using OFED 2.5.*
> >>>
> >>>                 ^^^^^
> >>> 1.2.5.* ?
> >>>
> >>
> >>Sorry for the typo; it is 1.2.5.*
> >>
> >>>>and observing that my SMI requests are served very fast, with a very
> >>>>short response time. My GSI requests, on the contrary, take longer to
> >>>>be served, and the response time sometimes exceeds 2 sec. Any light
> >>>>on this difference in behavior?
> >>>
> >>>What are the specific GS requests which are slow to respond? Are they
> >>>compute intensive?
> >>
> >>I am only sending requests with
> >>
> >>    rpc.mgtclass = IB_PERFORMANCE_CLASS;
> >>    rpc.method = IB_MAD_METHOD_GET;
> >>
> >>once every second.

Does perfquery work reliably with the same node(s) you are having trouble
with? Does your app follow what perfquery does?

> >>>In general, there are a few possible causes for this. SM traffic uses
> >>>VL15, whereas GS traffic is on a data VL (usually VL0 in most
> >>>subnets).
> >>>
> >>>Some possibilities are:
> >>>1. Timeout/retry being hit for some GS traffic (GS request or response
> >>>lost/corrupted)
> >>
> >>Yes, this is also happening. Sometimes I am getting corrupt data back,
> >
> > Is there an error indicated?
>
> For such packets I am getting umad_status as 110.

That's ETIMEDOUT. You need to handle the errors (and not treat the receive
as a valid packet). Are you doing that?

The underlying question is why you are getting the timeout relatively
frequently, so I recommend checking all the error counters along the path.
Are you sure the request gets to the responder? Or does the responder
respond, but the response doesn't make it back?

-- Hal

> >>and if I retry sending the same request, it fails or sends corrupted
> >>data back again.
> >
> >>>2. Data VL busy (is there anything else utilizing VL0?)
> >>
> >>Not sure about it. Is there any way to verify it?
> >
> > There's an optional counter, but it is not commonly implemented, so
> > maybe start by verifying all the PortCounters along the path from
> > requester to responder to see whether there are any low-level issues
> > with your subnet.
> >

_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
