On Thu, 2007-12-20 at 17:43 +0200, Yevgeny Kliteynik wrote:
> Hal Rosenstock wrote:
> > On Thu, 2007-12-20 at 13:42 +0200, Yevgeny Kliteynik wrote:
> >> Hal Rosenstock wrote:
> >>> On Wed, 2007-12-19 at 11:58 -0800, [EMAIL PROTECTED] wrote:
> >>>> We're seeing a regression in smpquery from alpha2 to rc1.
> >>>>
> >>>> For example, with alpha2 I get:
> >>>> grommit:~ # smpquery -G nodeinfo 0x66a01a000737c
> >>>> # Node info: Lid 3
> >>>> BaseVers:........................1
> >>>> ClassVers:.......................1
> >>>> NodeType:........................Channel Adapter
> >>>> NumPorts:........................2
> >>>> SystemGuid:......................0x00066a009800737c
> >>>> Guid:............................0x00066a009800737c
> >>>> PortGuid:........................0x00066a01a000737c
> >>>> PartCap:.........................64
> >>>> DevId:...........................0x6278
> >>>> Revision:........................0x000000a0
> >>>> LocalPort:.......................2
> >>>> VendorId:........................0x00066a
> >>>> grommit:~ #
> >>>>
> >>>>
> >>>> And with rc1, I get:
> >>>> grommit:~ # smpquery -G nodeinfo 0x66a01a000737c
> >>>> ibwarn: [5650] ib_path_query: sa call path_query failed
> >>>> smpquery: iberror: failed: can't resolve destination port 0x66a01a000737c
> >>>> grommit:~ #
> >>>>
> >>>> But using a LID works fine:
> >>>> grommit:~ # smpquery nodeinfo 3
> >>>> # Node info: Lid 3
> >>>> BaseVers:........................1
> >>>> ClassVers:.......................1
> >>>> NodeType:........................Channel Adapter
> >>>> NumPorts:........................2
> >>>> SystemGuid:......................0x00066a009800737c
> >>>> Guid:............................0x00066a009800737c
> >>>> PortGuid:........................0x00066a01a000737c
> >>>> PartCap:.........................64
> >>>> DevId:...........................0x6278
> >>>> Revision:........................0x000000a0
> >>>> LocalPort:.......................2
> >>>> VendorId:........................0x00066a
> >>>> grommit:~ #
> >>>>
> >>>> Strangest of all, running it under strace also works:
> >>>> grommit:~ # strace smpquery -G nodeinfo 0x66a01a000737c > /tmp/smpquery.out
> >>>> .....
> >>>> grommit:~ # cat /tmp/smpquery.out
> >>>> # Node info: Lid 3
> >>>> BaseVers:........................1
> >>>> ClassVers:.......................1
> >>>> NodeType:........................Channel Adapter
> >>>> NumPorts:........................2
> >>>> SystemGuid:......................0x00066a009800737c
> >>>> Guid:............................0x00066a009800737c
> >>>> PortGuid:........................0x00066a01a000737c
> >>>> PartCap:.........................64
> >>>> DevId:...........................0x6278
> >>>> Revision:........................0x000000a0
> >>>> LocalPort:.......................2
> >>>> VendorId:........................0x00066a
> >>>> grommit:~ #
> >>>>
> >>>> Some weird race condition...
> >>>>
> >>>> Anyone else seeing the same?
> >>> -G requires a SA path record lookup so this could be an issue with that
> >>> timing out in some cases (assuming the port is active and the SM is
> >>> operational).
> >> I'm seeing the same problem.
> >> Sometimes the query works, and sometimes it doesn't.
> >> I also see that when the query fails, OpenSM doesn't get PathRecord query
> >> at all.
> >>
> >> Hal, can you elaborate on "that timing out in some cases" issue?
> >
> > I just meant that the SM not responding (for an unknown reason right
> > now) would yield this effect.
> >
> >> Adding Jack for the libibmad issue:
> >>
> >> I see that the ib_path_query() in libibmad/sa.c sometimes fails
> >> when calling safe_sa_call().
> >
> > This could just be more detail on the same thing in terms of the
> > (smpquery) client which is layered on top of libibmad: the SA path query
> > timeout.
> >
> > I would suggest running OpenSM in verbose mode (both instances are with
> > OpenSM) and seeing if it responds to the PathRecord query used by this
> > form of smpquery and continue troubleshooting from there based on the
> > result.
>
> This is actually what I was saying here.
> I have *debugged* smpquery, and saw that the failing function is
> ib_path_query() in libibmad/sa.c
> As I've mentioned, I did run it with OpenSM in verbose mode, and saw
> that when smpquery fails, OpenSM log does not have any PathRecord request.
> When smpquery passes, I see the PathRecord request and response in the
> OpenSM log.

OK; that wasn't clear before but is now (that the failure appears to be a
client and not SM issue) :-)

FWIW, I don't know what has changed that would affect this so it could be
a latent bug as opposed to a regression.

-- Hal

> -- Yevgeny
>
> > -- Hal
> >
> >> -- Yevgeny
> >>
> >>> -- Hal
> >>> _______________________________________________
> >>> general mailing list
> >>> [email protected]
> >>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> >>>
> >>> To unsubscribe, please visit
> >>> http://openib.org/mailman/listinfo/openib-general
