On Thu, 2007-12-20 at 13:42 +0200, Yevgeny Kliteynik wrote:
> Hal Rosenstock wrote:
> > On Wed, 2007-12-19 at 11:58 -0800, [EMAIL PROTECTED] wrote:
> >> We're seeing a regression in smpquery from alpha2 to rc1.
> >>
> >> For example, with alpha2 I get:
> >> grommit:~ # smpquery -G nodeinfo 0x66a01a000737c
> >> # Node info: Lid 3
> >> BaseVers:........................1
> >> ClassVers:.......................1
> >> NodeType:........................Channel Adapter
> >> NumPorts:........................2
> >> SystemGuid:......................0x00066a009800737c
> >> Guid:............................0x00066a009800737c
> >> PortGuid:........................0x00066a01a000737c
> >> PartCap:.........................64
> >> DevId:...........................0x6278
> >> Revision:........................0x000000a0
> >> LocalPort:.......................2
> >> VendorId:........................0x00066a
> >> grommit:~ #
> >>
> >>
> >> And with rc1, I get:
> >> grommit:~ # smpquery -G nodeinfo 0x66a01a000737c
> >> ibwarn: [5650] ib_path_query: sa call path_query failed
> >> smpquery: iberror: failed: can't resolve destination port 0x66a01a000737c
> >> grommit:~ #
> >>
> >> But using a LID works fine:
> >> grommit:~ # smpquery nodeinfo 3
> >> # Node info: Lid 3
> >> BaseVers:........................1
> >> ClassVers:.......................1
> >> NodeType:........................Channel Adapter
> >> NumPorts:........................2
> >> SystemGuid:......................0x00066a009800737c
> >> Guid:............................0x00066a009800737c
> >> PortGuid:........................0x00066a01a000737c
> >> PartCap:.........................64
> >> DevId:...........................0x6278
> >> Revision:........................0x000000a0
> >> LocalPort:.......................2
> >> VendorId:........................0x00066a
> >> grommit:~ #
> >>
> >> Strangest of all, running it under strace also works:
> >> grommit:~ # strace smpquery -G nodeinfo 0x66a01a000737c > /tmp/smpquery.out
> >> .....
> >> grommit:~ # cat /tmp/smpquery.out
> >> # Node info: Lid 3
> >> BaseVers:........................1
> >> ClassVers:.......................1
> >> NodeType:........................Channel Adapter
> >> NumPorts:........................2
> >> SystemGuid:......................0x00066a009800737c
> >> Guid:............................0x00066a009800737c
> >> PortGuid:........................0x00066a01a000737c
> >> PartCap:.........................64
> >> DevId:...........................0x6278
> >> Revision:........................0x000000a0
> >> LocalPort:.......................2
> >> VendorId:........................0x00066a
> >> grommit:~ #
> >>
> >> Some weird race condition...
> >>
> >> Anyone else seeing the same?
> >
> > -G requires a SA path record lookup so this could be an issue with that
> > timing out in some cases (assuming the port is active and the SM is
> > operational).
>
> I'm seeing the same problem.
> Sometimes the query works, and sometimes it doesn't.
> I also see that when the query fails, OpenSM doesn't get PathRecord query at
> all.
>
> Hal, can you elaborate on "that timing out in some cases" issue?

I just meant that the SM not responding (for an unknown reason right
now) would yield this effect.

> Adding Jack for the libibmad issue:
>
> I see that the ib_path_query() in libibmad/sa.c sometimes fails
> when calling safe_sa_call().

This could just be more detail on the same thing in terms of the
(smpquery) client which is layered on top of libibmad: the SA path query
timeout.

I would suggest running OpenSM in verbose mode (both instances are with
OpenSM) and seeing if it responds to the PathRecord query used by this
form of smpquery and continue troubleshooting from there based on the
result.

-- Hal

> -- Yevgeny
>
> > -- Hal

_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
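
Since the failure is intermittent, it may help to put a number on how often the GUID form fails before digging through the OpenSM log. The sketch below is only an illustration: count_failures is a hypothetical helper (not part of the diags package), and it assumes smpquery exits with a non-zero status when it cannot resolve the destination port.

```shell
#!/bin/sh
# count_failures is a hypothetical helper, not part of any IB tool set.
# Usage: count_failures RUNS COMMAND [ARGS...]
# Runs COMMAND the given number of times and prints how many runs failed.
count_failures() {
    runs=$1
    shift
    fails=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Output is discarded; only the exit status is examined.
        if ! "$@" >/dev/null 2>&1; then
            fails=$((fails + 1))
        fi
        i=$((i + 1))
    done
    echo "$fails"
}

# Example invocation (assumption: a failed path lookup gives a non-zero exit):
# count_failures 100 smpquery -G nodeinfo 0x66a01a000737c
```

Comparing the count for the -G form against the same number of runs of the LID form (smpquery nodeinfo 3) would show whether only the path record lookup is affected.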
