Hal Rosenstock wrote:
On Thu, 2007-12-20 at 13:42 +0200, Yevgeny Kliteynik wrote:
Hal Rosenstock wrote:
On Wed, 2007-12-19 at 11:58 -0800, [EMAIL PROTECTED] wrote:
We're seeing a regression in smpquery from alpha2 to rc1.
For example, with alpha2 I get:
grommit:~ # smpquery -G nodeinfo 0x66a01a000737c
# Node info: Lid 3
BaseVers:........................1
ClassVers:.......................1
NodeType:........................Channel Adapter
NumPorts:........................2
SystemGuid:......................0x00066a009800737c
Guid:............................0x00066a009800737c
PortGuid:........................0x00066a01a000737c
PartCap:.........................64
DevId:...........................0x6278
Revision:........................0x000000a0
LocalPort:.......................2
VendorId:........................0x00066a
grommit:~ #

And with rc1, I get:
grommit:~ # smpquery -G nodeinfo 0x66a01a000737c
ibwarn: [5650] ib_path_query: sa call path_query failed
smpquery: iberror: failed: can't resolve destination port 0x66a01a000737c
grommit:~ #
But using a LID works fine:
grommit:~ # smpquery nodeinfo 3
# Node info: Lid 3
BaseVers:........................1
ClassVers:.......................1
NodeType:........................Channel Adapter
NumPorts:........................2
SystemGuid:......................0x00066a009800737c
Guid:............................0x00066a009800737c
PortGuid:........................0x00066a01a000737c
PartCap:.........................64
DevId:...........................0x6278
Revision:........................0x000000a0
LocalPort:.......................2
VendorId:........................0x00066a
grommit:~ #
Strangest of all, running it under strace also works:
grommit:~ # strace smpquery -G nodeinfo 0x66a01a000737c > /tmp/smpquery.out .....
grommit:~ # cat /tmp/smpquery.out
# Node info: Lid 3
BaseVers:........................1
ClassVers:.......................1
NodeType:........................Channel Adapter
NumPorts:........................2
SystemGuid:......................0x00066a009800737c
Guid:............................0x00066a009800737c
PortGuid:........................0x00066a01a000737c
PartCap:.........................64
DevId:...........................0x6278
Revision:........................0x000000a0
LocalPort:.......................2
VendorId:........................0x00066a
grommit:~ #

Some weird race condition...

Anyone else seeing the same?
-G requires an SA path record lookup, so this could be an issue with that
timing out in some cases (assuming the port is active and the SM is
operational).
I'm seeing the same problem.
Sometimes the query works, and sometimes it doesn't.
I also see that when the query fails, OpenSM doesn't get a PathRecord query
at all.
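
Since the failure is intermittent, it may help to put a number on it before digging further. The following is only a minimal sketch: count_failures is a hypothetical helper (not part of infiniband-diags), and it assumes smpquery exits with a non-zero status whenever it prints the iberror message shown above.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly and report how often it fails.
# Usage: count_failures <runs> <command> [args...]
count_failures() {
    runs=$1
    shift
    pass=0
    fail=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the command's output; only its exit status is counted.
        if "$@" >/dev/null 2>&1; then
            pass=$((pass + 1))
        else
            fail=$((fail + 1))
        fi
        i=$((i + 1))
    done
    echo "runs=$runs pass=$pass fail=$fail"
}

# Example, using the GUID from the original report:
#   count_failures 100 smpquery -G nodeinfo 0x66a01a000737c
```

Running the same loop with the LID form (smpquery nodeinfo 3) gives a baseline, since that form does not need the SA path record lookup.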

Hal, can you elaborate on the "that timing out in some cases" issue?

I just meant that the SM not responding (for an unknown reason right
now) would yield this effect.

Adding Jack for the libibmad issue:

I see that the ib_path_query() in libibmad/sa.c sometimes fails
when calling safe_sa_call().

This could just be more detail on the same thing in terms of the
(smpquery) client which is layered on top of libibmad: the SA path query
timeout.
I would suggest running OpenSM in verbose mode (both instances are with
OpenSM) and seeing if it responds to the PathRecord query used by this
form of smpquery and continue troubleshooting from there based on the
result.

This is actually what I was saying here.
I have *debugged* smpquery and saw that the failing function is
ib_path_query() in libibmad/sa.c.
As I've mentioned, I did run it with OpenSM in verbose mode, and saw
that when smpquery fails, the OpenSM log does not contain any PathRecord
request. When smpquery passes, I see the PathRecord request and response
in the OpenSM log.

-- Yevgeny

-- Hal

-- Yevgeny

-- Hal
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


