Aaron,

Aaron Knister wrote:
As I said, the older opensms
on the older Mellanox model HCAs fail over and fail back instantly.
The instant failback is expected, and this is the bug that
we're discussing. As for the instant failover - I'll check
how things are supposed to work and get back to you.

After checking this, I don't see how instant failover
is possible. The only case where it can work is if you
don't have a switch in your subnet, just two HCAs
connected directly to each other.
Is this the case?

If not, then I'd like to see opensm logs.
Please run opensm as before (-V -s 0 -e).
Start OSM on node A with high priority.
Start OSM on node B with low priority.
Kill OSM on node A, and see that OSM on
node B becomes master.
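
The steps above could look roughly like the sketch below. It is a dry
run that only prints the commands. It assumes the priority is set
with opensm's -p option and uses 15 and 1 as example values (any two
distinct values in the 0-15 range should do, higher wins); osm_cmd is
a hypothetical helper name.

```shell
#!/bin/sh
# Dry run: prints the commands for the failover test, starts nothing.
# ASSUMPTION: priorities 15 and 1 are illustrative values only.

osm_cmd() {
    # $1 = SM priority; -V -s 0 -e are the options from the earlier runs
    printf 'opensm -V -s 0 -e -p %s\n' "$1"
}

echo "node A (high priority): $(osm_cmd 15)"
echo "node B (low priority):  $(osm_cmd 1)"
echo "node A, once both are up: pkill opensm"
```

Run the printed commands by hand on each node, then check the
opensm log on node B.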

I need only the log of the opensm on node B.
Best if you could just attach it to the bugzilla
issue form, but if you can't - you can mail it to me.

-- Yevgeny

On Tue, Oct 13, 2009 at 11:32 AM, Yevgeny Kliteynik
<[email protected]> wrote:
Aaron,

Thanks for the logs, this was really helpful.
Looks like there is a handover race in the OSM -
SM on node A misses the fact that SM on node B
has given up its mastership.

There is a bugzilla issue that describes all the
details of this race:

https://bugs.openfabrics.org/show_bug.cgi?id=1499

I've updated the issue form with your case, and we will continue
following this bug there.

-- Yevgeny

Aaron Knister wrote:
While the adapters have Mellanox chipsets, they're actually IBM OEM
branded and IBM hasn't released the 2.7 fw yet. I'm a little hesitant
to apply the generic Mellanox FW.

On Mon, Oct 12, 2009 at 4:22 AM, Yevgeny Kliteynik
<[email protected]> wrote:
Or Gerlitz wrote:
Yevgeny Kliteynik wrote:
There was a hand-over problem in OFED 1.4, but later it turned out
to be a FW issue. The thing is, FW version 2.6.648 doesn't have this
bug any more...
so things should work fine with the newly released 2.7 firmware?
Yes

if this is still under question, Aaron, I suggest you open a bugzilla
case @ https://bugs.openfabrics.org and we can track from there.
Good idea.

-- Yevgeny

Or.





