Thanks! I really appreciate that. I still have a question about the initial failover: why is there a 30-second delay? Wouldn't node A send some type of handover message (my IB knowledge is limited) to notify a lower-priority subnet manager to take over? As I said, the older OpenSM instances on the older-model Mellanox HCAs fail over and fail back instantly.
On Tue, Oct 13, 2009 at 11:32 AM, Yevgeny Kliteynik <[email protected]> wrote:
> Aaron,
>
> Thanks for the logs, this was really helpful.
> Looks like there is a handover race in the OSM -
> SM on node A misses the fact that SM on node B
> have gave up its mastership.
>
> There is a bugzilla issue the describes all the
> details of this race:
>
> https://bugs.openfabrics.org/show_bug.cgi?id=1499
>
> I've updated the issue form with your case, and we will continue following
> this bug there.
>
> -- Yevgeny
>
> Aaron Knister wrote:
>>
>> While the adapters have mellanox chipsets their actually IBM OEM
>> branded and IBM hasn't released the 2.7 fw yet. I'm a little hesitant
>> to apply the generic Mellanox FW.
>>
>> On Mon, Oct 12, 2009 at 4:22 AM, Yevgeny Kliteynik
>> <[email protected]> wrote:
>>>
>>> Or Gerlitz wrote:
>>>>
>>>> Yevgeny Kliteynik wrote:
>>>>>
>>>>> There was a hand-over problem in OFED 1.4, but later it turned out to be
>>>>> FW issue. The thing is, FW version 2.6.648 doesn't have this bug any
>>>>> more...
>>>>
>>>> so things should work fine with the newly released 2.7 firmware?
>>>
>>> Yes
>>>
>>>> if this is still under question, Aaron, I suggest you open a bugzilla case
>>>> @ https://bugs.openfabrics.org and we can track from there.
>>>
>>> Good idea.
>>>
>>> -- Yevgeny
>>>
>>>> Or.
