It will be difficult to get you those logs until the older cluster is
decommissioned (which in theory should be soon), but as soon as I am
able I will get them to you.

On Wed, Oct 14, 2009 at 4:46 AM, Yevgeny Kliteynik
<[email protected]> wrote:
> Aaron,
>
> Aaron Knister wrote:
>>>>
>>>> As I said, the older opensms
>>>> on the older Mellanox model HCAs fail over and fail back instantly.
>>>
>>> The instant failback is expected, and this is the bug that
>>> we're discussing. As for the instant failover - I'll check
>>> how things are supposed to work and get back to you.
>
> After checking this thing, I don't understand how the
> instant failover is possible. The only case it can
> work is if you don't have a switch in your subnet - just two HCAs connected
> directly to each other.
> Is this the case?
>
> If not, then I'd like to see opensm logs.
> Please run opensm as before (-V -s 0 -e).
> Start OSM on node A with high priority.
> Start OSM on node B with low priority.
> Kill OSM on node A, and see that OSM on
> node B becomes master.
>
> I need only the log of the opensm on node B.
> Best if you could just attach it to the bugzilla
> issue form, but if you can't - you can mail it to me.
>
> -- Yevgeny
>
>>> -- Yevgeny
>>>
>>>> On Tue, Oct 13, 2009 at 11:32 AM, Yevgeny Kliteynik
>>>> <[email protected]> wrote:
>>>>>
>>>>> Aaron,
>>>>>
>>>>> Thanks for the logs, this was really helpful.
>>>>> Looks like there is a handover race in the OSM -
>>>>> SM on node A misses the fact that SM on node B
>>>>> has given up its mastership.
>>>>>
>>>>> There is a bugzilla issue that describes all the
>>>>> details of this race:
>>>>>
>>>>> https://bugs.openfabrics.org/show_bug.cgi?id=1499
>>>>>
>>>>> I've updated the issue form with your case, and we will continue
>>>>> following this bug there.
>>>>>
>>>>> -- Yevgeny
>>>>>
>>>>> Aaron Knister wrote:
>>>>>>
>>>>>> While the adapters have Mellanox chipsets, they're actually IBM OEM
>>>>>> branded, and IBM hasn't released the 2.7 fw yet. I'm a little hesitant
>>>>>> to apply the generic Mellanox FW.
>>>>>>
>>>>>> On Mon, Oct 12, 2009 at 4:22 AM, Yevgeny Kliteynik
>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>> Or Gerlitz wrote:
>>>>>>>>
>>>>>>>> Yevgeny Kliteynik wrote:
>>>>>>>>>
>>>>>>>>> There was a hand-over problem in OFED 1.4, but later it turned out
>>>>>>>>> to be a FW issue. The thing is, FW version 2.6.648 doesn't have this
>>>>>>>>> bug any more...
>>>>>>>>
>>>>>>>> so things should work fine with the newly released 2.7 firmware?
>>>>>>>
>>>>>>> Yes
>>>>>>>
>>>>>>>> if this is still under question, Aaron, I suggest you open a
>>>>>>>> bugzilla case @ https://bugs.openfabrics.org and we can track from
>>>>>>>> there.
>>>>>>>
>>>>>>> Good idea.
>>>>>>>
>>>>>>> -- Yevgeny
>>>>>>>
>>>>>>>> Or.
>>>>>>>>
>>>>>>>>
>>>
>>
>
>