Yevgeny Kliteynik wrote:
> Yevgeny Kliteynik wrote:
>> Line Holen wrote:
>>> On 11/ 4/09 04:54 PM, Yevgeny Kliteynik wrote:
>>>> Line Holen wrote:
>>>>> On 11/ 4/09 10:47 AM, Yevgeny Kliteynik wrote:
>>>>>> Sasha Khapyorsky wrote:
>>>>>>> On 12:26 Tue 03 Nov     , Yevgeny Kliteynik wrote:
>>>>>>>> Always do heavy sweep when there is only one node in the
>>>>>>>> fabric, and this node is a switch, and SM runs on top of it -
>>>>>>>> there may be a race when OSM starts running before the
>>>>>>>> external ports are up, or if they went through
>>>>>>>> reset while SM was starting.
>>>>>>>> In this race switch brings up the ports and turns on the
>>>>>>>> PSC bit, but OSM might get PortInfo before SwitchInfo, and it
>>>>>>>> might see all ports as down, but PSC bit on. If that happens,
>>>>>>>> OSM turns off PSC bit, and it will never see external ports
>>>>>>>> again - it won't perform any heavy sweep, only light sweeps.
>>>>>>> Could such race happen when there are more than one node in a
>>>>>>> fabric?
>>>>>> I think that my description of the race was misleading.
>>>>>> The race can happen on *any* fabric when SM runs on switch.
>>>>>> But when it does happen, SM thinks that the whole subnet
>>>>>> is just one switch - that's what it managed to discover.
>>>>>> I've actually seen it happening.
>>>>>> So the patch fixes this particular case.
>>>>>>
>>>>>> So the next question that you would probably ask is can
>>>>>> this race happen on some *other* switch and not the one
>>>>>> SM is running on?
>>>>>>
>>>>>> Well, I don't know. I have a hunch that it can't, but I
>>>>>> couldn't prove it to myself yet.
>>>>>>
>>>>>> The race on the managed switch is a special case because
>>>>>> SM always sees port 0, and always gets responses to its
>>>>>> SMP queries. On any other switch, if the ports were reset,
>>>>>> SM won't get any response until the ports are up again.
>>>>>>
>>>>>> Perhaps there might be a case where SM got some port as down,
>>>>>> and by the time SM got SwitchInfo with PSC bit the port
>>>>>> was already up, so SM won't start discovery beyond this
>>>>>> port. But this race would be fixed on the next heavy sweep,
>>>>>> when SM will discover this port that it missed the previous
>>>>>> time, whereas race on managed switch is fatal - SM won't
>>>>>> ever do any heavy sweep.
>>>>>>
>>>>>> -- Yevgeny
>>>>> At least for the 3.2 branch there is a general race regardless of
>>>>> where the SM is running. I haven't checked the current master, but
>>>>> I cannot recall seeing any patches related to this so I assume
>>>>> the race is still there.
>>>>>
>>>>> There is a window between SM discovering a switch and clearing PSC
>>>>> for the same switch. The SM will not detect a state change on the
>>>>> switch ports during this time.
>>>> If the port changes state during that period, the switch issues
>>>> a new trap 128, which (I think) should cause the SM to re-discover
>>>> the fabric once this discovery cycle is over. Is this correct?
>>>>
>>>
>>> I think the switch shall send a trap whenever it sets the PSC bit.
>>> Once set I believe it will not send another trap until it is reset.
>>> Or do I misinterpret the spec?
>>
>> I may be wrong, but I thought that this is how things work:
>> - port state changes
>> - switch turns on PSC bit and starts sending traps
>> - SM gets the trap, sends trap repress
>> - switch gets trap repress and stops sending traps
>> - PSC is still on
>> - port state changes again (the same or any other port)
>> - switch turns on PSC bit (which doesn't matter as PSC is
>>   already on) and starts sending traps again
>> - etc...
>>
>> Anyway, I'll double-check this issue.
> 
> Yep, verified.
> The switch sends traps regardless of the PSC bit status.
> Also, the spec doesn't link them together:
> 
>   o14-5.1.1: If a switch supports Traps (PortInfo:
>   CapabilityMask.IsTrapSupported is one), its SMA
>   shall send trap 128 to the SM indicated by the
>   PortInfo:MasterSMLID under any condition that
>   would cause SwitchInfo:PortStateChange to be set
>   to one. (See 14.2.5.4 SwitchInfo on page 827.)
> 

The trap is sent to the SMLID. After first bring-up the SMLID is not set
yet, so no trap will be sent.
In that case opensm would discover the change only through the PSC bit.
On IS3 chips the PSC bit and/or trap were set only after one or more ports
changed their state, so I don't understand how the SM can see the PSC bit set
while all ports are down. Or is this a change in IS4?
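
To make the two cases concrete, here is a toy model of the behaviour
described in this thread. It is only a sketch, not OpenSM or switch
firmware code: the struct and function names are made up for
illustration, and treating a MasterSMLID of 0 as "not configured yet"
is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model of a switch SMA; not real OpenSM/firmware code. */
struct toy_switch {
	bool psc;               /* models SwitchInfo:PortStateChange */
	uint16_t master_sm_lid; /* models PortInfo:MasterSMLID; 0 = unset (assumption) */
	unsigned traps_sent;    /* count of trap 128 notices emitted */
};

/*
 * A port changed state: always set PSC, but emit trap 128 only when
 * the switch already knows where the master SM is.
 */
static void toy_port_state_change(struct toy_switch *sw)
{
	assert(sw != NULL);
	sw->psc = true;
	if (sw->master_sm_lid != 0)
		sw->traps_sent++;
}

/* The SM clears PSC after it has read SwitchInfo. */
static void toy_sm_clear_psc(struct toy_switch *sw)
{
	assert(sw != NULL);
	sw->psc = false;
}
```

On a zero-initialized toy_switch, the first toy_port_state_change()
sets psc but leaves traps_sent at 0, so the SM could learn about the
change only from the PSC bit. Once master_sm_lid is non-zero, a
further state change sends a trap even though psc is already on.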

Eli
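
For reference, the part of the do_sweep() condition that is visible in
the patch hunk quoted further down can be restated in isolation. This
is a simplified sketch: the struct and helper names are hypothetical,
and the real condition may have further terms past the end of the
hunk, which are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the subnet fields tested in the hunk. */
struct toy_subnet {
	size_t n_switches;    /* models cl_qmap_count(&sw_guid_tbl) */
	size_t n_nodes;       /* models cl_qmap_count(&node_guid_tbl) */
	bool discovering;     /* models sm_state == IB_SMINFO_STATE_DISCOVERING */
	bool opt_force_heavy; /* models opt.force_heavy_sweep */
	bool force_heavy;     /* models force_heavy_sweep */
};

/*
 * Returns true when a light sweep may be attempted according to the
 * terms visible in the hunk. The "n_nodes != 1" term is the one the
 * patch adds: a fabric with a single known node always falls through
 * to the heavy sweep.
 */
static bool toy_may_try_light_sweep(const struct toy_subnet *s)
{
	assert(s != NULL);
	return s->n_switches != 0
	    && s->n_nodes != 1
	    && !s->discovering
	    && !s->opt_force_heavy
	    && !s->force_heavy;
}
```

With one switch and one node the helper returns false, which is the
lone managed switch case the patch is aimed at. With more nodes and no
force flags set it returns true.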

> -- Yevgeny
> 
>> -- Yevgeny
>>
>>>> Or perhaps the more serious problem happens when SM LID is not
>>>> configured yet on the switch, hence the trap is not going to the
>>>> right place?
>>>>
>>>>> I have a patch for the 3.2 branch that I can merge into master.
>>>> Sure, that would be nice :)
>>>>
>>>> -- Yevgeny
>>>>
>>>>
>>>>> Line
>>>>>
>>>>>>> Sasha
>>>>>>>
>>>>>>>> Signed-off-by: Yevgeny Kliteynik <klit...@dev.mellanox.co.il>
>>>>>>>> ---
>>>>>>>>  opensm/opensm/osm_state_mgr.c |   15 ++++++++++-----
>>>>>>>>  1 files changed, 10 insertions(+), 5 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c
>>>>>>>> index 4303d6e..537c855 100644
>>>>>>>> --- a/opensm/opensm/osm_state_mgr.c
>>>>>>>> +++ b/opensm/opensm/osm_state_mgr.c
>>>>>>>> @@ -1062,13 +1062,18 @@ static void do_sweep(osm_sm_t * sm)
>>>>>>>>       * Otherwise, this is probably our first discovery pass
>>>>>>>>       * or we are connected in loopback. In both cases do a
>>>>>>>>       * heavy sweep.
>>>>>>>> -     * Note: If we are connected in loopback we want a heavy
>>>>>>>> -     * sweep, since we will not be getting any traps if there is
>>>>>>>> -     * a lost connection.
>>>>>>>> +     * Note the following:
>>>>>>>> +     * 1. If we are connected in loopback we want a heavy sweep, since we
>>>>>>>> +     *    will not be getting any traps if there is a lost connection.
>>>>>>>> +     * 2. If we are in DISCOVERING state - this means it is either in
>>>>>>>> +     *    initializing or wake up from STANDBY - run the heavy sweep.
>>>>>>>> +     * 3. If there is only one node in the fabric, and this node is a
>>>>>>>> +     *    switch, and OSM runs on top of it, there might be a race when
>>>>>>>> +     *    OSM starts running before the external ports are up - run the
>>>>>>>> +     *    heavy sweep.
>>>>>>>>       */
>>>>>>>> -    /*  if we are in DISCOVERING state - this means it is either in
>>>>>>>> -     *  initializing or wake up from STANDBY - run the heavy sweep */
>>>>>>>>      if (cl_qmap_count(&sm->p_subn->sw_guid_tbl)
>>>>>>>> +        && cl_qmap_count(&sm->p_subn->node_guid_tbl) != 1
>>>>>>>>          && sm->p_subn->sm_state != IB_SMINFO_STATE_DISCOVERING
>>>>>>>>          && sm->p_subn->opt.force_heavy_sweep == FALSE
>>>>>>>>          && sm->p_subn->force_heavy_sweep == FALSE
>>>>>>>> -- 
>>>>>>>> 1.5.1.4
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>>>>> linux-rdma" in
>>>>>>>> the body of a message to majord...@vger.kernel.org
>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>
>>>>>
>>>
>>>
>>
>>
> 
