Hi Lan,

On 19:01 Fri 09 Nov, Lan Tran wrote:
> 
> I'm seeing a problem with missing out-of-svc trap notifications when a Master 
> SM port is disabled. I'm taking a look into it now, but if you have any 
> pointers or ideas of what might be going on or how to resolve it, that would 
> be much appreciated! 
> 
> I am subscribing to be informed of out-of-service trap events (i.e. trap 65), 
> registering my own callback. When I disable an IB port of a remote node that 
> is running the Standby SM, then, as expected, my trap callback function is 
> called. But when I disable the IB port of the remote node that is the Master 
> SM, my trap 65 callback is never called.  From looking at the opensm logs it 
> seems what is happening is: 
> 1) I disable port running Master SM 
> 2) SM handover starts  
>    --> during Standby SM's heavy sweep, osm_drop_mgr_process() detects that 
> the old Master SM port is down ... but at this point no subscribers to be 
> informed because they are all subscribed with the old Master SM  
>    ---> Standby SM enters Master SM state, so now new Master SM  
> 3) Several seconds later, I subscribe with the new Master SM for trap 65 
> notification (I do this whenever I receive IB_EVENT_CLIENT_REREGISTER event), 
> but this is too late as the report notice for the dropped old Master SM port 
> already occurred earlier. 

Right, that is how things work now. A stand-by OpenSM doesn't track
subnet changes, so it will not send any notices on its first sweep after
becoming master. (An OpenSM doing the master->stand-by transition does
send them, but in your case its port is disconnected.)
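
To illustrate why the first sweep is silent, here is a toy model. It is
not OpenSM code, and the sweep() function and node names are made up for
the example. The only assumption is that notices come from comparing the
previous view of the subnet with the newly discovered one.

```python
# Toy model only; not OpenSM code. All names are hypothetical.

def sweep(previous_ports, discovered_ports):
    """Return (out_of_service, in_service) port sets for one sweep."""
    if previous_ports is None:
        # First sweep after becoming master: there is no earlier view of
        # the subnet to compare against, so no changes can be reported.
        return set(), set()
    gone = previous_ports - discovered_ports
    new = discovered_ports - previous_ports
    return gone, new


# The stand-by was not tracking the subnet, so it starts with no view.
first = sweep(None, {"node-a", "node-b"})

# Later sweeps do have a baseline and can report a dropped port.
second = sweep({"node-a", "node-b"}, {"node-a"})

print(first)   # (set(), set())
print(second)  # ({'node-b'}, set())
```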

> It seems I need to somehow make sure that I have subscribed for a trap 65 
> notification with the to-be new Master SM when it decides to report that the 
> old Master SM port goes down. Not quite sure if this is possible though :) 

That will not help. OpenSM doesn't send in/out-of-service traps on its
first sweep. I don't see an easy solution here: we would need to
replicate the SM and SA databases somehow.

OTOH, even then a trap can be lost due to transmission errors, etc.
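
Since traps can be lost anyway, a client may want to avoid depending on
them alone. A rough sketch of one possible approach, which is my
assumption and not something OpenSM provides: after re-registering,
compare the ports you knew about with a fresh snapshot and treat the
missing ones as out of service. The missed_out_of_service() helper and
the port names are hypothetical, and how the snapshot is obtained is
left out.

```python
# Hypothetical client-side sketch; how the port snapshot is obtained
# (for example, by querying the SA) is deliberately left out.

def missed_out_of_service(known_ports, current_ports):
    """Ports the client knew about that are absent from a fresh snapshot."""
    return sorted(set(known_ports) - set(current_ports))


known = {"port-1", "port-2", "old-master-port"}
fresh = {"port-1", "port-2"}

for port in missed_out_of_service(known, fresh):
    print("treat as out of service:", port)
```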

Sasha