Sasha Khapyorsky wrote:
On 16:50 Mon 19 Mar , Yevgeny Kliteynik wrote:
In __osm_ucast_mgr_process_neighbor(), there is the following assertion:
CL_ASSERT( hops <= osm_switch_get_hop_count( p_sw, lid_ho, port_num ) );
This assertion fails, since the hop count becomes inconsistent.
This is not a big problem IMO; we just need to stop dealing with non-existing
LIDs there (so the __osm_ucast_mgr_process_neighbor() code should be
improved in this direction and this assertion removed). The LFT
generation code doesn't try to build entries for non-existing LIDs, so
"old" min hop vectors will be ignored there.
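A minimal sketch of that idea, assuming a toy per-switch hop table rather than OpenSM's real data structures (`struct sw_hops`, `least_hops`, `process_neighbor`, `lid_exists` and the small `MAX_LID`/`MAX_PORTS` limits are all hypothetical names for illustration). Instead of asserting on a possibly stale entry, the neighbor pass simply skips LIDs that no longer exist and only relaxes entries for the ones that do:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_LID   16    /* hypothetical small LID space for illustration */
#define MAX_PORTS 8
#define NO_PATH   0xFF  /* sentinel: destination unreachable via this port */

/* Hypothetical per-switch hop table: hops[lid][port]. */
struct sw_hops {
    uint8_t hops[MAX_LID + 1][MAX_PORTS + 1];
};

/* Smallest hop count to lid over any port of sw (NO_PATH if none). */
static uint8_t least_hops(const struct sw_hops *sw, unsigned lid)
{
    uint8_t best = NO_PATH;
    for (unsigned p = 0; p <= MAX_PORTS; p++)
        if (sw->hops[lid][p] < best)
            best = sw->hops[lid][p];
    return best;
}

/* Update sw's entries for port_num from the neighbor's table.
 * LIDs that no longer exist are skipped instead of being asserted on.
 * Returns true if any entry was lowered. */
static bool process_neighbor(struct sw_hops *sw, const struct sw_hops *remote,
                             unsigned port_num, const bool lid_exists[])
{
    bool changed = false;
    for (unsigned lid = 1; lid <= MAX_LID; lid++) {
        if (!lid_exists[lid])
            continue;                   /* stale LID: ignore it */
        uint8_t h = least_hops(remote, lid);
        if (h == NO_PATH)
            continue;                   /* neighbor cannot reach it either */
        h++;                            /* one extra hop across the link */
        if (h < sw->hops[lid][port_num]) {
            sw->hops[lid][port_num] = h;
            changed = true;
        }
    }
    return changed;
}
```

Usage: initialize both tables with `memset(&t, NO_PATH, sizeof t)`, mark the LIDs found in the current sweep in `lid_exists`, and call `process_neighbor` until it returns false.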
But I think we could have a problem when the port (a switch with the master)
is reconnected at a different location. Then old/invalid hop counts will
be counted again, and if one of them "wins" we can get unexpected routing paths.
So obviously, the hop matrix cleanup is the simplest fix.
Agreed.
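To make the "stale entry wins" case concrete, here is a toy sketch, again with hypothetical names (`struct sw_hops`, `clear_hops`, `best_port`) that are not OpenSM's API. A LID that used to be 2 hops away via port 1 is now 3 hops away via port 3; without a cleanup the old entry is still the minimum, while clearing the matrix before recomputing leaves only the current path:

```c
#include <stdint.h>
#include <string.h>

#define MAX_LID   16    /* hypothetical small LID space for illustration */
#define MAX_PORTS 8
#define NO_PATH   0xFF  /* sentinel: destination unreachable via this port */

/* Hypothetical per-switch hop table: hops[lid][port]. */
struct sw_hops {
    uint8_t hops[MAX_LID + 1][MAX_PORTS + 1];
};

/* Reset every entry to "unreachable" before a sweep recomputes the table. */
static void clear_hops(struct sw_hops *sw)
{
    memset(sw->hops, NO_PATH, sizeof sw->hops);
}

/* Port with the smallest hop count to lid; ties go to the lowest port. */
static unsigned best_port(const struct sw_hops *sw, unsigned lid)
{
    unsigned best = 0;
    for (unsigned p = 1; p <= MAX_PORTS; p++)
        if (sw->hops[lid][p] < sw->hops[lid][best])
            best = p;
    return best;
}
```

In the stale scenario `best_port` picks port 1; after `clear_hops` and refilling only the new path, it picks port 3.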
I'm not sure about the trunk though.
Sasha,
Can you please check that your latest improvements to the
routing don't have this problem?
Disconnecting switches should cause similar behavior, I guess.
Right, I checked it - same problem.
Interesting. This function is different in master and doesn't scan
LIDs from 1 up to max anymore; instead, it scans only the switches existing at
the moment.
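The difference between the two scanning styles can be sketched as follows. This is a toy model with hypothetical names (`struct fabric`, `scan_all_lids`, `scan_present_switches`, `count_lid`), not the actual master code: the old style visits every LID up to the max, including LIDs of removed switches, while the new style visits only the switches found in the current sweep. Note that the latter does not by itself clear any hop entries already stored for removed LIDs.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_LID 16  /* hypothetical small LID space for illustration */

/* Hypothetical view of the fabric as seen by the current sweep. */
struct fabric {
    uint16_t sw_lids[MAX_LID];  /* base LIDs of switches found this sweep */
    size_t   n_sw;
};

typedef void (*lid_visitor)(uint16_t lid, void *ctx);

/* Old style: visit every LID in 1..max_lid, whether or not it still exists. */
static void scan_all_lids(uint16_t max_lid, lid_visitor visit, void *ctx)
{
    for (unsigned lid = 1; lid <= max_lid; lid++)
        visit((uint16_t)lid, ctx);
}

/* New style: visit only the switches present in the current sweep. */
static void scan_present_switches(const struct fabric *f,
                                  lid_visitor visit, void *ctx)
{
    for (size_t i = 0; i < f->n_sw; i++)
        visit(f->sw_lids[i], ctx);
}

/* Example visitor: count how many LIDs were visited. */
static void count_lid(uint16_t lid, void *ctx)
{
    (void)lid;
    (*(size_t *)ctx)++;
}
```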
Could you provide more details about master? Are you able to see the
problem with just switch disconnections? What is the test case?
I had this problem on a copy of master that wasn't up to date.
After updating it, I can't see this problem happening again.
But the hop count is not cleared there either, so even if I can't
recreate this problem (or even if the new flow solves this particular
bug), I think we agree that it would be better to clear the hop count
anyway.
-- Yevgeny
Sasha
_______________________________________________
general mailing list
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general