On Wed, 2008-03-05 at 18:22 +0000, Sasha Khapyorsky wrote:
> On 09:10 Wed 05 Mar , Al Chu wrote:
> >
> > I can't restart opensm on that cluster at this time. I don't recall any
> > port errors. However, I do recall seeing this output from
> > __osm_state_mgr_light_sweep_start():
> >
> > OSM_LOG(sm->p_log, OSM_LOG_ERROR,
> >         "ERR 0108: "
> >         "Unknown remote side for node 0x%016"
> >         PRIx64
> >         "(%s) port %u. Adding to light sweep sampling list\n",
> >         cl_ntoh64(osm_node_get_node_guid
> >                   (p_node)),
> >         p_node->print_desc, port_num);
> >
> > leading to a call to __osm_state_mgr_get_remote_port_info(), leading to
> > what I fixed in osm_pi_rcv_process().
>
> Yes, this is valid (handled) scenario.
>
> What I cannot understand is why it doesn't reach
> __osm_pi_rcv_process_switch_port() (where ignore_existing_lfts flag
> should be enforced in accordance with port state) after querying port
> with "unknown" remotes during a light sweep.
>
> I did some experiments with ibsim and still not be able to reproduce
> this. I'm afraid there could be some hidden bug which I'm not able to
> catch yet.
>
> > My original assumption was that the remote side for some ports wasn't
> > known b/c the remote side ports were down. Is it possible for opensm to
> > not know about a remote side even if that remote side port is up/active?
>
> I think yes, some ports could be DOWN during initial discovery and become
> INIT later during LID assignment and/or link state setup. Normally (as in
> your scenario) next light sweep catches this and enforce heavy sweep.

Perhaps it does reach __osm_pi_rcv_process_switch_port(), but the
need_update flag is just not set? Is it possible for those remote side
ports to be at ARMED or ACTIVE before the 2nd heavy sweep? If so, then
those remote side ports would have their need_update flag cleared, and
thus ignore_existing_lfts wouldn't be set in
__osm_pi_rcv_process_switch_port().

Al

> Sasha

-- 
Albert Chu
[EMAIL PROTECTED]
925-422-5311
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
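
For illustration only, the need_update hypothesis in the reply can be pictured with a toy model. This is a sketch under stated assumptions, not the OpenSM code: the types and the functions toy_process_switch_port() and toy_lfts_ignored_after_sweep() are hypothetical, and the rule "a port reported above INIT gets need_update cleared" is the assumption being asked about, not confirmed behavior. The numeric link states follow the InfiniBand PortState values (1 Down, 2 Init, 3 Armed, 4 Active).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the OpenSM types. */
enum toy_link_state {
    TOY_LINK_DOWN = 1,
    TOY_LINK_INIT = 2,
    TOY_LINK_ARMED = 3,
    TOY_LINK_ACTIVE = 4
};

struct toy_port {
    enum toy_link_state state;
    bool need_update;
};

struct toy_subnet {
    bool ignore_existing_lfts;
};

/*
 * Assumed rule (the hypothesis under discussion): a port reported at
 * ARMED or ACTIVE has need_update cleared, and only a port whose
 * need_update is still set forces the existing LFTs to be ignored.
 */
static void toy_process_switch_port(struct toy_subnet *subn,
                                    struct toy_port *port,
                                    enum toy_link_state reported)
{
    assert(subn != NULL && port != NULL);

    port->state = reported;
    if (reported > TOY_LINK_INIT)
        port->need_update = false;
    if (port->need_update)
        subn->ignore_existing_lfts = true;
}

/*
 * A port first seen DOWN (need_update set) is reported in state
 * `reported` by the next heavy sweep. Returns whether the toy subnet
 * ends up ignoring its existing LFTs.
 */
static bool toy_lfts_ignored_after_sweep(enum toy_link_state reported)
{
    struct toy_subnet subn = { false };
    struct toy_port port = { TOY_LINK_DOWN, true };

    toy_process_switch_port(&subn, &port, reported);
    return subn.ignore_existing_lfts;
}
```

In this model, toy_lfts_ignored_after_sweep(TOY_LINK_INIT) returns true, while a port that has already reached TOY_LINK_ARMED or TOY_LINK_ACTIVE yields false, which is the case the question above is concerned with.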
