On Wed, 2008-03-05 at 10:43 +0000, Sasha Khapyorsky wrote:
> Hi Al,
> 
> On 07:46 Sun 02 Mar     , Albert Chu wrote:
> > 
> > In order to make things work, I also had to add this patch.  Seems like a
> > corner case that needs to be handled since we never fall into
> > __osm_pi_rcv_process_switch_port().
> 
> Hmm, it is strange. After this light sweep cycle OpenSM should continue
> with heavy sweep where __osm_pi_rcv_process_switch_port() should be
> reissued. Do you see any errors during discovery?

I can't restart opensm on that cluster at this time.  I don't recall any
port errors.  However, I do recall seeing this output from
__osm_state_mgr_light_sweep_start():

OSM_LOG(sm->p_log, OSM_LOG_ERROR,
        "ERR 0108: "
        "Unknown remote side for node 0x%016"
        PRIx64
        "(%s) port %u. Adding to light sweep sampling list\n",
        cl_ntoh64(osm_node_get_node_guid
                  (p_node)),
        p_node->print_desc, port_num);

That path calls __osm_state_mgr_get_remote_port_info(), which in turn
leads to the case I fixed in osm_pi_rcv_process().
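
To make the condition concrete, here is a minimal sketch of the check as
I understand it.  It is a simplified model, not the OpenSM source: the
types and function names (toy_port, needs_remote_query,
collect_unknown_remotes) are hypothetical, and the trigger condition
(link not down, no remote peer recorded) is my assumption about what
produces ERR 0108.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified link states; not the OpenSM/IB definitions. */
enum toy_link_state { TOY_LINK_DOWN, TOY_LINK_INIT, TOY_LINK_ACTIVE };

/* Hypothetical stand-in for a physical port record. */
struct toy_port {
    unsigned port_num;
    enum toy_link_state state;
    const struct toy_port *remote;  /* NULL when the remote side is unknown */
};

/*
 * Assumed condition behind the "Unknown remote side" message:
 * the local link is not down, yet no remote peer has been recorded.
 */
static bool needs_remote_query(const struct toy_port *p)
{
    return p != NULL && p->state != TOY_LINK_DOWN && p->remote == NULL;
}

/*
 * Scan n ports and store the port numbers that would be added to the
 * light sweep sampling list.  out must have room for n entries.
 * Returns the number of entries written.
 */
static size_t collect_unknown_remotes(const struct toy_port *ports, size_t n,
                                      unsigned *out)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (needs_remote_query(&ports[i]))
            out[count++] = ports[i].port_num;
    }
    return count;
}
```

In this model a port whose link is down is never flagged, so only ports
that report a live link without a recorded peer end up on the list.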

My original assumption was that the remote side of some ports wasn't
known because the remote ports were down.  Is it possible for opensm to
not know about a remote side even if that remote port is up/active?

Al

> Sasha
-- 
Albert Chu
[EMAIL PROTECTED]
925-422-5311
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory