Hi Sasha,

On Wed, Sep 1, 2010 at 9:43 AM, Sasha Khapyorsky <[email protected]> wrote:
> Hi Hal,
>
> On 13:27 Wed 25 Aug , Hal Rosenstock wrote:
>>
>> I'm seeing an issue with ibnetdiscover from a CA port where it appears
>> to extend a path at a "remote" CA port (it's actually another port on
>> the same CA) to query NodeInfo of the next hop beyond it. I get the
>> following error message:
>>
>> src/query_smp.c:188; umad (DR path slid 0; dlid 0; 0,1,20,2 Attr
>> 0x11:0) bad status 110; Connection timed out
>>
>> where smpquery -D nodeinfo of 0,1,20 is a CA which can also be seen
>> from the topology.
>>
>> It appears to stem from the following code snippet from
>> libibnetdisc/src/ibnetdisc.c:recv_port_info
>>
>> if (port_num && mad_get_field(port->info, 0, IB_PORT_PHYS_STATE_F)
>>     == IB_PORT_PHYS_STATE_LINKUP
>>     && ((node->type == IB_NODE_SWITCH && port_num != local_port) ||
>>         (node == fabric->from_node && port_num == local_port))) {
>>         ib_portid_t path = smp->path;
>>         if (extend_dpath(engine, &path, port_num) > 0)
>>                 query_node_info(engine, &path, node);
>> }
>
> This makes sense for me.
>
>>
>> that was introduced by:
>> commit fcb8d5e7588e38508a8e354c37009d73c0a3889f
>> Author: Sasha Khapyorsky <[email protected]>
>> Date: Sat Apr 10 02:43:24 2010 +0300
>>
>> libibnetdisc: no backward NodeInfo queries
>>
>> Then switch is reached via port N we don't need to query back via this
>> port - source node is discovered already. Finally this saves some amount
>> of unnecessary MADs.
>>
>> Signed-off-by: Sasha Khapyorsky <[email protected]>
>>
>> and subsequently modified by:
>> commit 49d149c63a44d99259f516a15af53d8cf3f0e7c9
>> Author: Sasha Khapyorsky <[email protected]>
>> Date: Tue Apr 13 19:54:45 2010 +0300
>>
>> libibnetdisc: don't try to cross discovery over CA
>>
>> When discovery is running from CA node it shouldn't try to cross over
>> all ports, but only via local one (send over non-local ports will fail
>> since CA doesn't route MADs).
>>
>> Signed-off-by: Sasha Khapyorsky <[email protected]>
>>
>> due to the (node == fabric->from_node && port_num == local_port)
>> clause being TRUE.
>
> But I don't see how those patches are actually related to the story. An
> original (before patches) condition was:
>
> if (port_num && mad_get_field(port->info, 0, IB_PORT_PHYS_STATE_F)
>     == IB_PORT_PHYS_STATE_LINKUP
>     && (node->type == IB_NODE_SWITCH || node == fabric->from_node))
>
> , which has the described bug as I can understand this.
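
To compare the two conditions quoted above side by side, here is a minimal sketch. The struct and the function names are made up for illustration and are not libibnetdisc types; the fields are assumed to carry the same meaning as the expressions named in the comments.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the state tested in recv_port_info();
 * these are NOT the real libibnetdisc structures. */
enum node_type { NODE_CA, NODE_SWITCH };

struct probe {
	enum node_type type;	/* node->type */
	bool is_from_node;	/* node == fabric->from_node */
	int port_num;		/* port whose PortInfo was received */
	int local_port;		/* local_port as used in the quoted snippet */
	bool link_up;		/* phys state == IB_PORT_PHYS_STATE_LINKUP */
};

/* Condition before both commits, as quoted above. */
static bool extend_before(const struct probe *p)
{
	return p->port_num && p->link_up &&
	       (p->type == NODE_SWITCH || p->is_from_node);
}

/* Condition after both commits, as quoted above. */
static bool extend_after(const struct probe *p)
{
	return p->port_num && p->link_up &&
	       ((p->type == NODE_SWITCH && p->port_num != p->local_port) ||
		(p->is_from_node && p->port_num == p->local_port));
}
```

With is_from_node set, port_num == local_port and the link up, both predicates are true, so the extend_dpath() call is reached with either version. In this sketch the two differ only for the non-local ports of the from_node and for the port a switch was reached through.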

I thought this used to work and those changes looked related to me.
Maybe the fix is right but that part of the problem description isn't.
Do you want a revised patch without that part of the description?

-- Hal

>
> Sasha
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
