On Tue, 29 Sep 2009 18:16:21 +0200
"Eli Dorfman (Voltaire)" <dorfman....@gmail.com> wrote:

> Ira Weiny wrote:
> > Eli,
> > 
> > On Wed, 26 Aug 2009 17:37:30 +0300
> > "Eli Dorfman (Voltaire)" <dorfman....@gmail.com> wrote:
> > 
> >> Subject: [PATCH] Fix IB network discovery from switch node.
> > 
> > Sorry for the late inquiry on this but what exactly was the bug here?
> 
> Sorry for the late response.
> The problem is incorrect discovery when ibnetdiscover is run from the switch.
> Without the patch, ibnetdiscover finds only the local switch.

Ok I see.

[snip]

> 
> I think the problem is related to NodeInfo:LocalPort, which is 0 in the case
> of a switch.
> I see that get_remote_node() sends a direct-route MAD to the switch with
> path 0,0, and that fails (at least for Mellanox IS4 switch chips).
> Another way to bypass this may be as follows:
> 
> diff --git a/infiniband-diags/libibnetdisc/src/ibnetdisc.c b/infiniband-diags/libibnetdisc/src/ibnetdisc.c
> index 1e93ff8..3dd0dc6 100644
> --- a/infiniband-diags/libibnetdisc/src/ibnetdisc.c
> +++ b/infiniband-diags/libibnetdisc/src/ibnetdisc.c
> @@ -461,7 +461,7 @@ get_remote_node(struct ibnd_fabric *fabric, struct ibnd_node *node, struct ibnd_
>                       != IB_PORT_PHYS_STATE_LINKUP)
>               return -1;
>  
> -     if (extend_dpath(fabric, path, portnum) < 0)
> +     if (portnum > 0 && extend_dpath(fabric, path, portnum) < 0)
>               return -1;
>  
>       if (query_node(fabric, &node_buf, &port_buf, path)) {
> 
> 
> Please check whether this is OK and I can send a new patch.
> 

This seems to fix my issue.  Here is a patch against master which works for
me.  If you want to verify that would be great.

Thanks for helping me out,
Ira

From: Ira Weiny <wei...@llnl.gov>
Date: Tue, 22 Sep 2009 11:08:28 -0700
Subject: [PATCH] infiniband-diags/libibnetdisc/src/ibnetdisc.c: fix bug in single node processing.

        Eli fixed an issue with running ibnetdiscover from a switch but it
        introduced a bug in processing a single switch:

17:19:42 > ./iblinkinfo -S 0x000b8cffff00490c
Switch 0x000b8cffff00490c MT47396 Infiniscale-III Mellanox Technologies:
...
           8   11[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
           8   12[  ] ==( 4X 5.0 Gbps Active/  LinkUp)==>             [  ] "" ( )
           8   13[  ] ==( 4X 2.5 Gbps   Down/ Polling)==>             [  ] "" ( )
...

        The port we "come in on" when discovering the switch is not reported
        properly.

        This patch, suggested by Eli, reverts Eli's patch and fixes his
        original bug in a way which does not introduce the above issue.

Signed-off-by: Ira Weiny <wei...@llnl.gov>
---
 infiniband-diags/libibnetdisc/src/ibnetdisc.c |   18 ++++++++----------
 1 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/infiniband-diags/libibnetdisc/src/ibnetdisc.c b/infiniband-diags/libibnetdisc/src/ibnetdisc.c
index 97e369c..96f72c5 100644
--- a/infiniband-diags/libibnetdisc/src/ibnetdisc.c
+++ b/infiniband-diags/libibnetdisc/src/ibnetdisc.c
@@ -506,7 +506,7 @@ static int get_remote_node(struct ibmad_port *ibmad_port,
            != IB_PORT_PHYS_STATE_LINKUP)
                return 1;       /* positive == non-fatal error */
 
-       if (extend_dpath(ibmad_port, fabric, path, portnum) < 0)
+       if (portnum > 0 && extend_dpath(ibmad_port, fabric, path, portnum) < 0)
                return -1;
 
        if (query_node(ibmad_port, fabric, &node_buf, &port_buf, path)) {
@@ -600,15 +600,13 @@ ibnd_fabric_t *ibnd_discover_fabric(struct ibmad_port * ibmad_port,
        if (!port)
                goto error;
 
-       if (node->type != IB_NODE_SWITCH) {
-               rc = get_remote_node(ibmad_port, fabric, node, port, from,
-                                    mad_get_field(node->info, 0,
-                                                  IB_NODE_LOCAL_PORT_F), 0);
-               if (rc < 0)
-                       goto error;
-               if (rc > 0)             /* non-fatal error, nothing more to be done */
-                       return ((ibnd_fabric_t *) fabric);
-       }
+       rc = get_remote_node(ibmad_port, fabric, node, port, from,
+                            mad_get_field(node->info, 0,
+                                          IB_NODE_LOCAL_PORT_F), 0);
+       if (rc < 0)
+               goto error;
+       if (rc > 0)             /* non-fatal error, nothing more to be done */
+               return ((ibnd_fabric_t *) fabric);
 
        for (dist = 0; dist <= max_hops; dist++) {
 
-- 
1.5.4.5
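
For readers following the thread, the effect of the "portnum > 0" guard in
get_remote_node() can be illustrated in isolation. The sketch below is not
the libibnetdisc code: struct dr_path, extend_path(), step_to_remote() and
MAX_HOPS are hypothetical stand-ins, and the real extend_dpath() does more
than append a hop. It only shows the idea that a port number of 0 (what
NodeInfo:LocalPort reports on a switch) should leave the directed-route path
untouched, so the query goes to the local node instead of to path 0,0.

```c
#include <stddef.h>

#define MAX_HOPS 63		/* hypothetical limit on directed-route hops */

/* Hypothetical stand-in for a directed-route path: hop count plus one
 * port number per hop.  p[0] is unused, hops live in p[1..cnt]. */
struct dr_path {
	int cnt;
	unsigned char p[MAX_HOPS + 1];
};

/* Append one hop to the path.  Returns the new hop count, or -1 if the
 * path is already full. */
static int extend_path(struct dr_path *path, int portnum)
{
	if (path->cnt >= MAX_HOPS)
		return -1;
	path->cnt++;
	path->p[path->cnt] = (unsigned char)portnum;
	return path->cnt;
}

/* Same shape as the patched guard: only extend the path for a real port
 * number.  Port 0 means "the node we are already on", so the path is
 * left as it is.  Returns 0 on success, -1 if the path cannot grow. */
static int step_to_remote(struct dr_path *path, int portnum)
{
	if (portnum > 0 && extend_path(path, portnum) < 0)
		return -1;
	return 0;
}
```

Calling step_to_remote() with port 0 on an empty path leaves the hop count at
0, while calling it with port 12 produces a one-hop path whose only entry
is 12.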


