On Tue, 2008-06-10 at 13:18 -0700, Ira Weiny wrote:
> On Tue, 10 Jun 2008 11:35:19 -0700
> Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> 
> > On Tue, 2008-06-10 at 11:26 -0700, Ira Weiny wrote:
> > > On Tue, 10 Jun 2008 11:16:24 -0700
> > > Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> 
> <snip>
> 
> > > > 
> > > > Looks to me like it relies on some node GUID being same as port GUID.
> > > > While that's allowable for one port, it won't always be the case.
> > > 
> > > In my test system the Node GUID and port GUID are different and this 
> > > works.  I
> > > am specifically using the port guid out of the NodeInfo struct of the
> > > NodeRecord.  So I should be using the port guid vs port GID ID _only_.
> > 
> > Not sure what you mean; port GID 0 = subnet prefix + port GUID. Not sure
> > what SMs support other port GIDs than this but OpenSM doesn't.
> 
> I mean the port gid == subnet prefix + port GUID.  Therefore if you mask off
> the subnet prefix (ie only use the lower 64bits of the gid) you should end up
> with the GUID, right?

Yes.

> >  
> > > > > Have you found some configuration in which this does not
> > > > > work?
> > > > 
> > > > Yes.
> > > 
> > > :-(  Sorry, I'm pretty sure this should be ok...
> > 
> > It's possible it's an environment thing but there's a couple of
> > suboptimialities in print_multicast_member_record:
> 
> That is probably true.  I see you have already sent a patch to not do the node
> record queries unless necessary.
> 
> > 
> > 1. Does osmv_get_query_node_rec always return a non NULL pointer ?
> 
> :-/  Yea that is a bug.  Yes using node_record would be bad if for some reason
> the node_record did not exists.  However, the port should not be a member of
> the mcast group if it does not exist in the SA.  So... you should not have a
> mcast member record.
> 
> I think there is a race condition here that I did not check, that is bad.

> > 2. If the loop fails to find a match, the last node description is used.
> 
> Yep, that is a bug as well.  I guess this masks the above situation.  Would 
> you
> agree this is a corner case?  Unless your fabric is changing quickly there
> should be a node record for each member record right?  
> 
> Patch below.

Looks better to me.

> Should there be a warning printed in the case the node_record is not found?

NodeDescription <unknown> seems sufficient to me.

-- Hal

> Ira
> 
> 
> From ac10f52fd65b0b34b409ca2aa266a7363ae32e2c Mon Sep 17 00:00:00 2001
> From: Ira K. Weiny <[EMAIL PROTECTED]>
> Date: Tue, 10 Jun 2008 13:07:43 -0700
> Subject: [PATCH] infiniband-diags/src/saquery.c: fix potential core dump 
> and/or incorrect node
> descriptions from being printed if a corresponding node_record is not found 
> for
> a multicast member record.
> 
> Signed-off-by: Ira K. Weiny <[EMAIL PROTECTED]>
> ---
>  infiniband-diags/src/saquery.c |   17 +++++++++++------
>  1 files changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/infiniband-diags/src/saquery.c b/infiniband-diags/src/saquery.c
> index e727940..89754f6 100644
> --- a/infiniband-diags/src/saquery.c
> +++ b/infiniband-diags/src/saquery.c
> @@ -354,14 +354,19 @@ print_multicast_member_record(ib_member_rec_t *p_mcmr)
>       uint64_t gid_prefix = cl_ntoh64( p_mcmr->port_gid.unicast.prefix );
>       uint64_t gid_interface_id = cl_ntoh64( 
> p_mcmr->port_gid.unicast.interface_id );
>       uint16_t mlid = cl_ntoh16( p_mcmr->mlid );
> -     ib_node_record_t *node_record = NULL;
>       int      i = 0;
> +     char *node_name = "<unknown>";
>  
> -     /* go through and find the node description for this node GID */
> +     /* go through the node records searching for a port guid which matches
> +      * this port gid interface id.
> +      * This gives us a node name to print, if available.
> +      */
>       for (i = 0; i < result.result_cnt; i++) {
> -             node_record = osmv_get_query_node_rec(result.p_result_madw, i);
> -             if (cl_ntoh64(node_record->node_info.port_guid) == 
> gid_interface_id)
> +             ib_node_record_t *nr = 
> osmv_get_query_node_rec(result.p_result_madw, i);
> +             if (cl_ntoh64(nr->node_info.port_guid) == gid_interface_id) {
> +                     node_name = clean_nodedesc((char 
> *)nr->node_desc.description);
>                       break;
> +             }
>       }
>  
>       if (requested_name) {
> @@ -370,7 +375,7 @@ print_multicast_member_record(ib_member_rec_t *p_mcmr)
>                              "0x%016" PRIx64 " (%s)\n",
>                              gid_prefix,
>                              gid_interface_id,
> -                            clean_nodedesc((char 
> *)node_record->node_desc.description)
> +                            node_name
>                             );
>               }
>       } else {
> @@ -391,7 +396,7 @@ print_multicast_member_record(ib_member_rec_t *p_mcmr)
>                      gid_interface_id,
>                      p_mcmr->scope_state,
>                      p_mcmr->proxy_join,
> -                    clean_nodedesc((char 
> *)node_record->node_desc.description)
> +                    node_name
>                     );
>       }
>  }

_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Reply via email to