On Fri, 7 May 2010 13:50:53 -0700
Sasha Khapyorsky <[email protected]> wrote:

> Hi Ira,
> 
> On 19:12 Thu 22 Apr     , Ira Weiny wrote:
> > On Tue, 13 Apr 2010 16:25:31 +0300
> > Sasha Khapyorsky <[email protected]> wrote:
> > 
> > > On 13:46 Tue 13 Apr     , Sasha Khapyorsky wrote:
> > > > 
> > > > However see some comments and questions below.
> > > 
> > > Another thought. What about API like:
> > > 
> > >   ibnd_discover_fabric(cosnt char *ca_name, unsigned port_num,
> > >                        struct ibnd_config *cfg);
> > > 
> > > So libibnetdisc will be responsible for opening and closing its own port
> > > and in this way will be fully reenterable?
> > 
> > I think we need this...  But this alone will not fix ibnetdisc...
> > 
> > I found out why "iblinkinfo -S" is hanging for me.  The new algorithm has 
> > the
> > library using both libibmad and libibumad calls simultaneously.
> > 
> > These libraries were not designed to be used this way.  Therefore, I don't
> > think there is any direct bug in those layers.  However, this is why I 
> > thought
> > it was better for ibnetdisc to sit on top of ibmad and not use the umad 
> > layer.
> > 
> > Anyway, I think it is a mistake to pass an ibmad_port construct directly to
> > this library now.  Here is why.
> > 
> > umad_recv in query_smp.c and ib_resolve_self_via in ibnetdisc.c conflicted.
> > The underlying call to _do_madrpc was discarding the response MAD which
> > query_smp.c:umad_recv was expecting.
> 
> This makes sense for me.
> 
> > If we change the API to specify the ca name and port then the library can 
> > open
> > 2 ports (or as many as it wants) and use them appropriately.  I think this 
> > is
> > the only solution which does not involve fixing libibmad.
> > 
> > So what about something like this:
> > 
> >    int ibnd_discover_fabric(ibnd_fabric_t **fabric,
> >                         cosnt char *ca_name,  <== could we even default 
> > this?
> 
> I would think about ca_name and port_number. And this is of course may
> have default (NULL, 0).

Ok, ca_name and ca_port will be explicit.

> 
> >                         struct ibnd_config *cfg);
> 
> What is wrong with current ibdn_fabric_t *ibnd_discover_fabric(...)? Why
> do we need an extra parameter?

Well we are breaking the interface again so I figure we might as well clean 
some things up.  Returning an int allows us to have a reason for the failure 
returned to the caller rather than just "NULL".  We have cleaned up most of the 
internals of the library to allow for this.

> 
> > 
> > I don't mind the ibnd_config_t struct but I don't think it should be visible
> > to the user.  Make it opaque and use "set" functions.  Something like.
> > 
> > ibnd_fabric_t *fabric;
> > ibnd_config_t cfg;
> > ib_portid_t * from;
> > 
> > ibnd_set_hops(&cfg, hops);         <== default -1
> > ibnd_set_port_num(&cfg, port_num); <== default 1
> > ibnd_set_max_smps(&cfg, max_smps); <== default 2
> > ibnd_set_from_node(&cfg, from);    <== default NULL
> 
> I would prefer to not complicate API with ibnd_set_this() helpers. It
> would be necessary to add new ones in the future which will lead to API
> changes.

See below.

> 
> > if (ibnd_discover_fabric(&fabric, "foo", &cfg)) {  <== anything not in cfg 
> > is
> >                                                        defaulted here
> >    fprintf(stderr, "Wow it failed\n");
> > }
> > 
> > This allows us to change ibnd_config structure any time we want without
> > affecting the API.  I don't think the "pad" you used is a good idea.
> 
> Without padding we will break ABI each time when new field is added to
> the config structure.

No it does not iff you use the ibnd_set_this() helpers and make the config 
private.

> 
> > Also since we are breaking the API we might as well return the fabric as a
> > parameter and have an error code.  But I could go either way on this one.
> > 
> > Ira
> > 
> > 
> > [*] query_smp.c probably should have it's own timeout here but we can 
> > discuss
> > later.
> > 
> > [#] What sucks about this is that libibmad already has the functionality to
> > open the umad port and configure it (50 line function).  Now we will be
> > duplicating this functionality.
> 
> I think you can use mad_rpc_open_port(ca_name, port_number, ...) just
> fine (and so the rest of libibmad stuff) - it will open separate fd.

Yes for the libibmad functionality we can do that.  I was speaking of the use 
of the umad layer.  To use that layer we have to duplicate the functionality of 
mad_rpc_open_port in every tool which requires umad layer access, correct?  
Right now we are mixing the 2 layers (mad and umad) in saquery 
(get_bind_handle) as well as libibnetdisc.

I think we need to be careful we don't do this again!

Ira

> 
> Sasha
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to [email protected]
> More majordomo info at  http://*vger.kernel.org/majordomo-info.html
> 


-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
[email protected]
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to