So, in a nutshell, the proposal is to add an identifier to the "CM private data" that indicates the peer-to-peer model, along with unique peer IDs for the requested connection.
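To make the question concrete, here is a minimal sketch of what such private data might look like. Everything in it is an assumption for illustration only: the names p2p_hdr, P2P_MAGIC, p2p_hdr_pack, p2p_hdr_unpack and p2p_stays_active are hypothetical and are not part of the IB CM or RDMA CM API, and the tie-break rule (the lower peer ID stays active) is just one possible symmetry-breaking choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker meaning "this REQ uses the peer-to-peer model". */
#define P2P_MAGIC   0x50325031u  /* arbitrary value, "P2P1" in ASCII */
#define P2P_HDR_LEN 20u          /* 4-byte magic + two 8-byte peer IDs */

/* Hypothetical header carried at the start of the CM private data. */
struct p2p_hdr {
    uint32_t magic;        /* P2P_MAGIC when the peer-to-peer model is used */
    uint64_t src_peer_id;  /* unique ID of the side sending the REQ */
    uint64_t dst_peer_id;  /* unique ID of the side it wants to reach */
};

/* Big-endian helpers so the wire layout does not depend on host order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (24 - 8 * i));
}

static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (56 - 8 * i));
}

static uint32_t get_be32(const uint8_t *p)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v = (v << 8) | p[i];
    return v;
}

static uint64_t get_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Serialize the header; returns bytes written, or 0 if buf is too small. */
size_t p2p_hdr_pack(const struct p2p_hdr *h, uint8_t *buf, size_t len)
{
    assert(h != NULL && buf != NULL);
    if (len < P2P_HDR_LEN)
        return 0;
    put_be32(buf, h->magic);
    put_be64(buf + 4, h->src_peer_id);
    put_be64(buf + 12, h->dst_peer_id);
    return P2P_HDR_LEN;
}

/* Parse the header; returns false if it is too short or lacks the marker. */
bool p2p_hdr_unpack(const uint8_t *buf, size_t len, struct p2p_hdr *h)
{
    assert(buf != NULL && h != NULL);
    if (len < P2P_HDR_LEN || get_be32(buf) != P2P_MAGIC)
        return false;
    h->magic = P2P_MAGIC;
    h->src_peer_id = get_be64(buf + 4);
    h->dst_peer_id = get_be64(buf + 12);
    return true;
}

/*
 * Assumed tie-break for two REQs that cross: both sides apply the same
 * rule, so exactly one of them keeps the active role.
 */
bool p2p_stays_active(uint64_t local_id, uint64_t remote_id)
{
    return local_id < remote_id;
}
```

Under these assumptions, each side would pack the header into the private data of its REQ, and a side that receives a REQ while its own is outstanding would call p2p_stays_active() with its own ID and the sender's ID to decide whether to keep connecting or fall back to the passive role.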
Is this the model?

Thanks,

Arkady Kanevsky           email: [EMAIL PROTECTED]
Network Appliance Inc.    phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.    Fax: 781-895-1195
Waltham, MA 02451         central phone: 781-768-5300

> -----Original Message-----
> From: Or Gerlitz [mailto:[EMAIL PROTECTED]
> Sent: Thursday, December 20, 2007 10:09 AM
> To: Sean Hefty
> Cc: OpenFabrics General
> Subject: Re: [ofa-general] peer to peer connections support
>
> Sean Hefty wrote:
> ...
> > I didn't follow this.
> ...
> > Peer to peer SIDs are in a different domain than client/server SIDs,
> > and the peer_to_peer field is used to indicate which domain a SID is in.
>
> Sorry if I wasn't clear, let me see if I understand you: with
> this different domain implementation, under both
> client/server the passive calls cm listen and the active call
> cm connect, where under peer/to/peer both sides call cm
> listen and later both sides may call cm connect or only one
> side, correct?
>
> > To add to my comments on the CM API, struct ib_cm_req_param, which is
> > used to send the REQ, includes service_id and peer_to_peer fields.
> > The latter is a boolean used by the CM to distinguish if incoming REQs
> > can be matched with the outgoing REQ.
>
> OK, this makes things clearer.
>
> >> Why there should be a difference between the rdma-cm to the cm? if in
> >> the cm you have a model without API change, wouldn't it apply also to
> >> the rdma-cm?
>
> > The rdma_cm does not know how to set the peer_to_peer field in the
> > ib_cm_req_param. It sets this field to 0 today.
>
> But it could set it to one as well... assuming my
> understanding above of the suggested implementation is
> correct, we can change the RDMA-CM API to let users specify
> on rdma_connect that they want peer to peer support, so such
> apps can issue rdma_listen call and later call rdma_connect
> with this bit set and they are done (or almost done... I
> guess there some more devil in the details here, isn't it?)
>
> >> I think that in the MPI world each rank gets a SID from the local
> >> CM and they exchange the SIDs out-of-band, then connections are
> >> opened. If its a connection-on-demand scheme, then when ever the
> >> rank process calls mpi_send() to peer for which the local MPI
> >> library does not have a connection, it tries to connect. So if this
> >> happens "at once" between some pair of ranks, there should be a way
> >> to form one connection out of these two connecting requests. My
> >> thinking/motivation is that support of this scheme should be in the
> >> IB stack (cm and rdma-cm) level and not in the specific MPI
> >> implementation level.
>
> > Are the out of band connections used by MPI formed using client/server
> > or peer to peer? I believe that Intel MPI has each rank listen for
> > connections from the ranks below it using client/server.
>
> yes, MPIs that do all-to-all-connect on job start, typically
> use client/server where all the ranks > 0 issue listen call
> and then all lower ranks connect to higher ranks or etc some
> other symmetry breaking scheme. I am trying to see what needs
> to be supported by the IB stack to let MPIs that do connect
> on demand use the RDMA-CM.
>
> > There are a couple of problems with the peer to peer model. First,
> > unless the connections occur at exactly the same time, they miss
> > connecting (rejected with invalid SID).
>
> This makes the all peer to peer model useless, since an app
> can not make sure that connection occur at exactly the same
> time! my understanding of the spec is that peer to peer model
> has the ability to handle also connections that occur at
> exactly the same time but not only.
>
> > Second, if multiple peer to
> > peer connections need to form between the same pair of nodes, things
> > can go screwy (that's the technical term) trying to match up the peer
> > requests.
>
> Under MPI each rank uses a different SID, so I think we are
> safe from this problem.
>
> Or
>
> _______________________________________________
> general mailing list
> [email protected]
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general
