Sean Hefty wrote:
Peer to peer connection was never fully implemented in the ib_cm. I don't think it would be that hard to implement at that level, and it shouldn't require API changes.
I am somewhat confused by your comment below that "CM needs to know the connection model selected by the app". Reading your other comments, I see two options, depending on whether the implementation differentiates between peer-to-peer SIDs and client/server SIDs:
If there is no difference, then in the peer-to-peer model too the application must first tell the CM to listen on a SID, and it is up to the CM to break the symmetry and decide who sends the REP and who ignores the REQ.
If there is a difference, then peer-to-peer SIDs are in a different domain than client/server SIDs.
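To make the "break the symmetry" step in the first option concrete, here is a minimal sketch of a deterministic tie-break between two crossing REQs. The `ReqId` fields and the "larger identifier stays active" rule are assumptions for illustration only; this is not taken from the ib_cm code, and the actual comparison rule would have to follow the IB spec.

```python
from typing import NamedTuple


class ReqId(NamedTuple):
    """Hypothetical identifier carried in a REQ (names are illustrative)."""
    ca_guid: int  # GUID of the channel adapter that sent the REQ
    qpn: int      # QP number, used only if the GUIDs are equal


def stays_active(local: ReqId, remote: ReqId) -> bool:
    """Return True if the local side keeps the active role.

    Assumed rule: the side with the larger (ca_guid, qpn) pair stays
    active and ignores the peer's REQ; the other side turns passive and
    answers the peer's REQ with a REP. Both sides evaluate the same
    comparison, so they always reach opposite answers.
    """
    if local == remote:
        raise ValueError("identical identifiers cannot break the tie")
    return local > remote  # tuple comparison: GUID first, then QPN


a = ReqId(ca_guid=0x0002C90300001234, qpn=0x48)
b = ReqId(ca_guid=0x0002C90300005678, qpn=0x12)
a_active = stays_active(a, b)
b_active = stays_active(b, a)
```

Whatever the exact rule is, the point is that each side can decide locally, from data already present in the two REQs, without another message exchange.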
Support at the rdma_cm level may require an API change. There's no easy way for the rdma_cm to know if it should invoke the IB peer-to-peer connection model. I'm not even sure how one peer would know the other peer's port number, unless well known ports are used on both sides.
Why should there be a difference between the rdma-cm and the cm? If the cm has a model that needs no API change, wouldn't it apply to the rdma-cm as well?
Such support would be useful in symmetric schemes such as MPIs that open connections on demand, and in other applications where each party can both accept and initiate connections. For example, I understand that some work is being done now in the Open MPI community to use the rdma-cm as a possible channel for connection establishment.
I would need to better understand the expected usage model, like how the peers find each other, but this is something that could be added if needed.
I think that in the MPI world each rank gets a SID from the local CM, the ranks exchange the SIDs out-of-band, and then connections are opened. If it's a connection-on-demand scheme, then whenever the rank process calls mpi_send() to a peer for which the local MPI library does not have a connection, it tries to connect. So if this happens "at once" between some pair of ranks, there should be a way to form one connection out of these two connect requests. My thinking/motivation is that support for this scheme should be at the IB stack (cm and rdma-cm) level and not at the level of a specific MPI implementation.
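As a rough illustration of the scenario I have in mind, the sketch below simulates two ranks that each start a connect on their first send and then see the other side's REQ arrive. The `Rank` class, its method names, and the "lower rank yields" rule are all hypothetical; no real CM or MPI call is modelled, only the bookkeeping that turns two crossing requests into one connection.

```python
class Rank:
    """Toy model of one MPI rank with a lazy per-peer connection table."""

    def __init__(self, rank: int):
        self.rank = rank
        self.conns = {}  # peer rank -> "connecting" or "connected"

    def send(self, peer_rank: int) -> bool:
        """First send to a peer starts a connect; return True if a REQ goes out."""
        if peer_rank not in self.conns:
            self.conns[peer_rank] = "connecting"
            return True
        return False

    def on_req(self, peer_rank: int) -> str:
        """Handle an incoming REQ; return "accept" (send REP) or "drop"."""
        crossing = self.conns.get(peer_rank) == "connecting"
        if crossing and self.rank > peer_rank:
            return "drop"  # assumed rule: higher rank keeps its own REQ
        self.conns[peer_rank] = "connected"
        return "accept"

    def on_rep(self, peer_rank: int) -> None:
        """Our REQ was answered, so the connection is established."""
        self.conns[peer_rank] = "connected"


r0, r1 = Rank(0), Rank(1)
r0.send(1)                           # both ranks try to connect "at once"
r1.send(0)
replies = [r0.on_req(1), r1.on_req(0)]
r1.on_rep(0)                         # rank 0 accepted, so rank 1 gets the REP
```

In this run rank 0 accepts and rank 1 drops, so exactly one REP is sent and both tables end up with a single connected entry. Doing this once in the cm/rdma-cm would save every MPI implementation from carrying its own version of this logic.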
Jeff, Jon, any comments?

Or.
