On Fri, 2008-09-26 at 10:54 -0500, Scott M. Ferris wrote:
> On Wed, Sep 24, 2008 at 03:43:56PM -0700, Steven Dake wrote:
> >
> > Totem is a reliable virtual synchrony multicast protocol which transmits
> > a message from any node to all nodes in a collection of computers
> > (called the configuration or membership). It has a few requirements:
> >
> > unreliable datagram multicast
> > unreliable datagram unicast
> > ability to bind to a specific port and interface
> > ability to poll() (POLLIN) via system call for new multicast datagram
> > messages
>
> It's been a few years since I read the totem paper, but if I recall
> correctly, totem also has certain ordering requirements about unicast
> and multicast, which the paper asserts are true for Ethernet.
>

Totem can deal with reordering of any packets and has no other
requirements than those above. It is designed for these sorts of
networks.

Thanks for your detailed response. I have a small question below...

> It's not clear to me that InfiniBand will provide the same guarantees
> in all cases. If unicasts and multicasts are sent on different queue
> pairs, I'm not sure any ordering is guaranteed. I can also imagine
> the IB virtual lane (VL) feature potentially reordering delivery if
> messages end up in different VLs. I'd recommend talking to someone
> more knowledgeable in InfiniBand than I am to check what you need to do
> to meet the totem ordering requirement.
>
> > Few questions:
> > 1) I would like to continue to use IP addressing but it looks like I
> > have to use a different addressing model in librdmacm. I looked at the
> > examples in the library and it isn't clear to me whether they use IP
> > addressing or some other addressing model. I see references to IPoverIB
> > but I don't see any information in the wiki on the topic. Anyone have
> > links to documentation on the topic of node addressing?
>
> IPoIB is fairly transparent to you. The OFED software provides Linux
> netdevices (e.g. ib0, ib1) which pass IP traffic just like an Ethernet
> netdevice would. Normal IP routing controls what interface gets used.
> For the most part IP-based software just works. Low-level things like
> DHCP daemons will notice differences, since the hardware addresses are
> larger.
>
> If IPoIB connected mode is being used, unicasts and multicasts will be
> sent on different queue pairs, since the unicasts will use a
> connection, and the multicasts can't. You may need to disable IPoIB
> connected mode in order to get the ordering guarantees totem needs,
> since I'm not sure any ordering is guaranteed between messages queued
> to different queue pairs.
>
> If you want to use native IB protocols instead of IPoIB, librdmacm
> will be easier than using libibcm.
> The RDMA communication manager
> uses IPoIB to resolve IP addresses to native IB addresses, so you can
> avoid changing your addressing model. The rdma_cm(7) man page
> describes how to bind and connect.
>
> > 2) The library doesn't have any non-blocking (kernel wait queue based)
> > polling mechanism that I can see. Am I missing a call here?
>
> Look at ibv_create_comp_channel(3) and ibv_get_cq_event(3).
>
> > 3) Of course using the standard socket API would be highly desired as
> > it requires fewer code changes. Is there some other library I should be
> > using?
>
> IPoIB gives you the standard socket API.
>

Where is IPoIB? Is it a standard feature of the OFED software stack?

Thanks again!

regards
-steve

> RDMA CM uses a similar set of calls for binding and connecting,
> though there are some differences. rdma_cm_ids are somewhat like
> socket fds. Once you have a multicast or unicast rdma_cm_id, you use
> the IB verbs API (libibverbs) to send and receive.
