Hello all,

I am considering contributing native support for InfiniBand and other RDMA-enabled technologies to 0MQ, and would like to know first whether there is any interest in it. I have played a little with the project and found it very interesting, and I think that native support for RDMA-enabled technologies would be a nice addition to it.

A couple of words first for those not familiar with these technologies. Using the native InfiniBand libraries (verbs) on fairly modern gear, it is possible to achieve application-to-application loaded latencies in the 1-2us range for small packets, even across 2-3 switches, with jitter usually in the hundreds of nanoseconds. Extremely high message rates can also be achieved with surprisingly little CPU usage: I've personally measured 24 million packets/s over QDR InfiniBand, and it is probably possible to push that as high as 30-40 million with very small messages. Similar, though not as spectacular, results can be obtained over Ethernet networks using RDMA-enabled NICs, which can be either iWARP or RoCE. The big plus is that, with some care, it is possible to run exactly the same code on all of these different architectures.

I've spent some time looking at the 0MQ internals, and though I still do not grasp all of their nuances, I think that introducing verbs support should not be too difficult. Here's a quick rundown of how I would tackle this project (I may have misunderstood some parts of the 0MQ internals, so please correct me if I'm wrong):

- I would introduce a new transport using the Reliable Connection (RC) protocol. This is similar to TCP in that it guarantees lossless, ordered message delivery, though contrary to TCP it delivers messages as datagrams.

- Connection management would be done using the RDMA connection manager library. Its semantics and interface closely resemble those of TCP sockets, and it provides file descriptors for notification; establishing connections wouldn't look too different from what happens in tcp_listener_t/tcp_connecter_t.

- Data transfer would be done with raw verbs calls. This would make use of a custom sender/receiver object, something akin to stream_engine_t. The ibverbs library supports file descriptors for signaling data transfer events, so this should also fit well within the 0MQ model.

- The code would mainly go into separate files, plus some changes to socket_base.cpp and session_base.cpp to support the new transport. Relevant options and checks would be added to the autoconf infrastructure.

- The project would be pretty much Linux-only, though Windows support might be introduced at a later stage. FreeBSD support might come along when they finish porting the OFED stack to their kernel.

- Looking at the contribution page, it seems to me that the best model for such a project would be to fork the mainline on GitHub and send pull requests if/when there is interest in integrating it into the main codebase. I am also under the impression that it would not be too hard to backport the changes to version 2.x at a later stage.

So my question is: before I start working on it, is there any interest in this? I would do it mostly for fun, and because I find RDMA networking technologies to be very powerful tools that are unfortunately fairly hard to use for the uninitiated. I feel that adding support in 0MQ would make them accessible to a wider public.

Gabriele Svelto
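
P.S. For those who have only worked with TCP, the "delivers messages as datagrams" remark about RC may be easier to picture with a small analogy. The sketch below does not use verbs or any RDMA hardware; it assumes a Linux host and uses a Unix-domain SOCK_SEQPACKET socket pair purely as a stand-in for a reliable, ordered, message-oriented connection, next to a SOCK_STREAM pair standing in for TCP. The helper name send_then_receive is made up for this illustration.

```python
import socket


def send_then_receive(sock_type, messages, bufsize=4096):
    """Send each message with its own send() call, then drain the receiver.

    Returns the list of chunks handed back by successive recv() calls.
    Hypothetical helper for illustration only; this is not 0MQ or verbs code.
    """
    tx, rx = socket.socketpair(socket.AF_UNIX, sock_type)
    try:
        for message in messages:
            tx.send(message)  # one send() per application message
        tx.close()  # signal end-of-data; queued data stays readable

        chunks = []
        while True:
            chunk = rx.recv(bufsize)
            if not chunk:  # empty read: the peer has closed
                break
            chunks.append(chunk)
        return chunks
    finally:
        tx.close()
        rx.close()


if __name__ == "__main__":
    msgs = [b"hello", b"RDMA", b"world"]
    # Message-oriented: each recv() returns exactly one message.
    print("seqpacket:", send_then_receive(socket.SOCK_SEQPACKET, msgs))
    # Byte stream: boundaries are not preserved; chunks may be merged.
    print("stream:   ", send_then_receive(socket.SOCK_STREAM, msgs))
```

With the message-oriented socket the receiver gets the three messages back one per read, whereas the stream socket only guarantees the same bytes in the same order. This is why a transport like the one proposed would not need the length-prefixed framing that a TCP engine has to apply, at least under the assumption that each 0MQ message (or batch) maps onto one RC send.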