Hello all,
I am considering contributing native support for InfiniBand and other
RDMA-enabled technologies to 0MQ, and would first like to know whether
there is interest in it. I have played a little with the project and
found it very interesting, and I think native support for RDMA-enabled
technologies would be a nice addition to it.

A few words first for those not familiar with these technologies.
Using the native InfiniBand libraries (verbs) on fairly modern gear,
it is possible to achieve application-to-application loaded latencies
in the 1-2us range for small packets, even across 2-3 switches, with
jitter usually in the hundreds of nanoseconds. Extremely high message
rates can also be achieved with surprisingly little CPU usage: I have
personally measured 24 million packets/s over QDR InfiniBand, and it
is probably possible to push that as high as 30-40 million (with very
small messages). Similar, though not as spectacular, results can be
obtained over Ethernet using RDMA-enabled NICs, which can be either
iWARP or RoCE. The big plus is that, with some care, exactly the same
code can run on all of these different architectures.

I've spent some time looking at the 0MQ internals and, though I still
do not grasp all of their nuances, I think that introducing verbs
support should not be too difficult. Here's a quick rundown of how I
would tackle this project (I may have misunderstood some parts of the
0MQ internals, so please correct me if I'm wrong):
- I would introduce a new transport using the Reliable Connection (RC)
protocol. Like TCP, it guarantees lossless, ordered message delivery;
unlike TCP, it delivers messages as datagrams rather than as a byte
stream.
- Connection management would be done using the RDMA connection
manager library, whose semantics and interface closely resemble those
of TCP sockets and which provides file descriptors for notification.
Establishing connections wouldn't look too different from what happens
in tcp_listener_t/tcp_connecter_t.
- Data transfer would be done with raw verbs calls through a custom
sender/receiver object, something akin to stream_engine_t. The ibverbs
library supports file descriptors for signaling data transfer events,
so this should also fit well within the 0MQ model.
- The code would mainly go into separate files, plus some changes to
socket_base.cpp and session_base.cpp to support the new transport.
Relevant options and checks would be added to the autoconf
infrastructure.
- The project would be pretty much Linux-only, though Windows support
might be introduced at a later stage. FreeBSD support might come along
once the port of the OFED stack to its kernel is finished.
- Looking at the contribution page, it seems to me that the best model
for such a project would be to fork the mainline on GitHub and submit
pull requests if/when there is interest in integrating it into the
main codebase. I am also under the impression that it would not be too
hard to backport the changes to version 2.x at a later stage.
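
To make the file-descriptor point in the list above a bit more
concrete, here is a minimal sketch of the readiness-driven loop such an
engine would plug into. It is only an illustration under assumptions:
fd_event_source_t and poll_once are hypothetical names, not part of
0MQ, and the descriptor is any readable fd (a pipe works for trying it
out). In a real engine the descriptor would come from the `fd` field of
an ibv_comp_channel or rdma_event_channel, and the one-byte read would
be replaced by ibv_get_cq_event()/ibv_poll_cq() or rdma_get_cm_event().

```cpp
#include <poll.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an RDMA engine: all the poller needs to know
// about it is a file descriptor that becomes readable when work is pending.
struct fd_event_source_t {
    int fd;           // descriptor watched by the poller
    int events_seen;  // number of notifications handled so far
};

// Waits up to timeout_ms for any source to become readable and handles one
// notification per ready source. Returns the number of sources handled,
// 0 on timeout, or -1 if poll() fails.
inline int poll_once(std::vector<fd_event_source_t> &sources, int timeout_ms)
{
    assert(timeout_ms >= -1);

    std::vector<pollfd> pfds;
    for (const auto &s : sources)
        pfds.push_back({s.fd, POLLIN, 0});

    const int rc = poll(pfds.data(), pfds.size(), timeout_ms);
    if (rc <= 0)
        return rc;

    int handled = 0;
    for (std::size_t i = 0; i < pfds.size(); ++i) {
        if (!(pfds[i].revents & POLLIN))
            continue;
        // Assumption: one byte stands for one event. A verbs engine would
        // instead fetch the completion event and drain the completion queue.
        char token;
        if (read(pfds[i].fd, &token, 1) == 1) {
            ++sources[i].events_seen;
            ++handled;
        }
    }
    return handled;
}
```

The point of the sketch is that nothing in the loop is RDMA-specific:
as long as the connection manager and the completion channel each
expose a descriptor, they can be registered with the same kind of
poller that already drives the TCP engines.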

So my question is: before I start working on it, is there any interest
in this? I would do it mostly for fun, and because I find RDMA
networking technologies to be very powerful tools that are
unfortunately fairly hard to use for the uninitiated. I feel that
adding support in 0MQ would make them accessible to a wider audience.

 Gabriele Svelto