RDMA over Ethernet (RDMAoE) allows running the IB transport protocol over Ethernet frames, enabling the deployment of IB semantics on lossless Ethernet fabrics. RDMAoE packets are standard Ethernet frames carrying an IEEE-assigned Ethertype, a GRH, unmodified IB transport headers, and the payload. Aside from the considerations noted below, RDMAoE ports are functionally equivalent to regular IB ports from the perspective of the RDMA stack.
IB subnet management and SA services are not required for RDMAoE operation; Ethernet management practices are used instead. In Ethernet, applications commonly refer to nodes by IP address. RDMAoE encodes the IP addresses assigned to the corresponding Ethernet port into its GIDs, and uses the IP stack to bind a destination address to the corresponding netdevice (just as the CMA does today for IB and iWARP) and to obtain its L2 MAC address.

The RDMA Verbs API is syntactically unmodified. When referring to RDMAoE ports, address handles are required to contain GIDs, and the L2 address fields in the API are ignored. The Ethernet L2 information is instead obtained by the vendor-specific driver (in both kernel and user space) when modifying QPs to RTR and when creating address handles.

To maximize transparency for applications, RDMAoE implements a dedicated API that provides services equivalent to some of those provided by the IB SA. The current approach is strictly local but may evolve in the future. This API lives in an independent source file, which allows the code to evolve without affecting the native IB SA interfaces.

We have successfully tested MPI, SDP, RDS, and native Verbs applications over RDMAoE. To enable RDMAoE with the mlx4 driver stack, both the mlx4_en and mlx4_ib drivers must be loaded, and the netdevice for the corresponding RDMAoE port must be running. Individual ports of a multi-port HCA can be independently configured as Ethernet (with support for RDMAoE) or IB, as is already the case.

Following is a series of 8 patches based on version 2.6.30 of the Linux kernel. This series reflects changes based on community feedback on the previous set of patches. The whole series is tagged v3.
Signed-off-by: Eli Cohen <e...@mellanox.co.il>

 drivers/infiniband/core/Makefile          |    2 
 drivers/infiniband/core/addr.c            |   20 
 drivers/infiniband/core/agent.c           |   12 
 drivers/infiniband/core/cma.c             |  124 +++
 drivers/infiniband/core/mad.c             |   48 +
 drivers/infiniband/core/multicast.c       |   43 -
 drivers/infiniband/core/multicast.h       |   79 ++
 drivers/infiniband/core/rdmaoe_sa.c       |  942 ++++++++++++++++++++++++++++++
 drivers/infiniband/core/sa.h              |   24 
 drivers/infiniband/core/sa_query.c        |   26 
 drivers/infiniband/core/ud_header.c       |  111 +++
 drivers/infiniband/core/uverbs.h          |    1 
 drivers/infiniband/core/uverbs_cmd.c      |   33 +
 drivers/infiniband/core/uverbs_main.c     |    1 
 drivers/infiniband/core/verbs.c           |   17 
 drivers/infiniband/hw/mlx4/ah.c           |  228 ++++-
 drivers/infiniband/hw/mlx4/main.c         |  276 +++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h      |   30 
 drivers/infiniband/hw/mlx4/qp.c           |  253 ++++--
 drivers/infiniband/ulp/ipoib/ipoib_main.c |    3 
 drivers/net/mlx4/cmd.c                    |    6 
 drivers/net/mlx4/en_main.c                |   15 
 drivers/net/mlx4/en_port.c                |    4 
 drivers/net/mlx4/en_port.h                |    3 
 drivers/net/mlx4/intf.c                   |   20 
 drivers/net/mlx4/main.c                   |    6 
 drivers/net/mlx4/mlx4.h                   |    1 
 include/linux/mlx4/cmd.h                  |    1 
 include/linux/mlx4/device.h               |   31 
 include/linux/mlx4/driver.h               |   16 
 include/linux/mlx4/qp.h                   |    8 
 include/rdma/ib_addr.h                    |   53 +
 include/rdma/ib_pack.h                    |   26 
 include/rdma/ib_user_verbs.h              |   21 
 include/rdma/ib_verbs.h                   |   22 
 include/rdma/rdmaoe_sa.h                  |   66 ++
 36 files changed, 2333 insertions(+), 239 deletions(-)

_______________________________________________
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg