RDMA over Ethernet (RDMAoE) allows running the IB transport protocol using Ethernet frames, enabling the deployment of IB semantics on lossless Ethernet fabrics. RDMAoE packets are standard Ethernet frames with an IEEE-assigned Ethertype, a GRH, unmodified IB transport headers, and payload. IB subnet management and SA services are not required for RDMAoE operation; Ethernet management practices are used instead. RDMAoE encodes IP addresses into its GIDs and resolves MAC addresses using the host IP stack. For multicast GIDs, the standard IP-to-MAC mappings apply.
To support RDMAoE, a new transport protocol was added to the IB core. An RDMA device can have ports with different transports, identified by a port transport attribute. The RDMA Verbs API is syntactically unmodified. When referring to RDMAoE ports, address handles are required to contain GIDs, and LID fields are ignored. The Ethernet L2 information is then obtained by the vendor-specific driver (in both kernel and user space) when modifying QPs to RTR and when creating address handles.

Since there is no SA in RDMAoE, the CMA code is modified to fill the necessary path record attributes locally before sending CM packets. Similarly, the CMA provides the required address handle attributes to the user when processing SIDR requests and joining multicast groups. In this patch set, an RDMAoE port is assigned a single GID, encoding the IPv6 link-local address of the corresponding netdev; the CMA RDMAoE code temporarily uses IPv6 link-local addresses as GIDs instead of the IP address provided by the user, thereby supporting any IP address.

To enable RDMAoE with the mlx4 driver stack, both the mlx4_en and mlx4_ib drivers must be loaded, and the netdevice for the corresponding RDMAoE port must be running. Individual ports of a multi-port HCA can be independently configured as Ethernet (with support for RDMAoE) or IB, as is already the case. We have successfully tested MPI, SDP, RDS, and native Verbs applications over RDMAoE.

Following is a series of 10 patches based on version 2.6.30 of the Linux kernel. This series reflects changes based on community feedback on the previous set of patches, and is tagged v5.

Changes from v4:
1. Added rdma_is_transport_supported() and used it to simplify conditionals throughout the code.
2. ib_register_mad_agent() for QP0 is only called for IB ports.
3. PATCH 5/10 changed from "Enable support for RDMAoE ports" to "Enable support only for IB ports".
4. MAD services from user space are currently not supported for RDMAoE ports.
5. Added a kref to struct cma_multicast to help maintain a reference count on the object, so that it is not freed while the worker thread is still using it.
6. Return an immediate error for an invalid MTU when resolving an RDMAoE path.
7. Don't fail path resolution if the rate is 0, since this value stands for IB_RATE_PORT_CURRENT.
8. In cma_rdmaoe_join_multicast(), fail immediately if the MTU is zero.
9. Added ucma_copy_rdmaoe_route() instead of modifying ucma_copy_ib_route().
10. Bug fix: in PATCH 10/10, call flush_workqueue() after unregistering netdev notifiers.
11. Multicast no longer uses the broadcast MAC.
12. No changes to patches 2, 7, and 8 from the v4 series.

Signed-off-by: Eli Cohen <[email protected]>
---
 b/drivers/infiniband/core/agent.c           |   38 ++-
 b/drivers/infiniband/core/cm.c              |   25 +-
 b/drivers/infiniband/core/cma.c             |   54 ++--
 b/drivers/infiniband/core/mad.c             |   41 ++-
 b/drivers/infiniband/core/multicast.c       |    4
 b/drivers/infiniband/core/sa_query.c        |   39 ++-
 b/drivers/infiniband/core/ucm.c             |    8
 b/drivers/infiniband/core/ucma.c            |    2
 b/drivers/infiniband/core/ud_header.c       |  111 ++++++++++
 b/drivers/infiniband/core/user_mad.c        |    6
 b/drivers/infiniband/core/uverbs.h          |    1
 b/drivers/infiniband/core/uverbs_cmd.c      |   32 ++
 b/drivers/infiniband/core/uverbs_main.c     |    1
 b/drivers/infiniband/core/verbs.c           |   25 ++
 b/drivers/infiniband/hw/mlx4/ah.c           |  187 +++++++++++++--
 b/drivers/infiniband/hw/mlx4/mad.c          |   32 +-
 b/drivers/infiniband/hw/mlx4/main.c         |  309 +++++++++++++++++++++++++---
 b/drivers/infiniband/hw/mlx4/mlx4_ib.h      |   19 +
 b/drivers/infiniband/hw/mlx4/qp.c           |  172 ++++++++++-----
 b/drivers/infiniband/ulp/ipoib/ipoib_main.c |   12 -
 b/drivers/net/mlx4/en_main.c                |   15 +
 b/drivers/net/mlx4/en_port.c                |    4
 b/drivers/net/mlx4/en_port.h                |    3
 b/drivers/net/mlx4/fw.c                     |    3
 b/drivers/net/mlx4/intf.c                   |   20 +
 b/drivers/net/mlx4/main.c                   |    6
 b/drivers/net/mlx4/mlx4.h                   |    1
 b/include/linux/mlx4/cmd.h                  |    1
 b/include/linux/mlx4/device.h               |   31 ++
 b/include/linux/mlx4/driver.h               |   16 +
 b/include/linux/mlx4/qp.h                   |    8
 b/include/rdma/ib_addr.h                    |   92 ++++++++
 b/include/rdma/ib_pack.h                    |   26 ++
 b/include/rdma/ib_user_verbs.h              |   21 +
 b/include/rdma/ib_verbs.h                   |   11
 b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c   |    3
 b/net/sunrpc/xprtrdma/svc_rdma_transport.c  |    2
 drivers/infiniband/core/cm.c                |    5
 drivers/infiniband/core/cma.c               |  207 ++++++++++++++++++
 drivers/infiniband/core/mad.c               |   37 ++-
 drivers/infiniband/core/ucm.c               |   12 -
 drivers/infiniband/core/ucma.c              |   31 ++
 drivers/infiniband/core/user_mad.c          |   15 -
 drivers/infiniband/core/verbs.c             |   10
 include/rdma/ib_verbs.h                     |   15 +
 45 files changed, 1440 insertions(+), 273 deletions(-)
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
