We've used RDMA via RoCEv2 on 100GbE. It ran in production that way for at
least six months before I had to turn it off while doing some migrations on
hardware that didn't support it. We noticed no performance change in our
environment, so once we were done I just never turned it back on. I'm not even
sure we could right now, given how our network topology and bonded interfaces
are set up.

The biggest annoyance was making sure the device name and GID were correct.
This was before the centralized ceph config database existed, so it may be
easier to roll that one out now.

Example config section for one of my nodes (in the [global] section, alongside
the public and cluster network settings):

ms_cluster_type = async+rdma
ms_async_rdma_device_name = mlx5_1
ms_async_rdma_polling_us = 0
ms_async_rdma_local_gid = 0000:0000:0000:0000:0000:ffff:c1b8:4fa0
ms_async_rdma_roce_ver = 1
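At least in our setup, the local GID for RoCE is just the node's IP address in
IPv4-mapped IPv6 form, so you can sanity-check the value with something like
this rough sketch (the IP is a placeholder; use the node's own cluster/public
address):

# Rough sketch: print the expected IPv4-mapped GID for a node address.
# 192.0.2.10 is a placeholder -- substitute the node's actual IP.
IP=192.0.2.10
printf '0000:0000:0000:0000:0000:ffff:%02x%02x:%02x%02x\n' $(echo "$IP" | tr '.' ' ')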

We pulled the GID into the config with Ansible:

- name: "Insert RDMA GID into ceph.conf"
  shell: >
    sed -i "s/GIDGOESHERE/$(cat /sys/class/infiniband/mlx5_1/ports/1/gids/5)/g"
    /etc/ceph/ceph.conf
  args:
    warn: no

The stub config file we pushed had "GIDGOESHERE" in it.
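If you don't want to hard-code the GID index (5 in our case), something like
the rough sketch below lists each populated GID with its RoCE type and netdev
so you can pick the right one; it's roughly what Mellanox's show_gids helper
does if you have OFED installed. DEV and PORT are examples, adjust for your HCA.

# Rough sketch: list populated GIDs with their RoCE type and netdev.
# Paths are standard Linux RDMA sysfs; DEV/PORT are examples.
DEV=mlx5_1 PORT=1
for f in /sys/class/infiniband/$DEV/ports/$PORT/gids/*; do
    idx=${f##*/}
    gid=$(cat "$f")
    [ "$gid" = "0000:0000:0000:0000:0000:0000:0000:0000" ] && continue
    type=$(cat /sys/class/infiniband/$DEV/ports/$PORT/gid_attrs/types/$idx 2>/dev/null)
    ndev=$(cat /sys/class/infiniband/$DEV/ports/$PORT/gid_attrs/ndevs/$idx 2>/dev/null)
    echo "$idx  $gid  $type  $ndev"
done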

I hope that helps someone out there. Not all of the settings were obvious, and
it took some trial and error. Now that we have a pure-NVMe tier I'll probably
try turning it back on to see if we notice any changes.

Netdata also proved to be a valuable tool for confirming we had traffic on both
TCP and RDMA:
https://www.netdata.cloud/
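If you don't run Netdata, a low-tech way to check that RDMA traffic is actually
flowing is to watch the port counters under sysfs while the cluster network is
busy. Rough sketch only; exactly which counters exist and what they count
varies by driver, and DEV/PORT are again examples.

# Rough sketch: snapshot the RDMA port counters, wait, and diff them.
# If RDMA is carrying traffic, the data/packet counters should be climbing.
DEV=mlx5_1 PORT=1
grep -H . /sys/class/infiniband/$DEV/ports/$PORT/counters/* > /tmp/rdma_before
sleep 10
grep -H . /sys/class/infiniband/$DEV/ports/$PORT/counters/* > /tmp/rdma_after
diff /tmp/rdma_before /tmp/rdma_after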


--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfm...@rit.edu

------------------------

________________________________________
From: Andrei Mikhailovsky <and...@arhont.com>
Sent: Wednesday, August 26, 2020 5:55 PM
To: Rafael Quaglio
Cc: ceph-users
Subject: [ceph-users] Re: Infiniband support

Rafael, we've been using Ceph with IPoIB for over 7 years and it's been
supported. However, I am not too sure about native RDMA support. There have
been discussions on and off for a while now, but I've not seen much. Perhaps
others know.

Cheers

> From: "Rafael Quaglio" <quag...@bol.com.br>
> To: "ceph-users" <ceph-users@ceph.io>
> Sent: Wednesday, 26 August, 2020 14:08:57
> Subject: [ceph-users] Infiniband support

> Hi,
> I could not see in the doc if Ceph has infiniband support. Is there someone
> using it?
> Also, is there any rdma support working natively?

> Can anyone point me to where I can find more information about it?

> Thanks,
> Rafael.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
