Hi all:
Has anyone successfully set up Ceph with RDMA over IB?
Following these instructions:
(https://community.mellanox.com/docs/DOC-2721)
(https://community.mellanox.com/docs/DOC-2693)
(http://hwchiu.com/2017-05-03-ceph-with-rdma.html)
I'm trying to configure Ceph with the RDMA feature on the following environment:
CentOS Linux release 7.2.1511 (Core)
MLNX_OFED_LINUX-4.4-1.0.0.0
Mellanox Technologies MT27500 Family [ConnectX-3]
rping works between all nodes, and I added these lines to ceph.conf to enable
RDMA:
public_network = 10.10.121.0/24
cluster_network = 10.10.121.0/24
ms_type = async+rdma
ms_async_rdma_device_name = mlx4_0
ms_async_rdma_port_num = 2
The IB network uses 10.10.121.0/24 addresses, and the "ibdev2netdev" command
shows that port 2 is up.
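As a quick sanity check before deploying, the settings above can be verified programmatically. This is just a sketch of my own, not an official Ceph tool: it parses the [global] section of a ceph.conf with Python's configparser and flags the RDMA-related keys if they look off (the expected values mirror the ones quoted above; adjust the path and expectations to your setup):

```python
# Sketch: sanity-check the RDMA-related keys in ceph.conf before deploying.
# Assumes the underscore spelling of the keys (ms_type, ...) as in the
# fragment above; Ceph itself also accepts spaces in option names.
import configparser

def check_rdma_conf(path="/etc/ceph/ceph.conf"):
    """Return a list of warning strings about the [global] RDMA settings."""
    cp = configparser.ConfigParser(strict=False)
    cp.read(path)
    g = cp["global"] if "global" in cp else {}
    warnings = []
    if g.get("ms_type", "").strip() != "async+rdma":
        warnings.append("ms_type is not async+rdma")
    if not g.get("ms_async_rdma_device_name", "").strip():
        warnings.append("ms_async_rdma_device_name is unset (e.g. mlx4_0)")
    # RDMA port numbers are 1-based; a ConnectX-3 HCA has ports 1 and 2.
    port = g.get("ms_async_rdma_port_num", "1").strip()
    if port not in ("1", "2"):
        warnings.append("ms_async_rdma_port_num looks wrong: %r" % port)
    return warnings
```

An empty return value means the three keys match what the Mellanox guides suggest; anything else is worth fixing before re-running ceph-deploy.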
The error occurs when running "ceph-deploy --overwrite-conf mon
create-initial". ceph-deploy log details:
[2018-07-12 17:53:48,943][ceph_deploy.conf][DEBUG ] found configuration
file at: /home/user1/.cephdeploy.conf
[2018-07-12 17:53:48,944][ceph_deploy.cli][INFO ] Invoked (1.5.37):
/usr/bin/ceph-deploy --overwrite-conf mon create-initial
[2018-07-12 17:53:48,944][ceph_deploy.cli][INFO ] ceph-deploy options:
[2018-07-12 17:53:48,944][ceph_deploy.cli][INFO ]
username : None
[2018-07-12 17:53:48,944][ceph_deploy.cli][INFO ]
verbose : False
[2018-07-12 17:53:48,944][ceph_deploy.cli][INFO ]
overwrite_conf : True
[2018-07-12 17:53:48,944][ceph_deploy.cli][INFO ]
subcommand : create-initial
[2018-07-12 17:53:48,944][ceph_deploy.cli][INFO ] quiet
: False
[2018-07-12 17:53:48,945][ceph_deploy.cli][INFO ]
cd_conf : <ceph_deploy.conf.cephdeploy.Conf object at
0x27e6210>
[2018-07-12 17:53:48,945][ceph_deploy.cli][INFO ]
cluster : ceph
[2018-07-12 17:53:48,945][ceph_deploy.cli][INFO ]
func : <function mon at 0x2a7d2a8>
[2018-07-12 17:53:48,945][ceph_deploy.cli][INFO ]
ceph_conf : None
[2018-07-12 17:53:48,945][ceph_deploy.cli][INFO ]
default_release : False
[2018-07-12 17:53:48,945][ceph_deploy.cli][INFO ]
keyrings : None
[2018-07-12 17:53:48,947][ceph_deploy.mon][DEBUG ] Deploying mon, cluster
ceph hosts node1
[2018-07-12 17:53:48,947][ceph_deploy.mon][DEBUG ] detecting platform for
host node1 ...
[2018-07-12 17:53:49,005][node1][DEBUG ] connection detected need for sudo
[2018-07-12 17:53:49,039][node1][DEBUG ] connected to host: node1
[2018-07-12 17:53:49,040][node1][DEBUG ] detect platform information from
remote host
[2018-07-12 17:53:49,073][node1][DEBUG ] detect machine type
[2018-07-12 17:53:49,078][node1][DEBUG ] find the location of an executable
[2018-07-12 17:53:49,079][ceph_deploy.mon][INFO ] distro info: CentOS
Linux 7.2.1511 Core
[2018-07-12 17:53:49,079][node1][DEBUG ] determining if provided host has
same hostname in remote
[2018-07-12 17:53:49,079][node1][DEBUG ] get remote short hostname
[2018-07-12 17:53:49,080][node1][DEBUG ] deploying mon to node1
[2018-07-12 17:53:49,080][node1][DEBUG ] get remote short hostname
[2018-07-12 17:53:49,081][node1][DEBUG ] remote hostname: node1
[2018-07-12 17:53:49,083][node1][DEBUG ] write cluster configuration to
/etc/ceph/{cluster}.conf
[2018-07-12 17:53:49,084][node1][DEBUG ] create the mon path if it does not
exist
[2018-07-12 17:53:49,085][node1][DEBUG ] checking for done path:
/var/lib/ceph/mon/ceph-node1/done
[2018-07-12 17:53:49,085][node1][DEBUG ] create a done file to avoid
re-doing the mon deployment
[2018-07-12 17:53:49,086][node1][DEBUG ] create the init path if it does
not exist
[2018-07-12 17:53:49,089][node1][INFO ] Running command: sudo systemctl
enable ceph.target
[2018-07-12 17:53:49,365][node1][INFO ] Running command: sudo systemctl
enable ceph-mon@node1
[2018-07-12 17:53:49,588][node1][INFO ] Running command: sudo systemctl
start ceph-mon@node1
[2018-07-12 17:53:51,762][node1][INFO ] Running command: sudo ceph
--cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.node1.asok mon_status
[2018-07-12 17:53:51,979][node1][DEBUG ]
********************************************************************************
[2018-07-12 17:53:51,979][node1][DEBUG ] status for monitor: mon.node1
[2018-07-12 17:53:51,980][node1][DEBUG ] {
[2018-07-12 17:53:51,980][node1][DEBUG ] "election_epoch": 3,
[2018-07-12 17:53:51,980][node1][DEBUG ] "extra_probe_peers": [],
[2018-07-12 17:53:51,980][node1][DEBUG ] "feature_map": {
[2018-07-12 17:53:51,981][node1][DEBUG ] "mon": {
[2018-07-12 17:53:51,981][node1][DEBUG ] "group": {
[2018-07-12 17:53:51,981][node1][DEBUG ] "features":
"0x1ffddff8eea4fffb",
[2018-07-12 17:53:51,981][node1][DEBUG ] "num": 1,
[2018-07-12 17:53:51,981][node1][DEBUG ] "release": "luminous"
[2018-07-12 17:53:51,981][node1][DEBUG ] }
[2018-07-12 17:53:51,981][node1][DEBUG ] }
[2018-07-12 17:53:51,982][node1][DEBUG ] },
[2018-07-12 17:53:51,982][node1][DEBUG ] "features": {
[2018-07-12 17:53:51,982][node1][DEBUG ] "quorum_con":
"2305244844532236283",
[2018-07-12 17:53:51,982][node1][DEBUG ] "quorum_mon": [
[2018-07-12 17:53:51,982][node1][DEBUG ] "kraken",
[2018-07-12 17:53:51,982][node1][DEBUG ] "luminous"
[2018-07-12 17:53:51,982][node1][DEBUG ] ],
[2018-07-12 17:53:51,982][node1][DEBUG ] "required_con":
"153140804152475648",
[2018-07-12 17:53:51,983][node1][DEBUG ] "required_mon": [
[2018-07-12 17:53:51,983][node1][DEBUG ] "kraken",
[2018-07-12 17:53:51,983][node1][DEBUG ] "luminous"
[2018-07-12 17:53:51,983][node1][DEBUG ] ]
[2018-07-12 17:53:51,983][node1][DEBUG ] },
[2018-07-12 17:53:51,983][node1][DEBUG ] "monmap": {
[2018-07-12 17:53:51,983][node1][DEBUG ] "created": "2018-07-12
17:41:24.243749",
[2018-07-12 17:53:51,984][node1][DEBUG ] "epoch": 1,
[2018-07-12 17:53:51,984][node1][DEBUG ] "features": {
[2018-07-12 17:53:51,984][node1][DEBUG ] "optional": [],
[2018-07-12 17:53:51,984][node1][DEBUG ] "persistent": [
[2018-07-12 17:53:51,984][node1][DEBUG ] "kraken",
[2018-07-12 17:53:51,984][node1][DEBUG ] "luminous"
[2018-07-12 17:53:51,984][node1][DEBUG ] ]
[2018-07-12 17:53:51,984][node1][DEBUG ] },
[2018-07-12 17:53:51,985][node1][DEBUG ] "fsid":
"9317bc6a-ea20-4376-a390-52afa0b81353",
[2018-07-12 17:53:51,985][node1][DEBUG ] "modified": "2018-07-12
17:41:24.243749",
[2018-07-12 17:53:51,985][node1][DEBUG ] "mons": [
[2018-07-12 17:53:51,985][node1][DEBUG ] {
[2018-07-12 17:53:51,985][node1][DEBUG ] "addr": "
10.10.121.25:6789/0",
[2018-07-12 17:53:51,985][node1][DEBUG ] "name": "node1",
[2018-07-12 17:53:51,985][node1][DEBUG ] "public_addr": "
10.10.121.25:6789/0",
[2018-07-12 17:53:51,986][node1][DEBUG ] "rank": 0
[2018-07-12 17:53:51,986][node1][DEBUG ] }
[2018-07-12 17:53:51,986][node1][DEBUG ] ]
[2018-07-12 17:53:51,986][node1][DEBUG ] },
[2018-07-12 17:53:51,986][node1][DEBUG ] "name": "node1",
[2018-07-12 17:53:51,986][node1][DEBUG ] "outside_quorum": [],
[2018-07-12 17:53:51,986][node1][DEBUG ] "quorum": [
[2018-07-12 17:53:51,986][node1][DEBUG ] 0
[2018-07-12 17:53:51,987][node1][DEBUG ] ],
[2018-07-12 17:53:51,987][node1][DEBUG ] "rank": 0,
[2018-07-12 17:53:51,987][node1][DEBUG ] "state": "leader",
[2018-07-12 17:53:51,987][node1][DEBUG ] "sync_provider": []
[2018-07-12 17:53:51,987][node1][DEBUG ] }
[2018-07-12 17:53:51,987][node1][DEBUG ]
********************************************************************************
[2018-07-12 17:53:51,987][node1][INFO ] monitor: mon.node1 is running
[2018-07-12 17:53:51,989][node1][INFO ] Running command: sudo ceph
--cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.node1.asok mon_status
[2018-07-12 17:53:52,156][ceph_deploy.mon][INFO ] processing monitor
mon.node1
[2018-07-12 17:53:52,194][node1][DEBUG ] connection detected need for sudo
[2018-07-12 17:53:52,230][node1][DEBUG ] connected to host: node1
[2018-07-12 17:53:52,231][node1][DEBUG ] detect platform information from
remote host
[2018-07-12 17:53:52,265][node1][DEBUG ] detect machine type
[2018-07-12 17:53:52,270][node1][DEBUG ] find the location of an executable
[2018-07-12 17:53:52,273][node1][INFO ] Running command: sudo ceph
--cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.node1.asok mon_status
[2018-07-12 17:53:52,439][ceph_deploy.mon][INFO ] mon.node1 monitor has
reached quorum!
[2018-07-12 17:53:52,440][ceph_deploy.mon][INFO ] all initial monitors are
running and have formed quorum
[2018-07-12 17:53:52,440][ceph_deploy.mon][INFO ] Running gatherkeys...
[2018-07-12 17:53:52,441][ceph_deploy.gatherkeys][INFO ] Storing keys in
temp directory /tmp/tmp8bdYT6
[2018-07-12 17:53:52,477][node1][DEBUG ] connection detected need for sudo
[2018-07-12 17:53:52,510][node1][DEBUG ] connected to host: node1
[2018-07-12 17:53:52,511][node1][DEBUG ] detect platform information from
remote host
[2018-07-12 17:53:52,552][node1][DEBUG ] detect machine type
[2018-07-12 17:53:52,558][node1][DEBUG ] get remote short hostname
[2018-07-12 17:53:52,559][node1][DEBUG ] fetch remote file
[2018-07-12 17:53:52,562][node1][INFO ] Running command: sudo
/usr/bin/ceph --connect-timeout=25 --cluster=ceph
--admin-daemon=/var/run/ceph/ceph-mon.node1.asok mon_status
[2018-07-12 17:53:52,731][node1][INFO ] Running command: sudo
/usr/bin/ceph --connect-timeout=25 --cluster=ceph --name mon.
--keyring=/var/lib/ceph/mon/ceph-node1/keyring auth get client.admin
[2018-07-12 17:54:18,059][node1][ERROR ] "ceph auth get-or-create for
keytype admin returned 1
[2018-07-12 17:54:18,059][node1][DEBUG ] Cluster connection interrupted or
timed out
[2018-07-12 17:54:18,059][node1][ERROR ] Failed to return 'admin' key from
host node1
[2018-07-12 17:54:18,059][ceph_deploy.gatherkeys][ERROR ] Failed to connect
to host:node1
[2018-07-12 17:54:18,060][ceph_deploy.gatherkeys][INFO ] Destroy temp
directory /tmp/tmp8bdYT6
[2018-07-12 17:54:18,060][ceph_deploy][ERROR ] RuntimeError: Failed to
connect any mon
The ceph-mon service is up but cannot be reached; "ceph -s" returns the same
kind of error:
2018-07-13 10:44:21.169536 7fa570d4e700 0 monclient(hunting): authenticate
timed out after 300
2018-07-13 10:44:21.169579 7fa570d4e700 0 librados: client.admin
authentication error (110) Connection timed out
[errno 110] error connecting to the cluster
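To rule out a plain network or firewall problem before blaming the RDMA messenger, I check basic TCP reachability of the mon address from the monmap (10.10.121.25:6789 in this case). A minimal sketch, assuming only the address and port from the log above; note that a successful connect only shows the address and port are reachable, it says nothing about whether the RDMA data path works:

```python
# Sketch: basic TCP reachability check for the monitor address/port.
# A timeout or refusal here points at routing/firewall issues rather
# than the async+rdma messenger itself.
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, can_connect("10.10.121.25", 6789) returning False from a client node would suggest the "(110) Connection timed out" is a reachability problem, not an RDMA one.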
I'm running Ceph 12.2.4 (Luminous, stable). Does anyone have any suggestions
about this issue?
Thanks
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com