Hello,
While testing our ceph cluster setup, I noticed a possible issue: the
cluster/public network configuration seems to be ignored for TCP session
initiation.
It looks like the daemons (mon/mgr/mds/osd) are all listening on the right IP
addresses but are initiating TCP sessions from the wrong interface.
Would it be possible to force the ceph daemons to use the cluster/public IP
addresses when initiating new TCP connections, instead of letting the kernel
choose?
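To illustrate what I mean by "letting the kernel choose": the source address
the kernel selects for a given destination can be checked with iproute2, e.g.
for node 2:

root@nbs-vp-01:~# ip route get 10.2.1.2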
Some details below:
We set everything up to use our "10.2.1.0/24" network:
10.2.1.x (x=node number 1,2,3)
But we can see TCP sessions being initiated from the "10.2.0.0/24" network.
The daemons themselves are listening on the right IP addresses:
root@nbs-vp-01:~# lsof -nPK i | grep ceph | grep LISTE
ceph-mds 1541648 ceph 16u IPv4 8169344 0t0 TCP 10.2.1.1:6800 (LISTEN)
ceph-mds 1541648 ceph 17u IPv4 8169346 0t0 TCP 10.2.1.1:6801 (LISTEN)
ceph-mgr 1541654 ceph 25u IPv4 8163039 0t0 TCP 10.2.1.1:6810 (LISTEN)
ceph-mgr 1541654 ceph 27u IPv4 8163051 0t0 TCP 10.2.1.1:6811 (LISTEN)
ceph-mon 1541703 ceph 27u IPv4 8170914 0t0 TCP 10.2.1.1:3300 (LISTEN)
ceph-mon 1541703 ceph 28u IPv4 8170915 0t0 TCP 10.2.1.1:6789 (LISTEN)
ceph-osd 1541711 ceph 16u IPv4 8169353 0t0 TCP 10.2.1.1:6802 (LISTEN)
ceph-osd 1541711 ceph 17u IPv4 8169357 0t0 TCP 10.2.1.1:6803 (LISTEN)
ceph-osd 1541711 ceph 18u IPv4 8169362 0t0 TCP 10.2.1.1:6804 (LISTEN)
ceph-osd 1541711 ceph 19u IPv4 8169368 0t0 TCP 10.2.1.1:6805 (LISTEN)
ceph-osd 1541711 ceph 20u IPv4 8169375 0t0 TCP 10.2.1.1:6806 (LISTEN)
ceph-osd 1541711 ceph 21u IPv4 8169383 0t0 TCP 10.2.1.1:6807 (LISTEN)
ceph-osd 1541711 ceph 22u IPv4 8169392 0t0 TCP 10.2.1.1:6808 (LISTEN)
ceph-osd 1541711 ceph 23u IPv4 8169402 0t0 TCP 10.2.1.1:6809 (LISTEN)
Sessions to the other nodes, however, are initiated from the wrong source IP address:
root@nbs-vp-01:~# lsof -nPK i | grep ceph | grep 10.2.1.2
ceph-mds 1541648 ceph 28u IPv4 8279520 0t0 TCP 10.2.0.2:44180->10.2.1.2:6800 (ESTABLISHED)
ceph-mgr 1541654 ceph 41u IPv4 8289842 0t0 TCP 10.2.0.2:44146->10.2.1.2:6800 (ESTABLISHED)
ceph-mon 1541703 ceph 40u IPv4 8174827 0t0 TCP 10.2.0.2:40864->10.2.1.2:3300 (ESTABLISHED)
ceph-osd 1541711 ceph 65u IPv4 8171035 0t0 TCP 10.2.0.2:58716->10.2.1.2:6804 (ESTABLISHED)
ceph-osd 1541711 ceph 66u IPv4 8172960 0t0 TCP 10.2.0.2:54586->10.2.1.2:6806 (ESTABLISHED)
root@nbs-vp-01:~# lsof -nPK i | grep ceph | grep 10.2.1.3
ceph-mds 1541648 ceph 30u IPv4 8292421 0t0 TCP 10.2.0.2:45710->10.2.1.3:6802 (ESTABLISHED)
ceph-mon 1541703 ceph 46u IPv4 8173025 0t0 TCP 10.2.0.2:40164->10.2.1.3:3300 (ESTABLISHED)
ceph-osd 1541711 ceph 67u IPv4 8173043 0t0 TCP 10.2.0.2:56920->10.2.1.3:6804 (ESTABLISHED)
ceph-osd 1541711 ceph 68u IPv4 8171063 0t0 TCP 10.2.0.2:41952->10.2.1.3:6806 (ESTABLISHED)
ceph-osd 1541711 ceph 69u IPv4 8178891 0t0 TCP 10.2.0.2:57890->10.2.1.3:6808 (ESTABLISHED)
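We could probably work around this on the kernel side by adding a "src" hint
to the route for 10.2.1.0/24, something along these lines (ens19 is just a
placeholder for whatever interface carries that network on a given node):

root@nbs-vp-01:~# ip route change 10.2.1.0/24 dev ens19 proto kernel scope link src 10.2.1.1

But we would prefer ceph itself to bind the right source address, hence the
question above.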
For reference, here is our cluster config:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 10.2.1.0/24
fsid = 0f19b6ff-0432-4c3f-b0cb-730e8302dc2c
mon_allow_pool_delete = true
mon_host = 10.2.1.1 10.2.1.2 10.2.1.3
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 10.2.1.0/24
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring
[mds.nbs-vp-01]
host = nbs-vp-01
mds_standby_for_name = pve
[mds.nbs-vp-03]
host = nbs-vp-03
mds_standby_for_name = pve
[osd.0]
public addr = 10.2.1.1
cluster addr = 10.2.1.1
[osd.1]
public addr = 10.2.1.2
cluster addr = 10.2.1.2
[osd.2]
public addr = 10.2.1.3
cluster addr = 10.2.1.3
[mgr.nbs-vp-01]
public addr = 10.2.1.1
[mgr.nbs-vp-02]
public addr = 10.2.1.2
[mgr.nbs-vp-03]
public addr = 10.2.1.3
[mon.nbs-vp-01]
public addr = 10.2.1.1
[mon.nbs-vp-02]
public addr = 10.2.1.2
[mon.nbs-vp-03]
public addr = 10.2.1.3
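For completeness, the per-daemon address settings can also be double-checked on
the node through the admin socket, e.g. for osd.0:

root@nbs-vp-01:~# ceph daemon osd.0 config get public_addr
root@nbs-vp-01:~# ceph daemon osd.0 config get cluster_addr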
Cheers,
Liviu