When I lose a disk or replace an OSD in my POC Ceph cluster, it takes a very
long time to rebalance. I should note that my cluster is somewhat unusual
in that I am using CephFS (which shouldn't matter?) and it currently contains
about 310 million objects.
The last time I replaced a disk/OSD was 2.5 days ago and it is still
rebalancing. This is on a cluster with no client load.
The configuration is 5 hosts, each with 6 x 1TB 7200rpm SATA OSDs and one 850 Pro
SSD that holds the journals for those OSDs. That means 30 OSDs in
total. The system disk is on its own disk. I'm also using a backend network
with a single Gb NIC. The rebalancing rate (objects/s) seems to be very slow
when it is close to finishing, say <1% objects misplaced.
It doesn't seem right that it would take 2+ days to rebalance a 1TB disk
with no load on the cluster. Are my expectations off?
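To put rough numbers on that expectation, here is a back-of-the-envelope
sketch in Python. It uses only figures from this post (the 1TB disk, the
single Gb backend NIC, and the "recovery io" and "objects misplaced" lines in
the ceph -s output below). The function names are mine, and the 1 Gb/s figure
is the nominal link speed rather than a measured throughput, so treat the
results as order-of-magnitude estimates only.

```python
# Back-of-the-envelope recovery estimates using numbers from this post.
# Assumption: the NIC runs at its nominal 1 Gb/s; nothing here is measured.

DISK_BYTES = 1e12          # one 1TB OSD
LINK_BITS_PER_S = 1e9      # single Gb backend NIC (nominal)

MISPLACED_OBJECTS = 3046506   # from "objects misplaced" in ceph -s
RECOVERY_OBJECTS_PER_S = 147  # from "recovery io" in ceph -s
RECOVERY_KB_PER_S = 2294      # from "recovery io" in ceph -s


def hours_at_line_rate(num_bytes, bits_per_s):
    """Hours needed to move num_bytes over a link at bits_per_s."""
    return num_bytes * 8 / bits_per_s / 3600


def hours_at_object_rate(num_objects, objects_per_s):
    """Hours needed to move num_objects at a fixed objects/s rate."""
    return num_objects / objects_per_s / 3600


line_rate_hours = hours_at_line_rate(DISK_BYTES, LINK_BITS_PER_S)
remaining_hours = hours_at_object_rate(MISPLACED_OBJECTS, RECOVERY_OBJECTS_PER_S)
avg_object_kb = RECOVERY_KB_PER_S / RECOVERY_OBJECTS_PER_S

print(f"1TB at nominal line rate:        {line_rate_hours:.1f} h")
print(f"remaining misplaced at 147 obj/s: {remaining_hours:.1f} h")
print(f"average recovered object size:    {avg_object_kb:.1f} kB")
```

If this arithmetic is right, a full 1TB would fit through the link in a few
hours, while the observed recovery is moving objects of only about 16 kB each,
which is why I suspect the object count rather than the byte count.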
I'm not sure whether my pg_num/pgp_num needs to be changed or whether the
rebalance time depends on the number of objects in the pool. These are
thoughts I've had, but I'm not certain they are relevant here.
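For reference, the two quantities behind those thoughts can be worked out
from the output below. This is only a sketch: the replica count is not shown
anywhere in this post, so the code infers it from the ratio of object copies
(676638611) to objects (336805139), which is an assumption on my part.

```python
# Objects per PG and PG copies per OSD, from the ceph -s / rados df output.
# Assumption: the pool replica count is inferred, not read from the cluster.

CEPHFS_DATA_OBJECTS = 335746702   # rados df, cephfs_data
CEPHFS_DATA_PG_NUM = 1024         # ceph osd pool get cephfs_data
TOTAL_OBJECTS = 336805139         # rados df, "total used" object count
TOTAL_OBJECT_COPIES = 676638611   # denominator of "objects misplaced"
TOTAL_PGS = 2112                  # pgmap line in ceph -s
NUM_OSDS = 30

objects_per_pg = CEPHFS_DATA_OBJECTS / CEPHFS_DATA_PG_NUM

# Inferred replica count: copies divided by objects, rounded to an integer.
inferred_replicas = round(TOTAL_OBJECT_COPIES / TOTAL_OBJECTS)

pg_copies_per_osd = TOTAL_PGS * inferred_replicas / NUM_OSDS

print(f"objects per PG in cephfs_data: {objects_per_pg:,.0f}")
print(f"inferred replica count:        {inferred_replicas}")
print(f"PG copies per OSD:             {pg_copies_per_osd:.1f}")
```

So each backfilling PG in cephfs_data would hold roughly 328,000 objects,
which may be the more relevant number if recovery speed is bound by objects/s.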
$ sudo ceph -v
ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
$ sudo ceph -s
[sudo] password for bababurko:
cluster f25cb23f-2293-4682-bad2-4b0d8ad10e79
health HEALTH_WARN
5 pgs backfilling
5 pgs stuck unclean
recovery 3046506/676638611 objects misplaced (0.450%)
monmap e1: 3 mons at {cephmon01=10.15.24.71:6789/0,cephmon02=10.15.24.80:6789/0,cephmon03=10.15.24.135:6789/0}
election epoch 20, quorum 0,1,2 cephmon01,cephmon02,cephmon03
mdsmap e6070: 1/1/1 up {0=cephmds01=up:active}, 1 up:standby
osdmap e4395: 30 osds: 30 up, 30 in; 5 remapped pgs
pgmap v3100039: 2112 pgs, 3 pools, 6454 GB data, 321 Mobjects
18319 GB used, 9612 GB / 27931 GB avail
3046506/676638611 objects misplaced (0.450%)
2095 active+clean
12 active+clean+scrubbing+deep
5 active+remapped+backfilling
recovery io 2294 kB/s, 147 objects/s
$ sudo rados df
pool name                KB    objects  clones  degraded  unfound        rd        rd KB         wr       wr KB
cephfs_data      6767569962  335746702       0         0        0   2136834            1  676984208  7052266742
cephfs_metadata       42738    1058437       0         0        0  16130199  30718800215  295996938  3811963908
rbd                       0          0       0         0        0         0            0          0           0
total used 19209068780 336805139
total avail 10079469460
total space 29288538240
$ sudo ceph osd pool get cephfs_data pgp_num
pg_num: 1024
$ sudo ceph osd pool get cephfs_metadata pgp_num
pg_num: 1024
thanks,
Bob