Hello,
I have a Ceph cluster with the specifications below:
3 x Monitor nodes
6 x Storage nodes (6 disks per node, 6 TB SATA disks, all with SSD journals)
Separate public and private (cluster) networks; all NICs are 10 Gbit/s.
osd pool default size = 3
osd pool default min size = 2
Ceph version is Jewel 10.2.6.
Current health status:
cluster ****************
health HEALTH_OK
monmap e9: 3 mons at
{ceph-mon01=xxx:6789/0,ceph-mon02=xxx:6789/0,ceph-mon03=xxx:6789/0}
election epoch 84, quorum 0,1,2 ceph-mon01,ceph-mon02,ceph-mon03
osdmap e1512: 36 osds: 36 up, 36 in
flags sortbitwise,require_jewel_osds
pgmap v7698673: 1408 pgs, 5 pools, 37365 GB data, 9436 kobjects
83871 GB used, 114 TB / 196 TB avail
1408 active+clean
The cluster is in production, with a lot of virtual machines running on it
(Linux and Windows VMs, database clusters, web servers, etc.).
When I try to add a new storage node with one disk, I run into serious
problems. When the new OSD comes up, the CRUSH map is updated and the cluster
enters recovery. At first everything is fine, but after a while some running
VMs become unmanageable and servers become unresponsive one by one. The
recovery process would take around 20 hours, so I removed the new OSD; once
recovery completed, everything returned to normal.
Health status after the new OSD was added:
cluster ****************
health HEALTH_WARN
91 pgs backfill_wait
1 pgs backfilling
28 pgs degraded
28 pgs recovery_wait
28 pgs stuck degraded
recovery 2195/18486602 objects degraded (0.012%)
recovery 1279784/18486602 objects misplaced (6.923%)
monmap e9: 3 mons at
{ceph-mon01=xxx:6789/0,ceph-mon02=xxx:6789/0,ceph-mon03=xxx:6789/0}
election epoch 84, quorum 0,1,2 ceph-mon01,ceph-mon02,ceph-mon03
osdmap e1512: 37 osds: 37 up, 37 in
flags sortbitwise,require_jewel_osds
pgmap v7698673: 1408 pgs, 5 pools, 37365 GB data, 9436 kobjects
83871 GB used, 114 TB / 201 TB avail
2195/18486602 objects degraded (0.012%)
1279784/18486602 objects misplaced (6.923%)
1286 active+clean
91 active+remapped+wait_backfill
28 active+recovery_wait+degraded
2 active+clean+scrubbing+deep
1 active+remapped+backfilling
recovery io 430 MB/s, 119 objects/s
client io 36174 B/s rd, 5567 kB/s wr, 5 op/s rd, 700 op/s wr
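Note that recovery traffic (430 MB/s) far exceeds client traffic here, which
matches the VM symptoms. As far as I know, recovery priority can also be
lowered at runtime; a minimal sketch, assuming the Jewel defaults
(osd_recovery_op_priority = 3, osd_client_op_priority = 63):

  # Deprioritize recovery ops relative to client ops on all running OSDs
  # (client ops keep their default priority of 63)
  ceph tell osd.* injectargs '--osd-recovery-op-priority 1'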
Some Ceph config parameters:
osd_max_backfills = 1
osd_backfill_full_ratio = 0.85
osd_recovery_max_active = 3
osd_recovery_threads = 1
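These are set in ceph.conf; the values the running daemons actually use can be
checked over the admin socket on each OSD host, for example:

  # Query a running OSD for its effective settings (run on that OSD's host)
  ceph daemon osd.0 config get osd_max_backfills
  ceph daemon osd.0 config get osd_recovery_max_active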
How can I add new OSDs safely? One approach I am considering is sketched below.
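From what I have read, the new OSD can be brought in with zero CRUSH weight
and then reweighted upward in small steps, waiting for the cluster to settle
between steps, so only a little data moves at a time. A minimal sketch (osd.36
and the host bucket name below are hypothetical):

  # Add the new OSD to the CRUSH map with weight 0 so no data migrates yet
  # (alternatively, set 'osd crush initial weight = 0' in ceph.conf beforehand)
  ceph osd crush add osd.36 0 host=ceph-stor07

  # Raise the weight in small increments, waiting for HEALTH_OK between steps,
  # up to the full weight of a 6 TB disk (about 5.46)
  ceph osd crush reweight osd.36 0.5
  ceph osd crush reweight osd.36 1.0
  # ...continue until ~5.46

Would something like this avoid the client impact, or is there a better method?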
Best regards,
Ramazan
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com