Hi,

We're running a 24-server / 48-OSD Ceph cluster (version 0.67.3, 3 replicas) for
RBD storage only, and it is working great. However, when a failed disk is
replaced by a brand-new one and the system starts to backfill, we get a lot of
slow-request messages for 5 to 10 minutes. After that the cluster becomes
stable again while the backfilling is still in progress. I already tried to
slow down the backfilling with:

ceph tell osd.* injectargs '--osd_max_backfills 1'
ceph tell mon.* injectargs '--osd_max_backfills 1'
ceph tell osd.* injectargs '--osd_recovery_max_active 1'
ceph tell mon.* injectargs '--osd_recovery_max_active 1'
ceph tell osd.* injectargs '--osd_recovery_max_single_start 1'
ceph tell mon.* injectargs '--osd_recovery_max_single_start 1'
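
Since values set with injectargs are lost when a daemon restarts, we also put
the equivalent settings in ceph.conf (a sketch using the same three options as
above, in the [osd] section):

```ini
[osd]
; throttle backfill/recovery to reduce client I/O impact
osd max backfills = 1
osd recovery max active = 1
osd recovery max single start = 1
```

To check whether the injected values actually took effect on a given daemon,
the admin socket can be queried, e.g.
`ceph daemon osd.0 config get osd_max_backfills` (assuming the admin socket is
enabled on that host).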

but this doesn't seem to help. What can we do to reduce the impact of a disk
replacement on the cluster?

Regards,
Erwin
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com