Hi,

We're using a 24-server / 48-OSD (3 replicas) Ceph cluster (version 0.67.3) for RBD storage only, and it is working great. However, when a failed disk is replaced by a brand new one and the system starts to backfill, we get a lot of slow-request messages for 5 to 10 minutes. After that the cluster becomes stable again while the backfilling is still going on.

I have already tried to slow down the backfilling with:
ceph tell osd.* injectargs '--osd_max_backfills 1'
ceph tell mon.* injectargs '--osd_max_backfills 1'
ceph tell osd.* injectargs '--osd_recovery_max_active 1'
ceph tell mon.* injectargs '--osd_recovery_max_active 1'
ceph tell osd.* injectargs '--osd_recovery_max_single_start 1'
ceph tell mon.* injectargs '--osd_recovery_max_single_start 1'

but this doesn't seem to help. What can we do to make a disk replacement less disruptive to the cluster?

Regards,

Erwin
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
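P.S. For completeness, this is a sketch of how the same throttles could be persisted in ceph.conf so they survive OSD restarts (injectargs only changes the running daemons). The first three options are the ones mentioned above; "osd recovery op priority" is an additional knob we have not tried yet, included here as an assumption that lowering recovery priority relative to client I/O might help:

[osd]
; Throttle backfill/recovery so client I/O keeps priority.
; Same options as passed via injectargs above, but persistent.
osd max backfills = 1
osd recovery max active = 1
osd recovery max single start = 1
; Untested assumption: deprioritize recovery ops (default is 10).
osd recovery op priority = 1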
