On 05/29/2015 04:47 PM, Samuel Just wrote:
Many people have reported that they need to lower the osd recovery config
options to minimize the impact of recovery on client IO. We are talking about
changing the defaults as follows:
osd_max_backfills to 1 (from 10)
osd_recovery_max_active to 3 (from 15)
osd_recovery_op_priority to 1 (from 10)
osd_recovery_max_single_start to 1 (from 5)
We'd like a bit of feedback first though. Is anyone happy with the current
configs? Is anyone using something between these values and the current
defaults? What kind of workload? I'd guess that lowering osd_max_backfills to
1 is probably a good idea, but I wonder whether lowering
osd_recovery_max_active and osd_recovery_max_single_start will cause small
objects to recover unacceptably slowly.
Thoughts?
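For anyone who wants to try the proposed values, they could be expressed as a ceph.conf fragment like the following (a sketch only; option names and values are exactly the proposed new defaults listed above):

```ini
# Proposed recovery-tuning defaults from the message above,
# as an [osd] section in ceph.conf (sketch, untested)
[osd]
osd_max_backfills = 1
osd_recovery_max_active = 3
osd_recovery_op_priority = 1
osd_recovery_max_single_start = 1
```

The same options can also be changed at runtime on a live cluster with something like `ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 3'`, which makes it easier to compare settings without restarting OSDs.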
We ran recovery tests last year, around when firefly was released. The
basic gist was that as you increase client IO, the ratio of backfill to
client IO changes for a given combination of priority settings. That is,
you can tune around 10/15/10/5 or 1/3/1/1, but in each case the ratio of
client to recovery IO appears to scale with the amount of client IO,
even past the super-saturation point. I believe users will have a hard
time finding optimal settings, as clusters at the saturation point will
behave differently from those in heavy super-saturation.
http://nhm.ceph.com/Ceph_3XRep_Backfill_Recovery_Results.pdf
http://nhm.ceph.com/Ceph_62EC_Backfill_Recovery_Results.pdf
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com