Thank you for the comment.

I understand what you mean.
When one OSD goes down, its PGs are spread across the whole cluster, so each
node can run one backfill/recovery per OSD and the cluster shows many
backfills/recoveries at once.
On the other hand, when one OSD comes back up, it has to copy its PGs one by
one from the other nodes, so the cluster shows only 1 backfill/recovery.
Is that right?
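To double-check this, I count the PGs that are actually backfilling like
below (just a sketch; I assume "ceph pg ls <state>" and "ceph pg dump
pgs_brief" are available in my release):

# ceph pg ls backfilling
# ceph pg dump pgs_brief | grep -c backfilling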

When a host or OSD goes down, the performance impact is larger than when a
host or OSD comes back up.
So, is there any configuration to limit the number of OSDs per PG while Ceph
is doing recovery/backfills?
Or, when system resource usage (CPU, memory, network throughput, etc.) is
low, is it possible to force more recovery/backfills, like recovery
scheduling?
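For example, something like the below is what I have in mind: throttling
recovery while clients are busy and opening it up again when the cluster is
idle (only a sketch; I am not sure osd_recovery_sleep is the right knob in
my version):

# ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-sleep 0.1'
# ceph tell osd.* injectargs '--osd-max-backfills 4 --osd-recovery-max-active 4 --osd-recovery-sleep 0'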

Thank you.

2017-08-10 13:31 GMT+09:00 David Turner <drakonst...@gmail.com>:

> osd_max_backfills is a per-osd setting.  With that set to 1, each osd will
> only be involved in a single backfill/recovery at a time.  However, the
> cluster as a whole will have as many backfills as it can while each osd
> is only involved in 1.
>
> On Wed, Aug 9, 2017 at 10:58 PM 하현 <hfamil...@gmail.com> wrote:
>
>> Hi ceph experts.
>>
>> I am confused about setting a limit on osd max backfills.
>> Recovery & backfills occur when an osd goes down, and the same when an
>> osd comes up.
>>
>> I want to set limitation for backfills to 1.
>> So, I set config as below.
>>
>>
>> # ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | egrep
>> "osd_max_backfills|osd_recovery_threads|osd_recovery_max_active|osd_recovery_op_priority"
>>     "osd_max_backfills": "1",
>>     "osd_recovery_threads": "1",
>>     "osd_recovery_max_active": "1",
>>     "osd_recovery_op_priority": "3",
>>
>> When the osd comes up it seems to work as expected, but when the osd goes
>> down it does not work the way I expect.
>> Please see the ceph watch logs.
>>
>> osd down>
>> pgmap v898158: 2048 pgs: 20 remapped+peering, 106
>> active+undersized+degraded, 1922 active+clean; 641 B/s rd, 253 kB/s wr, 36
>> op/s; 45807/1807242 objects degraded (2.535%)
>> pgmap v898159: 2048 pgs: *5
>> active+undersized+degraded+remapped+backfilling*, 9
>> activating+undersized+degraded+remapped, 24 
>> active+undersized+degraded+remapped+wait_backfill,
>> 20 remapped+peering, 68 active+undersized+degraded, 1922 active+clean; 510
>> B/s rd, 498 kB/s wr, 42 op/s; 41619/1812733 objects degraded (2.296%);
>> 21029/1812733 objects misplaced (1.160%); 149 MB/s, 37 objects/s recovering
>> pgmap v898168: 2048 pgs: *16
>> active+undersized+degraded+remapped+backfilling*, 110
>> active+undersized+degraded+remapped+wait_backfill, 1922 active+clean;
>> 508 B/s rd, 562 kB/s wr, 61 op/s; 54118/1823939 objects degraded (2.967%);
>> 86984/1823939 objects misplaced (4.769%); 4025 MB/s, 1006 objects/s
>> recovering
>> pgmap v898192: 2048 pgs: 3 peering, 1 activating, 13
>> active+undersized+degraded+remapped+backfilling, 106
>> active+undersized+degraded+remapped+wait_backfill, 1925 active+clean;
>> 10184 B/s rd, 362 kB/s wr, 47 op/s; 49724/1823312 objects degraded
>> (2.727%); 79709/1823312 objects misplaced (4.372%); 1949 MB/s, 487
>> objects/s recovering
>> pgmap v898216: 2048 pgs: 1 active+undersized+remapped, 11
>> active+undersized+degraded+remapped+backfilling, 98
>> active+undersized+degraded+remapped+wait_backfill, 1938 active+clean;
>> 10164 B/s rd, 251 kB/s wr, 37 op/s; 44429/1823312 objects degraded
>> (2.437%); 74037/1823312 objects misplaced (4.061%); 2751 MB/s, 687
>> objects/s recovering
>> pgmap v898541: 2048 pgs: 1 active+undersized+degraded+remapped+backfilling,
>> 2047 active+clean; 218 kB/s wr, 39 op/s; 261/1806097 objects degraded
>> (0.014%); 543/1806097 objects misplaced (0.030%); 677 MB/s, 9 keys/s, 176
>> objects/s recovering
>>
>> osd up>
>> pgmap v899274: 2048 pgs: 2 activating, 14 peering, 12 remapped+peering,
>> 2020 active+clean; 5594 B/s rd, 452 kB/s wr, 54 op/s
>> pgmap v899277: 2048 pgs: *1 active+remapped+backfilling*, 41
>> active+remapped+wait_backfill, 2 activating, 14 peering, 1990 active+clean;
>> 595 kB/s wr, 23 op/s; 36111/1823939 objects misplaced (1.980%); 380 MB/s,
>> 95 objects/s recovering
>> pgmap v899298: 2048 pgs: 1 peering, *1 active+remapped+backfilling*, 40
>> active+remapped+wait_backfill, 2006 active+clean; 723 kB/s wr, 13 op/s;
>> 34903/1823294 objects misplaced (1.914%); 1113 MB/s, 278 objects/s
>> recovering
>> pgmap v899342: 2048 pgs: 1 active+remapped+backfilling, 39
>> active+remapped+wait_backfill, 2008 active+clean; 5615 B/s rd, 291 kB/s wr,
>> 41 op/s; 33150/1822666 objects misplaced (1.819%)
>> pgmap v899796: 2048 pgs: 1 activating, 1 active+remapped+backfilling, 10
>> active+remapped+wait_backfill, 2036 active+clean; 235 kB/s wr, 22 op/s;
>> 6423/1809085 objects misplaced (0.355%)
>>
>> In the osd down> logs we can see up to 16 backfills, but in the osd up>
>> logs we can see only one backfill. Is that correct? If not, what config
>> should I set?
>> Thank you in advance.
>>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
