1 backfill per osd has never severely impacted performance afaik.  That is
a very small amount of io.  I run with 2-5 in each of my clusters.  When an
osd comes up, the map changes enough that more PGs will move than just the
ones backfilling onto the new osd.
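
If you want to see that for yourself, something like this (exact state names
vary a bit between releases) will list every PG that is moving, not just the
ones landing on the new osd:

# ceph pg dump pgs_brief | egrep 'backfill|remapped'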

To modify how many backfills are happening at once, you can inject the new
setting.  `ceph tell osd.* injectargs '--osd_max_backfills=2'` would set all
osds to 2.  Note that when increasing the value, new backfills will start
immediately, but when decreasing it, the backfills already running will need
to finish before you see the count drop.  I recommend watching iostat on
your osds to see how much io they use during normal peak times, as well as
with varying values for osd_max_backfills during recovery.  That's how you
learn what you can set this setting to in your environment without hurting
your client io.
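
For example, on one of the osd hosts (the device names and osd id below are
just placeholders for your own hardware):

# iostat -x 5 /dev/sdb /dev/sdc
# ceph tell osd.12 injectargs '--osd_max_backfills=2'

The first command watches disk utilization in 5 second samples while
recovery runs; the second bumps the limit on a single osd so you can gauge
the impact before rolling the change out cluster wide.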

On Thu, Aug 10, 2017, 5:12 AM cgxu <c...@mykernel.net> wrote:

> The explanation of osd_max_backfills is below.
>
> osd max backfills
> Description: The maximum number of backfills allowed to or from a single
> OSD.
> Type: 64-bit Unsigned Integer
> Default: 1
>
> So I think this option does not limit the number of osds involved in
> backfill activity.
>
> On Aug 10, 2017, at 1:58 PM, Hyun Ha <hfamil...@gmail.com> wrote:
>
> Thank you for comment.
>
> I can understand what you mean.
> When one osd goes down, its PGs are spread across the whole ceph cluster,
> so each node can run one backfill/recovery per osd and the cluster shows
> many backfills/recoveries.
> On the other side, when one osd comes up, that osd needs to copy PGs one by
> one from the other nodes, so the cluster shows only 1 backfill/recovery.
> Is that right?
>
> When a host or osd goes down, it can have a bigger performance impact than
> when a host or osd comes up.
> So, is there any configuration to limit the osd count per PG while ceph is
> doing recoveries/backfills?
> Or is it possible to force more recovery/backfills when system resource
> usage (cpu, memory, network throughput, etc.) is low, like a recovery
> scheduler?
>
> Thank you.
>
> 2017-08-10 13:31 GMT+09:00 David Turner <drakonst...@gmail.com>:
>
>> osd_max_backfills is a setting per osd.  With that set to 1, each osd
>> will only be involved in a single backfill/recovery at a time.
>> However, the cluster as a whole will have as many backfills as it can while
>> each osd is only involved in one.
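>>
>> A quick way to confirm the running value on any one osd (assuming you have
>> access to its admin socket on the osd host) is:
>>
>> # ceph daemon osd.0 config get osd_max_backfills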
>>
>> On Wed, Aug 9, 2017 at 10:58 PM 하현 <hfamil...@gmail.com> wrote:
>>
>>> Hi ceph experts.
>>>
>>> I am confused about setting a limit with osd max backfills.
>>> Recovery/backfills occur when an osd goes down, and the same when an osd
>>> comes up.
>>>
>>> I want to limit backfills to 1.
>>> So, I set the config as below.
>>>
>>>
>>> # ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show|egrep
>>> "osd_max_backfills|osd_recovery_threads|osd_recovery_max_active|osd_recovery_op_priority"
>>>     "osd_max_backfills": "1",
>>>     "osd_recovery_threads": "1",
>>>     "osd_recovery_max_active": "1",
>>>     "osd_recovery_op_priority": "3",
>>>
>>> When the osd came up it seemed to work fine, but when the osd went down
>>> it did not work as I expected.
>>> Please see the ceph watch logs.
>>>
>>> osd down>
>>> pgmap v898158: 2048 pgs: 20 remapped+peering, 106
>>> active+undersized+degraded, 1922 active+clean; 641 B/s rd, 253 kB/s wr, 36
>>> op/s; 45807/1807242 objects degraded (2.535%)
>>> pgmap v898159: 2048 pgs: *5
>>> active+undersized+degraded+remapped+backfilling*, 9
>>> activating+undersized+degraded+remapped, 24
>>> active+undersized+degraded+remapped+wait_backfill, 20 remapped+peering, 68
>>> active+undersized+degraded, 1922 active+clean; 510 B/s rd, 498 kB/s wr, 42
>>> op/s; 41619/1812733 objects degraded (2.296%); 21029/1812733 objects
>>> misplaced (1.160%); 149 MB/s, 37 objects/s recovering
>>> pgmap v898168: 2048 pgs: *16
>>> active+undersized+degraded+remapped+backfilling*, 110
>>> active+undersized+degraded+remapped+wait_backfill, 1922 active+clean; 508
>>> B/s rd, 562 kB/s wr, 61 op/s; 54118/1823939 objects degraded (2.967%);
>>> 86984/1823939 objects misplaced (4.769%); 4025 MB/s, 1006 objects/s
>>> recovering
>>> pgmap v898192: 2048 pgs: 3 peering, 1 activating, 13
>>> active+undersized+degraded+remapped+backfilling, 106
>>> active+undersized+degraded+remapped+wait_backfill, 1925 active+clean; 10184
>>> B/s rd, 362 kB/s wr, 47 op/s; 49724/1823312 objects degraded (2.727%);
>>> 79709/1823312 objects misplaced (4.372%); 1949 MB/s, 487 objects/s
>>> recovering
>>> pgmap v898216: 2048 pgs: 1 active+undersized+remapped, 11
>>> active+undersized+degraded+remapped+backfilling, 98
>>> active+undersized+degraded+remapped+wait_backfill, 1938 active+clean; 10164
>>> B/s rd, 251 kB/s wr, 37 op/s; 44429/1823312 objects degraded (2.437%);
>>> 74037/1823312 objects misplaced (4.061%); 2751 MB/s, 687 objects/s
>>> recovering
>>> pgmap v898541: 2048 pgs: 1
>>> active+undersized+degraded+remapped+backfilling, 2047 active+clean; 218
>>> kB/s wr, 39 op/s; 261/1806097 objects degraded (0.014%); 543/1806097
>>> objects misplaced (0.030%); 677 MB/s, 9 keys/s, 176 objects/s recovering
>>>
>>> osd up>
>>> pgmap v899274: 2048 pgs: 2 activating, 14 peering, 12 remapped+peering,
>>> 2020 active+clean; 5594 B/s rd, 452 kB/s wr, 54 op/s
>>> pgmap v899277: 2048 pgs: *1 active+remapped+backfilling*, 41
>>> active+remapped+wait_backfill, 2 activating, 14 peering, 1990 active+clean;
>>> 595 kB/s wr, 23 op/s; 36111/1823939 objects misplaced (1.980%); 380 MB/s,
>>> 95 objects/s recovering
>>> pgmap v899298: 2048 pgs: 1 peering, *1 active+remapped+backfilling*, 40
>>> active+remapped+wait_backfill, 2006 active+clean; 723 kB/s wr, 13 op/s;
>>> 34903/1823294 objects misplaced (1.914%); 1113 MB/s, 278 objects/s
>>> recovering
>>> pgmap v899342: 2048 pgs: 1 active+remapped+backfilling, 39
>>> active+remapped+wait_backfill, 2008 active+clean; 5615 B/s rd, 291 kB/s wr,
>>> 41 op/s; 33150/1822666 objects misplaced (1.819%)
>>> pgmap v899796: 2048 pgs: 1 activating, 1 active+remapped+backfilling, 10
>>> active+remapped+wait_backfill, 2036 active+clean; 235 kB/s wr, 22 op/s;
>>> 6423/1809085 objects misplaced (0.355%)
>>>
>>> In the osd down> logs we can see 16 backfills, but in the osd up> logs we
>>> can see only one backfill. Is that correct? If not, what config should I
>>> set?
>>> Thank you in advance.
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
