The explanation of osd_max_backfills from the documentation is below:

osd max backfills
Description: The maximum number of backfills allowed to or from a single OSD.
Type: 64-bit Unsigned Integer
Default: 1

So I just think the option does not limit the number of OSDs involved in backfill activity, only the number of backfills per OSD.
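Since the limit is per OSD, it can also be checked or changed on a live cluster. A minimal sketch, assuming the default admin socket path and that osd.0 and the values shown are only examples:

# ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config get osd_max_backfills
# ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

The injectargs form takes effect on the running OSDs immediately, but it does not persist across a restart unless the same values are also set in ceph.conf.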
> On Aug 10, 2017, at 1:58 PM, Hyun Ha <[email protected]> wrote:
>
> Thank you for the comment.
>
> I think I understand what you mean.
> When one OSD goes down, that OSD has PGs spread across the whole cluster, so each node can run one backfill/recovery per OSD and the cluster shows many backfills/recoveries.
> On the other hand, when one OSD comes up, it needs to copy PGs one by one from the other nodes, so the cluster shows only 1 backfill/recovery.
> Is that right?
>
> When a host or OSD goes down, it can have a bigger performance impact than when a host or OSD comes up.
> So, is there any configuration to limit the number of OSDs per PG that take part while Ceph is doing recovery/backfill?
> Or, when system resource usage (CPU, memory, network throughput, etc.) is low, is it possible to force more recovery/backfills, like recovery scheduling?
>
> Thank you.
>
> 2017-08-10 13:31 GMT+09:00 David Turner <[email protected]>:
> osd_max_backfills is a setting per OSD. With that set to 1, each OSD will only be involved in a single backfill/recovery at the same time. However, the cluster as a whole will have as many backfills as it can while each OSD is only involved in 1 each.
>
> On Wed, Aug 9, 2017 at 10:58 PM 하현 <[email protected]> wrote:
> Hi Ceph experts.
>
> I am confused about setting a limit with osd_max_backfills.
> Recovery and backfills occur when an OSD goes down, and the same happens when an OSD comes up.
>
> I want to limit backfills to 1, so I set the config as below:
>
> # ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | egrep "osd_max_backfills|osd_recovery_threads|osd_recovery_max_active|osd_recovery_op_priority"
> "osd_max_backfills": "1",
> "osd_recovery_threads": "1",
> "osd_recovery_max_active": "1",
> "osd_recovery_op_priority": "3",
>
> When an OSD comes up it seems to work well, but when an OSD goes down it does not seem to work as I think it should.
> Please see the ceph watch logs below.
>
> osd down>
> pgmap v898158: 2048 pgs: 20 remapped+peering, 106 active+undersized+degraded, 1922 active+clean; 641 B/s rd, 253 kB/s wr, 36 op/s; 45807/1807242 objects degraded (2.535%)
> pgmap v898159: 2048 pgs: 5 active+undersized+degraded+remapped+backfilling, 9 activating+undersized+degraded+remapped, 24 active+undersized+degraded+remapped+wait_backfill, 20 remapped+peering, 68 active+undersized+degraded, 1922 active+clean; 510 B/s rd, 498 kB/s wr, 42 op/s; 41619/1812733 objects degraded (2.296%); 21029/1812733 objects misplaced (1.160%); 149 MB/s, 37 objects/s recovering
> pgmap v898168: 2048 pgs: 16 active+undersized+degraded+remapped+backfilling, 110 active+undersized+degraded+remapped+wait_backfill, 1922 active+clean; 508 B/s rd, 562 kB/s wr, 61 op/s; 54118/1823939 objects degraded (2.967%); 86984/1823939 objects misplaced (4.769%); 4025 MB/s, 1006 objects/s recovering
> pgmap v898192: 2048 pgs: 3 peering, 1 activating, 13 active+undersized+degraded+remapped+backfilling, 106 active+undersized+degraded+remapped+wait_backfill, 1925 active+clean; 10184 B/s rd, 362 kB/s wr, 47 op/s; 49724/1823312 objects degraded (2.727%); 79709/1823312 objects misplaced (4.372%); 1949 MB/s, 487 objects/s recovering
> pgmap v898216: 2048 pgs: 1 active+undersized+remapped, 11 active+undersized+degraded+remapped+backfilling, 98 active+undersized+degraded+remapped+wait_backfill, 1938 active+clean; 10164 B/s rd, 251 kB/s wr, 37 op/s; 44429/1823312 objects degraded (2.437%); 74037/1823312 objects misplaced (4.061%); 2751 MB/s, 687 objects/s recovering
> pgmap v898541: 2048 pgs: 1 active+undersized+degraded+remapped+backfilling, 2047 active+clean; 218 kB/s wr, 39 op/s; 261/1806097 objects degraded (0.014%); 543/1806097 objects misplaced (0.030%); 677 MB/s, 9 keys/s, 176 objects/s recovering
>
> osd up>
> pgmap v899274: 2048 pgs: 2 activating, 14 peering, 12 remapped+peering, 2020 active+clean; 5594 B/s rd, 452 kB/s wr, 54 op/s
> pgmap v899277: 2048 pgs: 1 active+remapped+backfilling, 41 active+remapped+wait_backfill, 2 activating, 14 peering, 1990 active+clean; 595 kB/s wr, 23 op/s; 36111/1823939 objects misplaced (1.980%); 380 MB/s, 95 objects/s recovering
> pgmap v899298: 2048 pgs: 1 peering, 1 active+remapped+backfilling, 40 active+remapped+wait_backfill, 2006 active+clean; 723 kB/s wr, 13 op/s; 34903/1823294 objects misplaced (1.914%); 1113 MB/s, 278 objects/s recovering
> pgmap v899342: 2048 pgs: 1 active+remapped+backfilling, 39 active+remapped+wait_backfill, 2008 active+clean; 5615 B/s rd, 291 kB/s wr, 41 op/s; 33150/1822666 objects misplaced (1.819%)
> pgmap v899796: 2048 pgs: 1 activating, 1 active+remapped+backfilling, 10 active+remapped+wait_backfill, 2036 active+clean; 235 kB/s wr, 22 op/s; 6423/1809085 objects misplaced (0.355%)
>
> In the osd down> logs we can see up to 16 PGs backfilling at once, and in the osd up> logs we can see only one PG backfilling. Is that correct? If not, what config should I set?
> Thank you in advance.
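One way to sanity-check this behaviour is to list the backfilling PGs together with their up/acting OSD sets and see how often each OSD id appears. A rough sketch (the exact output columns vary a little between releases, and the grep pattern is only an example):

# ceph pg dump pgs_brief | grep backfilling

Even when the cluster reports 16 PGs backfilling at once, as in the osd down> log above, no single OSD id should turn up in many of those lines if osd_max_backfills is 1; the simultaneous backfills are spread across many different OSD pairs, which is consistent with the per-OSD limit described above.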
