osd_recovery_delay_start is the delay, in seconds, between recovery iterations (osd_recovery_max_active).
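For illustration - a minimal sketch, assuming you want the 10-second value shown further down in this thread - the setting can be changed at runtime the same way as the other recovery options mentioned below, or persisted in ceph.conf:

    # runtime, on all OSDs
    ceph tell osd.* injectargs '--osd_recovery_delay_start 10'

    # persistent, in ceph.conf on the OSD hosts
    [osd]
    osd recovery delay start = 10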
It is described here:
https://github.com/ceph/ceph/search?utf8=%E2%9C%93&q=osd_recovery_delay_start

2015-03-03 14:27 GMT+03:00 Andrija Panic <andrija.pa...@gmail.com>:

> Another question - I mentioned here 37% of objects being moved around -
> these are MISPLACED objects (degraded objects were 0.001%), after I
> removed 1 OSD from the CRUSH map (out of 44 OSDs or so).
>
> Can anybody confirm this is normal behaviour - and are there any
> workarounds?
>
> I understand this is because of CEPH's object placement algorithm, but
> 37% of objects misplaced just by removing 1 OSD out of 44 from the CRUSH
> map makes me wonder why the percentage is so large.
>
> It seems not good to me, and I have to remove another 7 OSDs (we are
> demoting some old hardware nodes). This means I could potentially see
> 7 x the same number of misplaced objects...?
>
> Any thoughts?
>
> Thanks
>
> On 3 March 2015 at 12:14, Andrija Panic <andrija.pa...@gmail.com> wrote:
>
>> Thanks Irek.
>>
>> Does this mean that after peering for each PG there will be a delay of
>> 10 sec - meaning that every once in a while I will have 10 sec of the
>> cluster NOT being stressed/overloaded, then recovery takes place for
>> that PG, then the cluster is fine for another 10 sec, and then stressed
>> again?
>>
>> I'm trying to understand the process before actually doing stuff (the
>> config reference is there on ceph.com, but I don't fully understand the
>> process).
>>
>> Thanks,
>> Andrija
>>
>> On 3 March 2015 at 11:32, Irek Fasikhov <malm...@gmail.com> wrote:
>>
>>> Hi.
>>>
>>> Use the value "osd_recovery_delay_start".
>>> Example:
>>> [root@ceph08 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok config show | grep osd_recovery_delay_start
>>>   "osd_recovery_delay_start": "10"
>>>
>>> 2015-03-03 13:13 GMT+03:00 Andrija Panic <andrija.pa...@gmail.com>:
>>>
>>>> Hi guys,
>>>>
>>>> Yesterday I removed 1 OSD from the cluster (out of 42 OSDs), and it
>>>> caused over 37% of the data to rebalance - let's say this is fine
>>>> (this happened when I removed it from the CRUSH map).
>>>>
>>>> I'm wondering - I had previously set some throttling mechanisms, but
>>>> during the first hour of rebalancing my recovery rate went up to
>>>> 1500 MB/s and the VMs were completely unusable; over the last 4 hours
>>>> of the recovery the rate dropped to, say, 100-200 MB/s, and during
>>>> that time VM performance was still quite impacted, but at least I
>>>> could work more or less.
>>>>
>>>> So my question: is this behaviour expected, and is throttling working
>>>> as expected here? During the first hour almost no throttling seemed
>>>> to be applied, judging by the 1500 MB/s recovery rate and the impact
>>>> on the VMs, while the last 4 hours seemed pretty fine (although there
>>>> was still a lot of impact in general).
>>>>
>>>> I changed these throttling settings on the fly with:
>>>>
>>>> ceph tell osd.* injectargs '--osd_recovery_max_active 1'
>>>> ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
>>>> ceph tell osd.* injectargs '--osd_max_backfills 1'
>>>>
>>>> My journals are on SSDs (12 OSDs per server, with 6 journals on one
>>>> SSD and 6 journals on another) - I have 3 of these hosts.
>>>>
>>>> Any thoughts are welcome.
>>>> --
>>>> Andrija Panić
>>>
>>> --
>>> Best regards, Фасихов Ирек Нургаязович
>>> Mob.: +79229045757
>>
>> --
>> Andrija Panić
>
> --
> Andrija Panić

--
Best regards, Фасихов Ирек Нургаязович
Mob.: +79229045757
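A quick way to confirm that values injected with 'ceph tell osd.* injectargs' actually took effect is the same admin-socket query shown earlier in the thread - a minimal sketch, reusing the example socket path from above (substitute your own OSD id):

    # show the current throttling values on one OSD
    ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok config show \
        | grep -E 'osd_max_backfills|osd_recovery_max_active|osd_recovery_op_priority|osd_recovery_delay_start'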
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com