I do run with osd_max_backfills and osd_recovery_max_active turned up quite a bit from the defaults, since I'm trying for as much recovery throughput as possible. I would hazard a guess that the impact from the sleep settings is proportionally much smaller if your other recovery-related parameters are left closer to their defaults - but the sleep starts to dominate once you remove the other bottlenecks on recovery I/O.
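For illustration, the sort of thing I mean is below - a sketch only, and the exact numbers are made up for this example rather than a recommendation (tune them to your own hardware and to how much client I/O you need to protect):

  # in ceph.conf under [osd], or injected at runtime:
  osd_max_backfills = 4          # default 1
  osd_recovery_max_active = 10   # default 3
  osd_recovery_sleep_hdd = 0     # default 0.1s per-op recovery sleep on HDD OSDs
  osd_recovery_sleep_ssd = 0     # already 0 by default on SSD OSDs

  # runtime equivalent:
  ceph tell osd.* injectargs '--osd-max-backfills 4 --osd-recovery-max-active 10 --osd-recovery-sleep-hdd 0'

Setting the sleeps to 0 removes the throttle entirely, which is only sensible if, like me, you don't need to honour any client latency guarantees during recovery.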
Rich

On 14/09/17 15:02, Mark Nelson wrote:
> I'm really glad to hear that it wasn't bluestore! :)
>
> It raises another concern though. We didn't expect to see that much of a
> slowdown with the current throttle settings. An order of magnitude slowdown
> in recovery performance isn't ideal at all.
>
> I wonder if we could improve things dramatically if we kept track of client
> IO activity on the OSD and removed the throttle if there's been no client
> activity for X seconds. Theoretically more advanced heuristics might cover
> this, but in the interim it seems to me like this would solve the very
> specific problem you are seeing while still throttling recovery when IO is
> happening.
>
> Mark
>
> On 09/14/2017 06:19 AM, Richard Hesketh wrote:
>> Yeah, that hit the nail on the head. Significantly reducing/eliminating the
>> recovery sleep times brings the recovery speed back up to (and beyond!) the
>> levels I was expecting to see - recovery is almost an order of magnitude
>> faster now. Thanks for educating me about those changes!
>>
>> Rich
>>
>> On 14/09/17 11:16, Richard Hesketh wrote:
>>> Hi Mark,
>>>
>>> No, I wasn't familiar with that work. I am in fact comparing speed of
>>> recovery to maintenance work I did while the cluster was in Jewel; I
>>> haven't manually done anything to sleep settings, only adjusted max
>>> backfills OSD settings. New options that introduce arbitrary slowdown to
>>> recovery operations to preserve client performance would explain what I'm
>>> seeing! I'll have a tinker with adjusting those values (in my particular
>>> case client load on the cluster is very low and I don't have to honour any
>>> guarantees about client performance - getting back into HEALTH_OK asap is
>>> preferable).
>>>
>>> Rich
>>>
>>> On 13/09/17 21:14, Mark Nelson wrote:
>>>> Hi Richard,
>>>>
>>>> Regarding recovery speed, have you looked through any of Neha's results on
>>>> recovery sleep testing earlier this summer?
>>>>
>>>> https://www.spinics.net/lists/ceph-devel/msg37665.html
>>>>
>>>> She tested bluestore and filestore under a couple of different scenarios.
>>>> The gist of it is that time to recover changes pretty dramatically
>>>> depending on the sleep setting.
>>>>
>>>> I don't recall if you said earlier, but are you comparing filestore and
>>>> bluestore recovery performance on the same version of ceph with the same
>>>> sleep settings?
>>>>
>>>> Mark
>>>>
>>>> On 09/12/2017 05:24 AM, Richard Hesketh wrote:
>>>>> Thanks for the links. That does seem to largely confirm that I
>>>>> haven't horribly misunderstood anything and I've not been doing anything
>>>>> obviously wrong while converting my disks; there's no point specifying
>>>>> separate WAL/DB partitions if they're going to go on the same device,
>>>>> throw as much space as you have available at the DB partitions and
>>>>> they'll use all the space they can, and significantly reduced I/O on the
>>>>> DB/WAL device compared to Filestore is expected since bluestore's nixed
>>>>> the write amplification as much as possible.
>>>>>
>>>>> I'm still seeing much reduced recovery speed on my newly Bluestored
>>>>> cluster, but I guess that's a tuning issue rather than evidence of
>>>>> catastrophe.
>>>>>
>>>>> Rich

--
Richard Hesketh
Systems Engineer, Research Platforms
BBC Research & Development
