I do run with osd_max_backfills and osd_recovery_max_active turned up quite a bit from the defaults, since I'm trying for as much recovery throughput as possible. I would hazard a guess that the impact from the sleep settings is proportionally much smaller if your other recovery-related parameters are left closer to their defaults - but the sleep starts to dominate once you remove the other bottlenecks on recovery I/O.
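For illustration, the sort of thing I mean is below - a sketch only, and the exact numbers are made up for this example rather than a recommendation (tune them to your own hardware and to how much client I/O you need to protect):

  # in ceph.conf under [osd], or injected at runtime:
  osd_max_backfills = 4          # default 1
  osd_recovery_max_active = 10   # default 3
  osd_recovery_sleep_hdd = 0     # default 0.1s per-op recovery sleep on HDD OSDs
  osd_recovery_sleep_ssd = 0     # already 0 by default on SSD OSDs

  # runtime equivalent:
  ceph tell osd.* injectargs '--osd-max-backfills 4 --osd-recovery-max-active 10 --osd-recovery-sleep-hdd 0'

Setting the sleeps to 0 removes the throttle entirely, which is only sensible if, like me, you don't need to honour any client latency guarantees during recovery.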
Rich

On 14/09/17 15:02, Mark Nelson wrote:
> I'm really glad to hear that it wasn't bluestore! :)
>
> It raises another concern though. We didn't expect to see that much of a
> slowdown with the current throttle settings. An order of magnitude slowdown
> in recovery performance isn't ideal at all.
>
> I wonder if we could improve things dramatically if we kept track of client
> IO activity on the OSD and removed the throttle if there's been no client
> activity for X seconds. Theoretically more advanced heuristics might cover
> this, but in the interim it seems to me like this would solve the very
> specific problem you are seeing while still throttling recovery when IO is
> happening.
>
> Mark
>
> On 09/14/2017 06:19 AM, Richard Hesketh wrote:
>> Yeah, that hit the nail on the head. Significantly reducing/eliminating the
>> recovery sleep times brings the recovery speed back up to (and beyond!) the
>> levels I was expecting to see - recovery is almost an order of magnitude
>> faster now. Thanks for educating me about those changes!
>>
>> Rich
>>
>> On 14/09/17 11:16, Richard Hesketh wrote:
>>> Hi Mark,
>>>
>>> No, I wasn't familiar with that work. I am in fact comparing speed of
>>> recovery to maintenance work I did while the cluster was in Jewel; I
>>> haven't manually done anything to sleep settings, only adjusted max
>>> backfills OSD settings. New options that introduce arbitrary slowdown to
>>> recovery operations to preserve client performance would explain what I'm
>>> seeing! I'll have a tinker with adjusting those values (in my particular
>>> case client load on the cluster is very low and I don't have to honour any
>>> guarantees about client performance - getting back into HEALTH_OK asap is
>>> preferable).
>>>
>>> Rich
>>>
>>> On 13/09/17 21:14, Mark Nelson wrote:
>>>> Hi Richard,
>>>>
>>>> Regarding recovery speed, have you looked through any of Neha's results on
>>>> recovery sleep testing earlier this summer?
>>>>
>>>> https://www.spinics.net/lists/ceph-devel/msg37665.html
>>>>
>>>> She tested bluestore and filestore under a couple of different scenarios.
>>>> The gist of it is that time to recover changes pretty dramatically
>>>> depending on the sleep setting.
>>>>
>>>> I don't recall if you said earlier, but are you comparing filestore and
>>>> bluestore recovery performance on the same version of ceph with the same
>>>> sleep settings?
>>>>
>>>> Mark
>>>>
>>>> On 09/12/2017 05:24 AM, Richard Hesketh wrote:
>>>>> Thanks for the links. That does seem to largely confirm that I
>>>>> haven't horribly misunderstood anything and I've not been doing anything
>>>>> obviously wrong while converting my disks; there's no point specifying
>>>>> separate WAL/DB partitions if they're going to go on the same device,
>>>>> throw as much space as you have available at the DB partitions and
>>>>> they'll use all the space they can, and significantly reduced I/O on the
>>>>> DB/WAL device compared to Filestore is expected since bluestore's nixed
>>>>> the write amplification as much as possible.
>>>>>
>>>>> I'm still seeing much reduced recovery speed on my newly Bluestored
>>>>> cluster, but I guess that's a tuning issue rather than evidence of
>>>>> catastrophe.
>>>>>
>>>>> Rich

--
Richard Hesketh
Systems Engineer, Research Platforms
BBC Research & Development
