Hello,
On Mon, 23 Jun 2014 21:50:50 -0700 David Zafman wrote:

>
> By default osd_scrub_max_interval and osd_deep_scrub_interval are 1 week
> 604800 seconds (60*60*24*7) and osd_scrub_min_interval is 1 day 86400
> seconds (60*60*24). As long as osd_scrub_max_interval <=
> osd_deep_scrub_interval then the load won’t impact when deep scrub
> occurs. I suggest that osd_scrub_min_interval <=
> osd_scrub_max_interval <= osd_deep_scrub_interval.
>
> I’d like to know how you have those 3 values set, so I can confirm that
> this explains the issue.
>
They are and were, unsurprisingly, set to the default values.

Now to provide some more information: shortly after the inception of this
cluster I initiated a deep scrub on all OSDs at 00:30 on a Sunday morning
(the things we do for Ceph; a scheduler with a variety of rules would be
nice, but I digress).
This took until 05:30 despite the cluster being idle and holding close to
no data. In retrospect it seems clear to me that this was already
influenced by the load threshold (a scrub I initiated with the new
threshold value of 1.5 finished in just 30 minutes last night).

Consequently all the normal scrubs happened in the same time frame, up to
this weekend's normal scrub on the 21st.
The deep scrub on the 22nd clearly ran into the load threshold.

So if I understand you correctly, setting osd_scrub_max_interval to 6 days
should make deep scrubs ignore the load threshold, as per the
documentation?

Regards,

Christian
>
> David Zafman
> Senior Developer
> http://www.inktank.com
> http://www.redhat.com
>
> On Jun 23, 2014, at 7:01 PM, Christian Balzer <[email protected]> wrote:
>
> >
> > Hello,
> >
> > On Mon, 23 Jun 2014 14:20:37 -0400 Gregory Farnum wrote:
> >
> >> Looks like it's a doc error (at least on master), but it might have
> >> changed over time. If you're running Dumpling we should change the
> >> docs.
> >
> > Nope, I'm running 0.80.1 currently.
> >
> > Christian
> >
> >> -Greg
> >> Software Engineer #42 @ http://inktank.com | http://ceph.com
> >>
> >>
> >> On Sun, Jun 22, 2014 at 10:18 PM, Christian Balzer <[email protected]>
> >> wrote:
> >>>
> >>> Hello,
> >>>
> >>> This weekend I noticed that the deep scrubbing took a lot longer than
> >>> usual (long periods without a scrub running/finishing), even though
> >>> the cluster wasn't all that busy.
> >>> It was however busier than in the past and the load average was above
> >>> 0.5 frequently.
> >>>
> >>> Now according to the documentation "osd scrub load threshold" is
> >>> ignored when it comes to deep scrubs.
> >>>
> >>> However after setting it to 1.5 and restarting the OSDs the
> >>> floodgates opened and all those deep scrubs are now running at full
> >>> speed.
> >>>
> >>> Documentation error or did I "unstuck" something by the OSD restart?
> >>>
> >>> Regards,
> >>>
> >>> Christian
> >>> --
> >>> Christian Balzer        Network/Systems Engineer
> >>> [email protected]        Global OnLine Japan/Fusion Communications
> >>> http://www.gol.com/
> >>> _______________________________________________
> >>> ceph-users mailing list
> >>> [email protected]
> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> >
> >
> > --
> > Christian Balzer        Network/Systems Engineer
> > [email protected]        Global OnLine Japan/Fusion Communications
> > http://www.gol.com/
> > _______________________________________________
> > ceph-users mailing list
> > [email protected]
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

--
Christian Balzer        Network/Systems Engineer
[email protected]        Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
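
P.S. For reference, the interval arithmetic and the ordering David
suggests can be sketched as below. This is only an illustration in
Python: the dictionary keys mirror the option names quoted above, the
helper name intervals_ordered is made up for this sketch, and nothing
here queries or changes a cluster.

```python
DAY = 60 * 60 * 24   # 86400 seconds
WEEK = DAY * 7       # 604800 seconds

# Default values as quoted in David's mail.
defaults = {
    "osd_scrub_min_interval": DAY,
    "osd_scrub_max_interval": WEEK,
    "osd_deep_scrub_interval": WEEK,
}


def intervals_ordered(cfg):
    """Hypothetical helper: True if min <= max <= deep holds."""
    return (cfg["osd_scrub_min_interval"]
            <= cfg["osd_scrub_max_interval"]
            <= cfg["osd_deep_scrub_interval"])


# The change asked about above: max interval lowered to 6 days.
proposed = dict(defaults, osd_scrub_max_interval=6 * DAY)

print(intervals_ordered(defaults))
print(intervals_ordered(proposed))
print(proposed["osd_scrub_max_interval"])
```

Both checks pass, so the suggested ordering by itself does not
distinguish the default settings from the 6-day variant.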
