I'm pretty sure I put up one of those scripts in the past.  Basically what
we did was set our scrub cycle to something like 40 days, then sort all
PGs by the last time they were deep scrubbed.  We grab the oldest 1/30 of
those PGs and tell them to deep-scrub manually; the next day we do it
again.  After a month or so, your PGs should be fairly evenly spread out
over 30 days.  With those numbers you could disable the cron that runs the
deep scrubs for maintenance for up to 10 days every 40 days and still scrub
all of your PGs during that time.
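A rough sketch of that daily cron job follows.  The sample data, the
column used for the last-deep-scrub stamp (it varies between Ceph
releases), and the batch arithmetic are all illustrative; on a real
cluster you would build the list from `ceph pg dump` and uncomment the
`ceph pg deep-scrub` line:

```shell
#!/bin/sh
# Sketch of the daily "smear the deep scrubs" cron described above.
# On a live cluster the list would come from something like:
#   ceph pg dump pgs 2>/dev/null | awk '/active/{print $1, $20}'
# Here a tiny hypothetical sample of "pgid last-deep-scrub" pairs stands in:
SAMPLE='1.0 2018-02-01
1.1 2018-03-04
1.2 2018-01-20
1.3 2018-03-03'

TOTAL=$(printf '%s\n' "$SAMPLE" | wc -l)
BATCH=$(( (TOTAL + 29) / 30 ))          # oldest 1/30 of all PGs, rounded up

# Sort by last-deep-scrub date and take the stalest BATCH PGs.
OLDEST=$(printf '%s\n' "$SAMPLE" | sort -k2 | head -n "$BATCH" | awk '{print $1}')

for pg in $OLDEST; do
    echo "deep-scrubbing $pg"
    # ceph pg deep-scrub "$pg"          # uncomment on a real cluster
done
```

Run from cron once a day, each pass scrubs the stalest thirtieth of the
PGs, so after ~30 days the timestamps end up roughly evenly spaced.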

On Mon, Mar 5, 2018 at 2:00 PM Gregory Farnum <gfar...@redhat.com> wrote:

> On Mon, Mar 5, 2018 at 9:56 AM Jonathan D. Proulx <j...@csail.mit.edu>
> wrote:
>> Hi All,
>> I've recently noticed my deep scrubs are extremely poorly
>> distributed.  They are starting within the 18->06 local-time
>> start/stop window, but they are not distributed over enough days, nor
>> well distributed over the range of days they do cover.
>> root@ceph-mon0:~# for date in `ceph pg dump | awk '/active/{print
>> $20}'`; do date +%D -d $date; done | sort | uniq -c
>> dumped all
>>       1 03/01/18
>>       6 03/03/18
>>    8358 03/04/18
>>    1875 03/05/18
>> So very nearly all 10240 PGs scrubbed last night/this morning.  I've
>> been kicking this around for a while, since I noticed poor distribution
>> over a 7-day range when I was really pretty sure I'd changed that from
>> the 7d default to 28d.
>> Tried kicking it out to 42 days about a week ago with:
>> ceph tell osd.* injectargs '--osd_deep_scrub_interval 3628800'
>> There were many errors suggesting it could not reread the change and I'd
>> need to restart the OSDs, but 'ceph daemon osd.0 config show | grep
>> osd_deep_scrub_interval' showed the right value, so I let it roll for a
>> week.  The scrubs did not spread out, though.
>> So Friday I set that value in ceph.conf and did rolling restarts of
>> all OSDs, then double-checked the running value on all daemons.
>> Checking Sunday, the nightly deep scrubs (based on the LAST_DEEP_SCRUB
>> voodoo above) showed that near enough 1/42nd of the PGs had been
>> scrubbed Saturday night, so I thought this was working.
>> This morning I checked again and got the results above.
>> I would expect that after changing to a 42d scrub cycle I'd see
>> approximately 1/42 of the PGs deep-scrub each night until there was a
>> roughly even distribution over the past 42 days.
>> So which thing is broken: my config or my expectations?
> Sadly, changing the interval settings does not directly change the
> scheduling of deep scrubs. Instead, it merely influences whether a PG will
> get queued for scrub when it is examined as a candidate, based on how
> out-of-date its scrub is. (That is, nothing holistically goes "I need to
> scrub 1/n of these PGs every night"; there's a simple task that says "is
> this PG's last scrub more than n days old?")
> Users have shared various scripts on the list for setting up a more even
> scrub distribution by fiddling with the settings and poking at specific PGs
> to try and smear them out over the whole time period; I'd check archives or
> google for those. :)
> -Greg
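Greg's per-PG check can be sketched roughly like this.  This is my
paraphrase under stated assumptions, not Ceph's actual implementation;
the timestamp is a hypothetical example:

```shell
#!/bin/sh
# Minimal sketch of the decision Greg describes: the interval setting only
# gates whether an individual PG is *eligible* when it is examined as a
# scrub candidate -- nothing spreads the scrubs out over the period.
NOW=$(date +%s)
INTERVAL=$(( 42 * 24 * 3600 ))           # osd_deep_scrub_interval, in seconds
LAST_DEEP=$(date +%s -d '2018-03-04')    # hypothetical last deep-scrub stamp

if [ $(( NOW - LAST_DEEP )) -gt "$INTERVAL" ]; then
    DECISION="queue-deep-scrub"          # last deep scrub older than interval
else
    DECISION="skip"
fi
echo "$DECISION"
```

That is why a cluster whose PGs were all scrubbed in one burst keeps
scrubbing them in one burst: they all become eligible again at the same
time, which is what the manual smearing above works around.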
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com