Today I noticed that deep-scrub is consistently skipping some of my placement groups (PGs). The output below counts my PGs by the date of their last successful deep-scrub.

# ceph pg dump all | grep active | awk '{ print $20}' | sort -k1 | uniq -c
      5 2013-11-06
    221 2013-11-20
      1 2014-02-17
     25 2014-02-19
     60 2014-02-20
      4 2014-03-06
      3 2014-04-03
      6 2014-04-04
      6 2014-04-05
     13 2014-04-06
      4 2014-04-08
      3 2014-04-10
      2 2014-04-11
     50 2014-04-12
     28 2014-04-13
     14 2014-04-14
      3 2014-04-15
     78 2014-04-16
     44 2014-04-17
      8 2014-04-18
      1 2014-04-20
     16 2014-05-02
     69 2014-05-04
    140 2014-05-05
    569 2014-05-06
   9231 2014-05-07
    103 2014-05-08
    514 2014-05-09
   1593 2014-05-10
    393 2014-05-16
   2563 2014-05-17
   1283 2014-05-18
   1640 2014-05-19
   1979 2014-05-20

I have been running the default "osd deep scrub interval" of one week, but I have disabled deep-scrub on several occasions to avoid the degraded cluster performance I have written about before.

To get the PGs that have gone longest without a deep-scrub started, I set the nodeep-scrub flag and wrote a script that manually kicks off deep-scrubs in order of age, oldest first. It is processing as expected.
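For anyone who wants to try the same approach, here is a minimal sketch of an oldest-first pass. It is not the actual script. It assumes the pg dump layout used in the one-liner above: the PG id is the first field, the deep-scrub date is field 20 (awk's $20), and the time of day follows as the next field. The function names and the limit parameter are made up for illustration, and the column position may well differ on other Ceph versions.

```python
# Sketch only: order PGs by last deep-scrub stamp and build the commands
# that would deep-scrub the oldest ones first.

DATE_FIELD = 19  # zero-based index of awk's $20; assumed from the one-liner above


def oldest_first(dump_text, date_field=DATE_FIELD):
    """Return PG ids from `ceph pg dump` text, oldest deep-scrub stamp first."""
    rows = []
    for line in dump_text.splitlines():
        if "active" not in line:  # same filter as `grep active`
            continue
        fields = line.split()
        if len(fields) <= date_field + 1:  # too short to hold date and time
            continue
        # The stamp is assumed to be two adjacent fields (date, then time).
        # Both are zero-padded, so plain string comparison orders them by age.
        rows.append((fields[date_field], fields[date_field + 1], fields[0]))
    rows.sort()
    return [pgid for _date, _time, pgid in rows]


def deep_scrub_commands(pgids, limit=10):
    """Build `ceph pg deep-scrub <pgid>` argument lists for the first `limit` PGs."""
    return [["ceph", "pg", "deep-scrub", pgid] for pgid in pgids[:limit]]
```

Feed oldest_first() the text output of "ceph pg dump all", then either print the resulting command lists for review or hand each one to subprocess.run(). Keeping limit small, and pausing between batches, should help avoid starting too many deep-scrubs at once.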

Do you consider this a feature request or a bug? Perhaps the code that schedules PGs to deep-scrub could be improved to prioritize PGs that have needed a deep-scrub the longest.

Thanks,
Mike Dawson
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
