During a recent snafu with a production cluster I disabled scrubbing and deep
scrubbing to reduce load on the cluster while things backfilled and settled
down. The PTSD caused by the incident meant I was not keen to re-enable them
until I was confident we had fixed the root cause (driver issues with a new
NIC type, introduced with new hardware, that did not show up until production
load hit them). The cluster is running Jewel 10.2.1 and is a mix of SSD and
SATA across 20 hosts, 352 OSDs in total.
Fast forward a few weeks and I was ready to re-enable it. From some reading I
was concerned the cluster might kick off excessive scrubbing once I unset the
flags, so I first increased the deep scrub interval from 7 days to 60 days -
with most of the last deep scrubs dating from over a month earlier, I was
hoping this would distribute them over the next 30 days. Having unset the
flags and watched the cluster carefully, it seems to have just run a steady
catch-up without significant impact. What I am noticing, though, is that the
deep scrubbing seems to simply run through the full set of PGs: it did some
2280 PGs last night over 6 hours, and so far today another 4000-odd in 12
hours. With 13408 PGs in total, I am guessing it will all stop some time
early tomorrow.
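For reference, the interval bump described above was along these lines (a sketch, not the exact commands I ran - osd_deep_scrub_interval is expressed in seconds, so 60 days works out to 5184000):

```shell
# Raise the deep scrub interval from the 7-day default to 60 days
# (60 * 86400 = 5184000 seconds). Runtime change on all OSDs:
sudo ceph tell osd.* injectargs '--osd_deep_scrub_interval 5184000'

# To persist across restarts, also set it in ceph.conf under [osd]:
#   osd_deep_scrub_interval = 5184000
```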
ceph-glb-fec-01[/var/log]$ sudo ceph pg dump | awk '{print $20}' | grep 2017 | sort | uniq -c
dumped all in format plain
5 2017-05-23
18 2017-05-24
33 2017-05-25
52 2017-05-26
89 2017-05-27
114 2017-05-28
144 2017-05-29
172 2017-05-30
256 2017-05-31
191 2017-06-01
230 2017-06-02
369 2017-06-03
606 2017-06-04
680 2017-06-05
919 2017-06-06
1261 2017-06-07
1876 2017-06-08
15 2017-06-09
2280 2017-07-05
4098 2017-07-06
My concern is that in 60 days' time all 13408 PGs will again deep scrub
serially over about 3 days, rather than being spread across the interval. I
would much rather they distribute over that period.
Will the OSDs distribute the schedule themselves now that they have caught up,
or do I need to, say, create a script that triggers batches of PGs to deep
scrub over time to push out the distribution again?
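In case it helps frame the question, the sort of script I have in mind would just split the PG list into daily batches and kick each batch off with `ceph pg deep-scrub <pgid>`. A minimal sketch of the batching logic (the PG ids and the wrapper names are my own invention, and it only prints the commands rather than running them):

```python
#!/usr/bin/env python3
"""Sketch: spread deep scrubs of a PG list across a window of N days.

Hypothetical helper - in practice you would feed it PG ids parsed from
`ceph pg dump` and actually execute each `ceph pg deep-scrub <pgid>`
command for the current day's batch.
"""
import math


def batches(pg_ids, days):
    """Split pg_ids into at most `days` roughly equal daily batches."""
    per_day = math.ceil(len(pg_ids) / days)
    return [pg_ids[i:i + per_day] for i in range(0, len(pg_ids), per_day)]


def commands_for(batch):
    """Build the ceph CLI command for each PG in a batch (dry run)."""
    return ["ceph pg deep-scrub %s" % pg for pg in batch]


if __name__ == "__main__":
    # Toy example: 10 fake PG ids spread over 4 days.
    pgs = ["1.%x" % n for n in range(10)]
    for day, batch in enumerate(batches(pgs, 4)):
        print("day %d:" % day)
        for cmd in commands_for(batch):
            print("  " + cmd)
```

With 13408 PGs over 60 days that works out to batches of about 224 PGs per day, which is well under what the cluster handled during the catch-up run.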
Adrian Saul | Infrastructure Projects Team Lead
IT
T 02 9009 9041 | M +61 402 075 760
30 Ross St, Glebe NSW 2037
[email protected]<mailto:[email protected]> |
www.tpg.com.au<http://www.tpg.com.au/>
TPG Telecom (ASX: TPM)
This email and any attachments are confidential and may be subject to
copyright, legal or some other professional privilege. They are intended solely
for the attention and use of the named addressee(s). They may only be copied,
distributed or disclosed with the consent of the copyright owner. If you have
received this email by mistake or by breach of the confidentiality clause,
please notify the sender immediately by return email and delete or destroy all
copies of the email. Any confidentiality, privilege or copyright is not waived
or lost because this email has been sent to you by mistake.
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com