During a recent snafu with a production cluster I disabled scrubbing and deep
scrubbing to reduce load on the cluster while things backfilled and settled
down. The PTSD caused by the incident meant I was not keen to re-enable them
until I was confident we had fixed the root cause (driver issues with a new
NIC type, introduced with new hardware, that did not show up until production
load hit them). The cluster is running Jewel 10.2.1 and is a mix of SSD and
SATA across 20 hosts, 352 OSDs in total.
Fast forward a few weeks and I was ready to re-enable it. From some reading I
was concerned the cluster might kick off excessive scrubbing once I unset the
flags, so I first increased the deep scrub interval from 7 days to 60 days -
with most of the last deep scrubs dating from over a month earlier, I was
hoping this would distribute them over the next 30 days. Having unset the
flags and watched the cluster carefully, it seems to have just run a steady
catch-up without significant impact. What I am noticing, though, is that the
deep scrubbing seems to simply run through the full set of PGs: it did some
2280 PGs last night over 6 hours, and so far today another 4000-odd in 12
hours. With 13408 PGs in total, I am guessing it will all stop some time
early tomorrow.
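For reference, the interval bump described above was along these lines (a sketch, not the exact commands I ran - osd_deep_scrub_interval is expressed in seconds, so 60 days works out to 5184000):

```shell
# Raise the deep scrub interval from the 7-day default to 60 days
# (60 * 86400 = 5184000 seconds). Runtime change on all OSDs:
sudo ceph tell osd.* injectargs '--osd_deep_scrub_interval 5184000'

# To persist across restarts, also set it in ceph.conf under [osd]:
#   osd_deep_scrub_interval = 5184000
```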
ceph-glb-fec-01[/var/log]$ sudo ceph pg dump | awk '{print $20}' | grep 2017 | sort | uniq -c
dumped all in format plain
5 2017-05-23
18 2017-05-24
33 2017-05-25
52 2017-05-26
89 2017-05-27
114 2017-05-28
144 2017-05-29
172 2017-05-30
256 2017-05-31
191 2017-06-01
230 2017-06-02
369 2017-06-03
606 2017-06-04
680 2017-06-05
919 2017-06-06
1261 2017-06-07
1876 2017-06-08
15 2017-06-09
2280 2017-07-05
4098 2017-07-06
My concern is that in 60 days' time all 13408 PGs will again deep scrub
serially over about 3 days, rather than being spread across the interval. I
would much rather they distribute over that period.
Will the OSDs distribute the schedule themselves now that they have caught up,
or do I need to, say, create a script that triggers batches of PGs to deep
scrub over time to push out the distribution again?
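In case it helps frame the question, the sort of script I have in mind would just split the PG list into daily batches and kick each batch off with `ceph pg deep-scrub <pgid>`. A minimal sketch of the batching logic (the PG ids and the wrapper names are my own invention, and it only prints the commands rather than running them):

```python
#!/usr/bin/env python3
"""Sketch: spread deep scrubs of a PG list across a window of N days.

Hypothetical helper - in practice you would feed it PG ids parsed from
`ceph pg dump` and actually execute each `ceph pg deep-scrub <pgid>`
command for the current day's batch.
"""
import math


def batches(pg_ids, days):
    """Split pg_ids into at most `days` roughly equal daily batches."""
    per_day = math.ceil(len(pg_ids) / days)
    return [pg_ids[i:i + per_day] for i in range(0, len(pg_ids), per_day)]


def commands_for(batch):
    """Build the ceph CLI command for each PG in a batch (dry run)."""
    return ["ceph pg deep-scrub %s" % pg for pg in batch]


if __name__ == "__main__":
    # Toy example: 10 fake PG ids spread over 4 days.
    pgs = ["1.%x" % n for n in range(10)]
    for day, batch in enumerate(batches(pgs, 4)):
        print("day %d:" % day)
        for cmd in commands_for(batch):
            print("  " + cmd)
```

With 13408 PGs over 60 days that works out to batches of about 224 PGs per day, which is well under what the cluster handled during the catch-up run.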
Adrian Saul | Infrastructure Projects Team Lead
IT
T 02 9009 9041 | M +61 402 075 760
30 Ross St, Glebe NSW 2037
[email protected]<mailto:[email protected]> |
www.tpg.com.au<http://www.tpg.com.au/>
TPG Telecom (ASX: TPM)
This email and any attachments are confidential and may be subject to
copyright, legal or some other professional privilege. They are intended solely
for the attention and use of the named addressee(s). They may only be copied,
distributed or disclosed with the consent of the copyright owner. If you have
received this email by mistake or by breach of the confidentiality clause,
please notify the sender immediately by return email and delete or destroy all
copies of the email. Any confidentiality, privilege or copyright is not waived
or lost because this email has been sent to you by mistake.
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com