Hi,

You are on Octopus, right?


Istvan Szabo
Staff Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---------------------------------------------------



________________________________
From: Frank Schilder <fr...@dtu.dk>
Sent: Tuesday, December 12, 2023 7:33 PM
To: ceph-users@ceph.io <ceph-users@ceph.io>
Subject: [ceph-users] Re: increasing number of (deep) scrubs


Hi all,

if you follow this thread, please see the update in "How to configure something 
like osd_deep_scrub_min_interval?" 
(https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/YUHWQCDAKP5MPU6ODTXUSKT7RVPERBJF/).
 I found out how to tune the scrub machinery and posted a quick update in the 
other thread, because the solution was not to increase the number of scrubs but 
to tune parameters.
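
For anyone who wants to review their current settings before changing anything, 
here is a minimal sketch that just prints the usual OSD scrub options. It 
assumes the "ceph" CLI is in PATH with an admin keyring; the option names are 
the standard ones, the actual tuning is described in the thread linked above.

#!/usr/bin/env python3
# Sketch: print the scrub-related OSD options currently in effect
# (value from the config database, or the compiled-in default if unset).
# Assumes the "ceph" CLI is in PATH and the keyring allows "config get".
import subprocess

OPTIONS = [
    "osd_max_scrubs",
    "osd_scrub_min_interval",
    "osd_scrub_max_interval",
    "osd_deep_scrub_interval",
    "osd_deep_scrub_randomize_ratio",
    "osd_scrub_backoff_ratio",
    "osd_scrub_sleep",
    "osd_scrub_load_threshold",
]

for opt in OPTIONS:
    out = subprocess.run(["ceph", "config", "get", "osd", opt],
                         capture_output=True, text=True, check=True)
    print(f"{opt} = {out.stdout.strip()}")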

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder
Sent: Monday, January 9, 2023 9:14 AM
To: Dan van der Ster
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] increasing number of (deep) scrubs

Hi Dan,

thanks for your answer. I don't have a problem with increasing osd_max_scrubs 
(currently 1) as such. I would simply prefer a somewhat finer-grained way of 
controlling scrubbing than doubling or tripling it right away.

Some more info: these two pools are data pools for a large FS. Unfortunately, 
we have a large percentage of small files, which is a pain for recovery and 
seemingly also for deep scrubbing. Our OSDs are about 25% used and I already 
had to increase the warning interval to 2 weeks. With all the warning grace 
parameters this means we manage to deep scrub everything about once a month. I 
need to plan for 75% utilisation, and a 3-month period is a bit too far on the 
risky side.
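
As a side note, the actual tail can be read off the PG stats. A rough sketch; 
the top-level JSON layout of "ceph pg dump" differs a bit between releases, 
hence the two lookups:

#!/usr/bin/env python3
# Sketch: find the oldest last_deep_scrub_stamp over all PGs, i.e. how far
# behind the deep-scrub tail currently is. Handles both known JSON shapes
# of "ceph pg dump"; adjust for your release if neither matches.
import json, subprocess

out = subprocess.run(["ceph", "pg", "dump", "--format", "json"],
                     capture_output=True, text=True, check=True).stdout
data = json.loads(out)
pg_stats = data.get("pg_map", {}).get("pg_stats") or data.get("pg_stats") or []

# The stamps are ISO-like strings, so a lexicographic min() is good enough.
oldest = min(pg_stats, key=lambda pg: pg["last_deep_scrub_stamp"])
print("PGs total:", len(pg_stats))
print("oldest deep scrub:", oldest["pgid"], oldest["last_deep_scrub_stamp"])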

A large percentage of our data is cold. Client reads will not do the check for 
us; we need to combat bit rot proactively.

The reasons I'm interested in parameters that initiate more scrubs, while also 
converting more scrubs into deep scrubs, are that:

1) scrubs seem to complete very fast. I almost never catch a PG in state 
"scrubbing"; I usually only see "deep scrubbing".

2) I suspect the low deep-scrub count is due to a low number of deep scrubs 
being scheduled and not due to conflicting per-OSD deep-scrub reservations. 
With the OSD count we have and the distribution over 12 servers, I would expect 
a peak of at least 50% of OSDs being active in scrubbing instead of the 25% 
peak I'm seeing now. It ought to be possible to schedule more PGs for deep 
scrub than actually are (a way to check this is sketched after this list).

3) Every OSD having only 1 deep scrub active seems to have no measurable impact 
on user IO. If I could just get more PGs scheduled with 1 deep scrub per OSD, 
it would already help a lot. Once this is working, I can eventually increase 
osd_max_scrubs when the OSDs fill up. For now I would just like the (deep) 
scrub scheduling to look a bit harder and schedule more eligible PGs per time 
unit.
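
To put a number on 1) and 2), something like the following sketch counts the 
PGs currently (deep) scrubbing and the distinct OSDs they occupy (field names 
as in the pg dump JSON; adjust if your release nests them differently):

#!/usr/bin/env python3
# Sketch: count PGs that are scrubbing vs. deep scrubbing right now and the
# number of distinct OSDs they keep busy. Uses the "state" and "acting"
# fields of the "ceph pg dump" JSON output.
import json, subprocess

out = subprocess.run(["ceph", "pg", "dump", "--format", "json"],
                     capture_output=True, text=True, check=True).stdout
data = json.loads(out)
pg_stats = data.get("pg_map", {}).get("pg_stats") or data.get("pg_stats") or []

shallow, deep, busy_osds = 0, 0, set()
for pg in pg_stats:
    state = pg["state"]
    if "scrubbing" not in state:
        continue
    if "deep" in state:
        deep += 1
    else:
        shallow += 1
    busy_osds.update(pg["acting"])

print(f"scrubbing: {shallow}, deep scrubbing: {deep}, OSDs busy: {len(busy_osds)}")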

If we can get deep scrubbing up to an average of 42 PGs completing per hour 
while keeping osd_max_scrubs=1 to maintain the current IO impact, we should be 
able to complete a full deep-scrub round with 75% full OSDs in about 30 days, 
which is the current tail time with 25% utilisation. I believe a deep scrub of 
a PG in these pools currently takes 2-3 hours; that's just a gut feeling from 
some repair and deep-scrub commands, and I would need to check the logs for 
more precise numbers.

Increasing osd_max_scrubs would then be a further, and not the only, option to 
push for more deep scrubbing. My expectation is that values of 2-3 are fine, 
given the increasingly high percentage of cold data for which no interference 
with client IO will happen.
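
When that point is reached, bumping the slot count is a one-liner against the 
config database. A sketch; the value 2 is only an example, and osd.0 is just a 
convenient daemon to verify against:

#!/usr/bin/env python3
# Sketch: raise the per-OSD scrub slot count cluster-wide and check that one
# running OSD picked it up. Revert with "ceph config rm osd osd_max_scrubs".
# The value 2 is an example, not a recommendation; osd.0 is an example id.
import subprocess

subprocess.run(["ceph", "config", "set", "osd", "osd_max_scrubs", "2"],
               check=True)
out = subprocess.run(["ceph", "tell", "osd.0", "config", "get", "osd_max_scrubs"],
                     capture_output=True, text=True, check=True)
print(out.stdout.strip())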

Hope that makes sense and that there is a way beyond bumping osd_max_scrubs to 
increase the number of scheduled and executed deep scrubs.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Dan van der Ster <dvand...@gmail.com>
Sent: 05 January 2023 15:36
To: Frank Schilder
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] increasing number of (deep) scrubs

Hi Frank,

What is your current osd_max_scrubs, and why don't you want to increase it?
With 8+2 and 8+3 pools, each scrub occupies a scrub slot on 10 or 11 OSDs, so 
it can easily take 3-4x as long to scrub the data as it would for replicated 
pools. If you want scrubs to complete in time, you need to increase the number 
of scrub slots accordingly.
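
To make the slot arithmetic concrete, a quick back-of-the-envelope sketch with 
the numbers from your first mail, treating every participating OSD as needing 
a free slot and ignoring placement overlap:

# Sketch: upper bound on concurrently (deep) scrubbing EC PGs when each
# participating OSD must hold a free scrub slot. Numbers from the original
# post: 852 OSDs, 8+2 and 8+3 pools, osd_max_scrubs left at 1.
osds = 852
osd_max_scrubs = 1
ec_width = 10                     # 8+2; use 11 for the 8+3 pool

slots = osds * osd_max_scrubs
print(slots // ec_width)          # ~85 PGs at once, as a theoretical ceiling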

On the other hand, IMHO the 1-week deadline for deep scrubs is often
much too ambitious for large clusters -- increasing the scrub
intervals is one solution, or I find it simpler to increase
mon_warn_pg_not_scrubbed_ratio and mon_warn_pg_not_deep_scrubbed_ratio
until you find a ratio that works for your cluster.
Of course, all of this can impact the detection of bit rot, which can
anyway be covered by client reads if most data is accessed periodically.
But if the cluster is mostly idle or objects are generally not read,
then it would be preferable to increase the osd_max_scrubs slots.
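
For completeness, relaxing those warnings is also just a config change. A 
sketch; the 1.0 values are placeholders, pick whatever ratio stops the false 
alarms on your cluster:

#!/usr/bin/env python3
# Sketch: raise the warn ratios behind the "pgs not (deep-)scrubbed in time"
# health checks. The values below are placeholders, not recommendations;
# set at "global" so the daemon evaluating the check picks them up.
import subprocess

for opt, val in [("mon_warn_pg_not_scrubbed_ratio", "1.0"),
                 ("mon_warn_pg_not_deep_scrubbed_ratio", "1.0")]:
    subprocess.run(["ceph", "config", "set", "global", opt, val], check=True)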

Cheers, Dan


On Tue, Jan 3, 2023 at 2:30 AM Frank Schilder <fr...@dtu.dk> wrote:
>
> Hi all,
>
> we are using 16T and 18T spinning drives as OSDs and I'm observing that they 
> are not scrubbed as often as I would like. It looks like too few scrubs are 
> scheduled for these large OSDs. My estimate is as follows: we have 852 
> spinning OSDs backing an 8+2 pool with 2024 PGs and an 8+3 pool with 8192 
> PGs. On average I see something like 10 PGs of pool 1 and 12 PGs of pool 2 
> (deep) scrubbing. This amounts to only 232 out of 852 OSDs scrubbing and 
> seems to be due to a conservative rate of (deep) scrubs being scheduled. The 
> PGs (deep) scrub fairly quickly.
>
> I would like to gently increase the number of scrubs scheduled for these 
> drives and *not* the number of scrubs per OSD. I'm looking at parameters like:
>
> osd_scrub_backoff_ratio
> osd_deep_scrub_randomize_ratio
>
> I'm wondering if lowering osd_scrub_backoff_ratio to 0.5 and, maybe, 
> increasing osd_deep_scrub_randomize_ratio to 0.2 would have the desired 
> effect? Are there other parameters to look at that allow gradual changes in 
> the number of scrubs going on?
>
> Thanks a lot for your help!
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
