Hello all,

TL;DR: the number of PGs scrubbing concurrently, deep or otherwise, appears to 
have increased by about 5-10x, while the number of PGs complaining that 
they haven't been scrubbed in time, deep or otherwise, continues to tick higher.
> HEALTH_WARN 217 pgs not deep-scrubbed in time; 187 pgs not scrubbed in time


I'm hoping there is something I missed in the release notes or on the 
mailing list that explains why my scrubs both exploded in concurrency and 
fell behind after upgrading from quincy (17.2.9) to reef (18.2.8).

Non-cephadm, Ubuntu 22.04, rather heterogeneous OSD hardware.
Mix of 8T and 2T HDD, as well as 2T SSD.
HDDs have NVMe WAL/DB of various sizes depending on when they were deployed.
Mix of replicated and EC pools, as well as some replicated pools across 
different device classes.

The vast majority of the PGs that are behind on scrubbing are on EC pools, and 
the vast majority of those are in our EC82 cephfs pool (40), which holds the 
bulk of our stored data; the next largest pool is an older EC73 cephfs pool (37).
Here is my quick and dirty approximation, based on PGs last scrubbed last month:

> ceph pg dump | grep 2026-05 | awk '{print $1" "$27}' | grep -v periodic | cut -d '.' -f1 | sort | uniq -c
> dumped all
>       1 17
>       1 20
>     116 37
>     224 40
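
For anyone wanting to reproduce the count, here is a rough sketch of a slightly stricter variant of the pipeline above. It looks up the deep-scrub timestamp column by its header name instead of grepping for a date anywhere on the line, and counts PGs per pool whose last deep scrub is older than a cutoff. Assumptions: the plain-text `ceph pg dump pgs` header contains a DEEP_SCRUB_STAMP column, no column before it contains spaces, and timestamps are ISO formatted so that string comparison orders them correctly. The function name is hypothetical.

```shell
# Count PGs per pool whose last deep scrub is older than a cutoff date.
# Reads plain-text `ceph pg dump pgs` output on stdin.
# Assumes a DEEP_SCRUB_STAMP header column and ISO timestamps.
stale_deep_scrubs_by_pool() {
    cutoff="$1"   # ISO date, e.g. 2026-06-01
    awk -v cutoff="$cutoff" '
        # Header row: locate the DEEP_SCRUB_STAMP column by name.
        $1 == "PG_STAT" {
            for (i = 1; i <= NF; i++) if ($i == "DEEP_SCRUB_STAMP") col = i
            next
        }
        # Data rows: pgid looks like <pool>.<hex id>.
        # Appending "" forces a string comparison of the timestamps.
        col && $1 ~ /^[0-9]+\.[0-9a-f]+$/ && ($col "") < (cutoff "") {
            split($1, parts, ".")
            count[parts[1]]++
        }
        END { for (pool in count) print count[pool], pool }
    ' | sort -k2,2n
}
```

Usage would be something like `ceph pg dump pgs 2>/dev/null | stale_deep_scrubs_by_pool 2026-06-01`, which prints one "count pool" pair per line, ordered by pool id.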


I didn't make any changes to scrub intervals or mclock profiles 
before/during/after the upgrade.
> ceph config dump | grep mclock_profile | awk '{print $4}' | uniq -c
>     313 balanced
> ceph config dump | grep scrub_interval
> global          class:ssd  advanced  osd_deep_scrub_interval  604800.000000
> mon                        advanced  osd_deep_scrub_interval  604800.000000
> mon.*                      advanced  osd_deep_scrub_interval  604800.000000
> mgr.*                      advanced  osd_deep_scrub_interval  604800.000000
> osd             class:hdd  advanced  osd_deep_scrub_interval  604800.000000
> osd             class:ssd  advanced  osd_deep_scrub_interval  604800.000000
> osd                        advanced  osd_deep_scrub_interval  604800.000000
> osd.*                      advanced  osd_deep_scrub_interval  604800.000000
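
To put the interval above in context, here is a small arithmetic sketch of when the deep-scrub warning should start firing. It assumes, as I understand the health check documentation, that a PG is flagged once osd_deep_scrub_interval plus a further mon_warn_pg_not_deep_scrubbed_ratio fraction of that interval has elapsed, and that the ratio is at its default of 0.75. I have not verified the ratio on this cluster, so treat both as assumptions; the function name is hypothetical.

```shell
# Days after the last deep scrub at which the warning would fire,
# assuming: deadline = interval * (1 + ratio).
# $1 = osd_deep_scrub_interval in seconds
# $2 = warn ratio (assumed default 0.75)
deep_scrub_warn_after_days() {
    interval="$1"; ratio="${2:-0.75}"
    awk -v i="$interval" -v r="$ratio" \
        'BEGIN { printf "%.2f\n", i * (1 + r) / 86400 }'
}
```

With the 604800 second interval shown above, `deep_scrub_warn_after_days 604800` works out to 12.25 days under those assumptions.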

I've tried ceph tell osd.$osd osd_max_scrubs $more, which seems to briefly 
drive up the count of active+clean+scrubbing[+deep] PGs, but doesn't 
seem to make a demonstrable difference in reducing the number 
of PGs behind (the number continues to grow).
I also looked at load15 across OSD hosts, and they don't appear to be anywhere 
near the 50% threshold of osd_scrub_load_threshold either, so I think I can 
rule that one out for now.
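
For comparing before and after an osd_max_scrubs change, this is a minimal sketch of how the active+clean+scrubbing[+deep] counts could be tallied. It assumes the plain-text `ceph pg dump pgs_brief` layout with the PG id in the first column and the state string in the second; the function name is hypothetical.

```shell
# Tally PGs currently scrubbing, split into shallow and deep.
# Reads plain-text `ceph pg dump pgs_brief` output on stdin.
# Assumes column 1 is the pgid and column 2 is the state string.
scrub_state_counts() {
    awk '
        $1 ~ /^[0-9]+\.[0-9a-f]+$/ && $2 ~ /scrubbing/ {
            if ($2 ~ /deep/) deep++; else shallow++
        }
        END { printf "shallow=%d deep=%d\n", shallow, deep }
    '
}
```

Running `ceph pg dump pgs_brief 2>/dev/null | scrub_state_counts` at intervals gives a simple time series to set against the number of PGs reported as behind.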

I'm mostly curious why the number of concurrent scrubs has ballooned while 
the number of PGs behind on scrubbing has ballooned as well, without my 
changing any settings.
And I'm also curious what tunables I can turn to get things back under control 
for scrubbing both short and long term as I look towards getting to squid and 
24.04.
Is there an internal mechanism that triggers a deeper scrub during first deep 
scrub after upgrading a major release, reef or otherwise?

I've included some graphs of scrub load over the last 60 and 365 days to show 
that prior scrub load only very rarely generated a 
PG_NOT_[DEEP_]SCRUBBED warning,
as well as raw load average (the smallest CPU count is 16, and the graph doesn't 
even autoscale to 8, so nothing should be complaining there).
https://imgur.com/a/rixNrCe
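
The load reasoning above (16 CPUs, load staying under 8) can be written as a quick per-host check. This is only a sketch: it assumes osd_scrub_load_threshold is compared against the load average divided by the number of online CPUs, which is how the option's documentation describes it, and that 0.5 is the value in effect. The function name is hypothetical.

```shell
# Succeeds (exit 0) if load / cpus is strictly below the threshold.
# $1 = load average, $2 = CPU count, $3 = threshold (assumed 0.5)
load_below_threshold() {
    load="$1"; cpus="$2"; threshold="${3:-0.5}"
    awk -v l="$load" -v c="$cpus" -v t="$threshold" '
        BEGIN {
            if (c <= 0) exit 2          # refuse to divide by zero
            exit !((l / c) < t)         # 0 when below the threshold
        }'
}
```

On an OSD host, something like `load_below_threshold "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)" && echo "load should not block scrubs"` would apply it to the current 1-minute load; which of the three load averages the OSD actually consults is not something I have confirmed.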

Appreciate any pointers anyone can steer me towards.
Reed


_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
