I feel like this has been discussed multiple times on this list; I just don't have any links at hand. I suspect the mclock settings; most likely some things changed between Quincy and Reef. You could fall back to wpq instead of mclock (that's still the general recommendation at the moment) and then see if anything improves.
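A minimal sketch of the wpq fallback suggested above. The helper name `wpq_fallback_plan` is hypothetical, and it assumes non-cephadm systemd units named `ceph-osd@<id>` (as on package-based Ubuntu installs). It only prints the commands so they can be reviewed first, because `osd_op_queue` is read at OSD startup and every OSD has to be restarted for the change to take effect.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the steps to switch OSDs from
# mclock_scheduler to wpq. Arguments are the OSD ids on this host.
wpq_fallback_plan() {
  # Avoid marking restarting OSDs out and triggering rebalancing.
  echo "ceph osd set noout"
  # Cluster-wide setting for all OSDs; only applied when an OSD starts.
  echo "ceph config set osd osd_op_queue wpq"
  for id in "$@"; do
    # Assumes package-based (non-cephadm) systemd unit naming.
    echo "systemctl restart ceph-osd@${id}"
  done
  echo "ceph osd unset noout"
}
```

For example, `wpq_fallback_plan 3 7` prints the plan for OSDs 3 and 7. In practice you would restart one OSD (or one host) at a time, wait for PGs to return to active+clean, and confirm the running value with `ceph config show osd.<id> osd_op_queue`.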

Quoting Reed Dier via ceph-users <[email protected]>:

Hello all,

TL;DR: the number of PGs scrubbing concurrently, deep or otherwise, appears to have increased by about 5-10x, while the number of PGs complaining that they haven't been scrubbed, deep or otherwise, keeps ticking higher.
HEALTH_WARN 217 pgs not deep-scrubbed in time; 187 pgs not scrubbed in time


I'm hoping there's something I missed in the release notes or on the mailing list that explains why my scrubs both exploded in concurrency and fell behind after upgrading from Quincy (17.2.9) to Reef (18.2.8).

Non-cephadm, Ubuntu 22.04, rather heterogeneous OSD hardware.
A mix of 8T and 2T HDDs, as well as 2T SSDs.
The HDDs have NVMe WAL/DB devices of various sizes, depending on when they were deployed.
Mix of replicated and EC pools, as well as some replicated pools across different device classes.

The vast majority of the PGs that are behind on scrubbing are in EC pools, and most of those are in our EC82 CephFS pool (pool 40), which holds the bulk of our stored data; the other large pool is an older EC73 CephFS pool (pool 37).
Here's my quick-and-dirty approximation, based on PGs last scrubbed last month:

ceph pg dump | grep 2026-05 | awk '{print $1" "$27}' | grep -v periodic | cut -d '.' -f1 | sort | uniq -c
dumped all
      1 17
      1 20
    116 37
    224 40
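A slightly more targeted variant of the approximation above, as a sketch: the hypothetical helper `count_stale_deep_scrubs` finds the DEEP_SCRUB_STAMP column by its header name in plain `ceph pg dump pgs` output instead of relying on a fixed field such as `$27`, then counts per pool the PGs whose last deep scrub is older than a cutoff date. It assumes the stamps are ISO-8601 without embedded spaces (like the 2026-05 stamps matched above), so plain string comparison orders them correctly; column names and layout can differ between releases, so check the header on your own cluster first.

```shell
#!/bin/sh
# Hypothetical helper: per-pool count of PGs whose DEEP_SCRUB_STAMP is older
# than the cutoff given as $1 (e.g. 2026-06-01). Reads pg dump text on stdin.
count_stale_deep_scrubs() {
  awk -v cutoff="$1" '
    # Header row: remember which field holds DEEP_SCRUB_STAMP, then skip it.
    {
      for (i = 1; i <= NF; i++)
        if ($i == "DEEP_SCRUB_STAMP") { col = i; next }
    }
    # Data rows start with a pgid like 40.1a (pool id, dot, hex PG number).
    # ISO-8601 stamps sort correctly as strings, so "<" means "older".
    col && ($1 ~ /^[0-9]+\.[0-9a-f]+$/) && ($col < cutoff) {
      split($1, parts, ".")
      count[parts[1]]++
    }
    END { for (p in count) print count[p], p }
  ' | sort -k2,2n
}
```

Usage would be along the lines of `ceph pg dump pgs 2>/dev/null | count_stale_deep_scrubs 2026-06-01`, which prints "count pool" pairs sorted by pool id.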


I didn't make any changes to scrub intervals or mclock profiles before/during/after the upgrade.
ceph config dump | grep mclock_profile | awk '{print $4}' | uniq -c ;
    313 balanced
ceph config dump | grep scrub_interval
global class:ssd advanced osd_deep_scrub_interval 604800.000000
mon advanced osd_deep_scrub_interval 604800.000000
mon.* advanced osd_deep_scrub_interval 604800.000000
mgr.* advanced osd_deep_scrub_interval 604800.000000
osd class:hdd advanced osd_deep_scrub_interval 604800.000000
osd class:ssd advanced osd_deep_scrub_interval 604800.000000
osd advanced osd_deep_scrub_interval 604800.000000
osd.* advanced osd_deep_scrub_interval 604800.000000
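One caveat on the `awk '{print $4}' | uniq -c` count above: `$4` is the value only when the MASK column is empty. In masked rows like `osd class:hdd advanced osd_deep_scrub_interval ...`, `$4` is the option name instead, and `uniq -c` only merges adjacent duplicates unless the input is sorted. A hypothetical helper, `option_values`, sketches a mask-tolerant version by printing whatever follows the option name on each line.

```shell
#!/bin/sh
# Hypothetical helper: count distinct values of option $1 in `ceph config dump`
# text on stdin, regardless of whether a MASK column shifts the fields.
option_values() {
  awk -v opt="$1" '
    {
      # The value is the field immediately after the option name.
      for (i = 1; i < NF; i++)
        if ($i == opt) { count[$(i + 1)]++; break }
    }
    END { for (v in count) print count[v], v }
  ' | sort -k2,2
}
```

For example, `ceph config dump | option_values osd_mclock_profile` or `ceph config dump | option_values osd_deep_scrub_interval`.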

I've tried ceph tell osd.$osd osd_max_scrubs $more, which seems to momentarily drive up the count of active+clean+scrubbing[+deep] PGs, but doesn't make a demonstrable difference in catching up on the PGs that are behind (that number continues to grow). I also looked at load15 across the OSD hosts, and it doesn't appear to be anywhere near the 50% osd_scrub_load_threshold either, so I think I can rule that one out for now.
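If I recall the Quincy/Reef OSD code correctly (worth verifying against your release), osd_scrub_load_threshold is compared against the 1-minute load average divided by the number of online CPUs rather than load15, with a second escape hatch that allows scrubbing when the load is below the daily average and falling. The hypothetical helper `scrub_load_ok` reproduces only that first check, so load1 can be compared the way the OSD would compare it.

```shell
#!/bin/sh
# Hypothetical helper: per-CPU 1-minute load vs. osd_scrub_load_threshold.
# $1 = 1-minute load average, $2 = online CPUs, $3 = threshold (default 0.5).
scrub_load_ok() {
  awk -v l="$1" -v c="$2" -v t="${3:-0.5}" 'BEGIN {
    per = (c > 0) ? l / c : l   # normalize by CPU count, as assumed above
    printf "%.3f %s\n", per, ((per < t) ? "below" : "above")
  }'
}
```

On a live host this could be called as `scrub_load_ok "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"`. With 16 CPUs and the default 0.5 threshold, load1 would need to stay above 8 to block scrubbing on this check.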

I'm mostly curious why concurrent scrubs ballooned while the number of PGs behind on scrubbing ballooned as well, without anything actually changing. I'm also curious which tunables I can adjust to get scrubbing back under control, both short and long term, as I look toward moving to Squid and Ubuntu 24.04. Is there an internal mechanism that triggers a more thorough scrub during the first deep scrub after a major-release upgrade, Reef or otherwise?
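One back-of-envelope way to reason about "more concurrency, yet further behind" (Little's law, not anything Ceph computes internally): to deep-scrub N PGs once per interval when each deep scrub takes D seconds on average, about N×D/interval scrubs must be in flight at all times. If concurrency rose 5-10x and the backlog still grows, average per-PG scrub duration has likely grown by more than that factor, or scrubs are not completing. As I understand the reservation scheme, each EC82 PG scrub also needs an osd_max_scrubs slot on every OSD in its acting set, so raising the limit on only some OSDs may not raise effective concurrency much. The helper `required_concurrency` and the numbers below are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: average number of scrubs that must be in flight.
# $1 = PGs to scrub, $2 = average seconds per scrub, $3 = interval in seconds.
required_concurrency() {
  awk -v n="$1" -v d="$2" -v i="$3" 'BEGIN {
    x = n * d / i                  # Little s law: L = lambda * W
    c = int(x); if (c < x) c++     # round up to whole scrubs
    print c
  }'
}
```

For example, 2048 PGs at a hypothetical 3 hours (10800 s) per deep scrub with the 604800 s interval shown above gives `required_concurrency 2048 10800 604800` = 37 deep scrubs in flight, cluster-wide, just to keep pace.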

I've included some graphs of scrub load over the last 60 and 365 days, showing the prior scrub load, which only very rarely generated a PG_NOT_[DEEP_]SCRUBBED warning, as well as the raw load average (the smallest CPU count is 16, and the load graph doesn't even autoscale up to 8, so nothing should be complaining there).
https://imgur.com/a/rixNrCe

Appreciate any pointers anyone can steer me towards.
Reed


_______________________________________________
ceph-users mailing list -- [email protected]
