Hello Michel,

I would not get worried unless pg repair completes (it can take hours) and the PG is still marked as inconsistent afterwards. Naive question: have you actually checked that the failing drive is the one you removed? There should be a line in the Ceph log of the form "cluster [ERR] <pg_number> shard <osd_id>".
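A minimal sketch of how one might pull the blamed OSD id(s) out of those log lines, assuming the usual "cluster [ERR] <pg> shard <osd_id>" format (the log path and exact format here are assumptions; adjust to your deployment):

```shell
# Hypothetical helper: extract the OSD id(s) named as "shard N" in
# scrub-error lines fed on stdin. The line format is an assumption
# based on typical "cluster [ERR] <pg> shard <osd_id>" entries.
blamed_osds() {
  grep -oE 'shard [0-9]+' | awk '{print $2}' | sort -u
}

# Usage sketch (log path is an assumption):
#   grep 'cluster \[ERR\]' /var/log/ceph/ceph.log | blamed_osds
```

If the id that comes out is not the OSD you replaced, the remaining scrub errors may be on a different drive.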

Cheers,
Enrico


On 9/18/25 12:39, Janne Johansson wrote:
One week ago we ran into problems with a dying OSD resulting in
OSD_TOO_MANY_REPAIRS, which unfortunately went unnoticed (it seems our
monitoring system is not reporting these errors properly). When we
realized the error, we removed the problematic OSD (ceph orch osd rm
--replace) despite the scrubbing errors: the resulting backfills
succeeded but did not fix the scrub errors. The colleague who took care
of this problem decided to launch a `ceph pg repair` on the 3 PGs with
reported inconsistencies, but it doesn't seem to converge. 'ceph -s'
still reports:

               3    active+clean+scrubbing+deep+inconsistent+repair

after a few hours, and for at least one of the PGs there is the
following message every 3 s:

Sep 18 12:30:55 ceph-76212 ceph-mon[2506]: osd.72 pg 11.e2d Deep scrub
errors, upgrading scrub to deep-scrub

Not sure if this is the sign of a problem or just a side effect of the
operation still being in progress. I'm looking for advice on how to
move forward. Users have not yet reported any impact, but that doesn't
mean there is none... The affected pool stores RBD volumes (from
OpenStack Cinder).

Can't say how long it should take, but repairs can take a while. For
me, it usually takes a long while until the
"active+clean+scrubbing+deep+inconsistent+repair" status appears; after
that, I guess it is dependent on disk performance (and possibly wpq vs.
mclock?). I would stay calm for a while and let the cluster try to get
itself right.


--
Enrico Bocchi
CERN European Laboratory for Particle Physics
IT - Storage & Data Management  - General Storage Services
Mailbox: G20500 - Office: 31-2-010
1211 Genève 23
Switzerland
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
