On 28/06/2023 21:26, Niklas Hambüchen wrote:
I have increased the maximum number of simultaneous scrubs per OSD from 1 to 3 using `ceph config
set osd osd_max_scrubs 3`.
Now the problematic PG shows as scrubbing in `ceph pg ls`:
active+clean+scrubbing+deep+inconsistent
This succeeded!
The deep-scrub fixed the PG and the cluster is healthy again.
Thanks a lot!
So indeed the issue was that the deep-scrub I had asked for was simply never
scheduled because Ceph always picked some other scrub to do first on the
relevant OSD.
Increasing `osd_max_scrubs` beyond 1 made it possible to force the scrub to
start.
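For anyone else hitting this, the sequence that got me unstuck boils down to roughly the following (PG id 2.87 and the value 3 are specific to my cluster; adjust as needed, and the last step to revert the change is optional):

# ceph config get osd osd_max_scrubs      # was 1 here
# ceph config set osd osd_max_scrubs 3
# ceph pg deep-scrub 2.87                 # re-request the deep-scrub
# ceph pg ls | grep '^2\.87'              # should now show ...+scrubbing+deep
# ceph health detail
# ceph config rm osd osd_max_scrubs       # optionally revert to the default afterwards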
I conclude that most of the information online, including the Ceph docs, does
not give the correct advice when recommending `ceph pg repair`.
Instead, the docs should make clear that a scrub can fix such issues without involving
`ceph pg repair`.
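For reference, as far as I can tell the inconsistency can be inspected and the PG re-verified with just:

# rados list-inconsistent-obj 2.87 --format=json-pretty   # show which object/shard is inconsistent
# ceph pg deep-scrub 2.87                                 # re-verify; a clean result clears the error
# ceph health detail

(2.87 is my PG id of course; substitute your own.)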
I find the lack of documentation on this disturbing, because a disk failing and being replaced is an
extremely common event for a storage cluster.
Here are some relevant logs from the scrub recovery:
# grep '\b2\.87\b' /var/log/ceph/ceph-osd.33.log | grep deep
2023-05-16T16:33:58.398+0000 7f9a985e5640 0 log_channel(cluster) log [DBG] : 2.87 deep-scrub ok
2023-06-16T20:03:26.923+0000 7f9a985e5640 -1 log_channel(cluster) log [ERR] : 2.87 deep-scrub 0 missing, 1 inconsistent objects
2023-06-16T20:03:26.923+0000 7f9a985e5640 -1 log_channel(cluster) log [ERR] : 2.87 deep-scrub 1 errors
2023-06-26T05:06:17.412+0000 7f9b15bfe640 0 log_channel(cluster) log [INF] : osd.33 pg 2.87 Deep scrub errors, upgrading scrub to deep-scrub
2023-06-29T10:14:07.791+0000 7f9a985e5640 0 log_channel(cluster) log [DBG] : 2.87 deep-scrub ok
ceph.log:
2023-06-29T10:14:07.792432+0000 osd.33 (osd.33) 938 : cluster [DBG] 2.87 deep-scrub ok
2023-06-29T10:14:09.311257+0000 mgr.node-5 (mgr.2454216) 385434 : cluster [DBG] pgmap v385836: 832 pgs: 1 active+clean+scrubbing, 17 active+clean+scrubbing+deep, 814 active+clean; 68 TiB data, 210 TiB used, 229 TiB / 439 TiB avail; 80 MiB/s rd, 40 MiB/s wr, 45 op/s
2023-06-29T10:14:09.427733+0000 mon.node-4 (mon.0) 20923054 : cluster [INF] Health check cleared: OSD_SCRUB_ERRORS (was: 1 scrub errors)
2023-06-29T10:14:09.427758+0000 mon.node-4 (mon.0) 20923055 : cluster [INF] Health check cleared: PG_DAMAGED (was: Possible data damage: 1 pg inconsistent)
2023-06-29T10:14:09.427786+0000 mon.node-4 (mon.0) 20923056 : cluster [INF] Cluster is now healthy
From this, it seems bad that Ceph did not manage to schedule the cluster-fixing scrub within 7 days
of the faulty disk being replaced, nor to start a human-requested scrub within 2 days.
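For reference, the last (deep-)scrub timestamps of a PG can be read out like this to check how long it has been waiting (the exact field/column names are from my version's output, so they may differ slightly):

# ceph pg 2.87 query | grep -E '"last_(deep_)?scrub_stamp"'
# ceph pg dump pgs | grep '^2\.87'   # see the SCRUB_STAMP / DEEP_SCRUB_STAMP columns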
What mechanism in Ceph decides the scheduling of scrubs?
I see the config value `osd_requested_scrub_priority`, described as "the priority set for user
requested scrub on the work queue", but I cannot tell whether it also affects when a scrub is
scheduled to start, or only the priority of its IO relative to e.g. client operations once the
scrub is already running.
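In case someone can confirm the mechanism: these are the related options I could find, which can at least be read out as below. I am not certain which of them actually gate when a scrub is started.

# ceph config help osd_requested_scrub_priority
# ceph config get osd osd_scrub_min_interval
# ceph config get osd osd_scrub_max_interval
# ceph config get osd osd_deep_scrub_interval
# ceph config get osd osd_scrub_load_threshold
# ceph config get osd osd_scrub_begin_hour
# ceph config get osd osd_scrub_end_hour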