On 28/06/2023 21:26, Niklas Hambüchen wrote:
I have increased the maximum number of simultaneous scrubs per OSD from 1 to 3 using `ceph config
set osd osd_max_scrubs 3`.
Now the problematic PG shows as scrubbing in `ceph pg ls`:
active+clean+scrubbing+deep+inconsistent
This succeeded!
The deep-scrub fixed the PG and the cluster is healthy again.
Thanks a lot!
So indeed the issue was that the deep-scrub I had asked for was simply never
scheduled because Ceph always picked some other scrub to do first on the
relevant OSD.
Increasing `osd_max_scrubs` beyond 1 made it possible to force the scrub to
start.
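For anyone else hitting this, the sequence that got me unstuck boils down to roughly the following (PG id 2.87 and the value 3 are specific to my cluster; adjust as needed, and the last step to revert the change is optional):

# ceph config get osd osd_max_scrubs      # was 1 here
# ceph config set osd osd_max_scrubs 3
# ceph pg deep-scrub 2.87                 # re-request the deep-scrub
# ceph pg ls | grep '^2\.87'              # should now show ...+scrubbing+deep
# ceph health detail
# ceph config rm osd osd_max_scrubs       # optionally revert to the default afterwards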
I conclude that most of the information online, including the Ceph docs, does
not give the correct advice when recommending `ceph pg repair`.
Instead, the docs should make clear that a scrub can fix such issues without involving
`ceph pg repair`.
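For reference, as far as I can tell the inconsistency can be inspected and the PG re-verified with just:

# rados list-inconsistent-obj 2.87 --format=json-pretty   # show which object/shard is inconsistent
# ceph pg deep-scrub 2.87                                 # re-verify; a clean result clears the error
# ceph health detail

(2.87 is my PG id of course; substitute your own.)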
I find the lack of documentation on this disturbing, because a disk failing and being replaced is an
extremely common event for a storage cluster.
Here are some relevant logs from the scrub recovery:
# grep '\b2\.87\b' /var/log/ceph/ceph-osd.33.log | grep deep
2023-05-16T16:33:58.398+0000 7f9a985e5640 0 log_channel(cluster) log [DBG] : 2.87 deep-scrub ok
2023-06-16T20:03:26.923+0000 7f9a985e5640 -1 log_channel(cluster) log [ERR] : 2.87 deep-scrub 0 missing, 1 inconsistent objects
2023-06-16T20:03:26.923+0000 7f9a985e5640 -1 log_channel(cluster) log [ERR] : 2.87 deep-scrub 1 errors
2023-06-26T05:06:17.412+0000 7f9b15bfe640 0 log_channel(cluster) log [INF] : osd.33 pg 2.87 Deep scrub errors, upgrading scrub to deep-scrub
2023-06-29T10:14:07.791+0000 7f9a985e5640 0 log_channel(cluster) log [DBG] : 2.87 deep-scrub ok
ceph.log:
2023-06-29T10:14:07.792432+0000 osd.33 (osd.33) 938 : cluster [DBG] 2.87 deep-scrub ok
2023-06-29T10:14:09.311257+0000 mgr.node-5 (mgr.2454216) 385434 : cluster [DBG] pgmap v385836: 832 pgs: 1 active+clean+scrubbing, 17 active+clean+scrubbing+deep, 814 active+clean; 68 TiB data, 210 TiB used, 229 TiB / 439 TiB avail; 80 MiB/s rd, 40 MiB/s wr, 45 op/s
2023-06-29T10:14:09.427733+0000 mon.node-4 (mon.0) 20923054 : cluster [INF] Health check cleared: OSD_SCRUB_ERRORS (was: 1 scrub errors)
2023-06-29T10:14:09.427758+0000 mon.node-4 (mon.0) 20923055 : cluster [INF] Health check cleared: PG_DAMAGED (was: Possible data damage: 1 pg inconsistent)
2023-06-29T10:14:09.427786+0000 mon.node-4 (mon.0) 20923056 : cluster [INF] Cluster is now healthy
From this, it seems bad that Ceph did not manage to schedule the cluster-fixing scrub within 7 days
of the faulty disk being replaced, nor to start a human-requested scrub within 2 days.
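For reference, the last (deep-)scrub timestamps of a PG can be read out like this to check how long it has been waiting (the exact field/column names are from my version's output, so they may differ slightly):

# ceph pg 2.87 query | grep -E '"last_(deep_)?scrub_stamp"'
# ceph pg dump pgs | grep '^2\.87'   # see the SCRUB_STAMP / DEEP_SCRUB_STAMP columns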
What mechanism in Ceph decides the scheduling of scrubs?
I see the config value `osd_requested_scrub_priority`, described as "the priority set for user
requested scrub on the work queue", but I cannot tell whether it also affects when a scrub is
scheduled to start, or only the priority of its IO relative to e.g. client operations once the
scrub is already running.
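In case someone can confirm the mechanism: these are the related options I could find, which can at least be read out as below. I am not certain which of them actually gate when a scrub is started.

# ceph config help osd_requested_scrub_priority
# ceph config get osd osd_scrub_min_interval
# ceph config get osd osd_scrub_max_interval
# ceph config get osd osd_deep_scrub_interval
# ceph config get osd osd_scrub_load_threshold
# ceph config get osd osd_scrub_begin_hour
# ceph config get osd osd_scrub_end_hour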