[ceph-users] Replace OSD while cluster is recovering?

grondina Fri, 28 Feb 2025 10:58:07 -0800

Hello list,

We have a Ceph cluster (v17.2.6 Quincy) with 3 admin nodes and 6 storage nodes,
each connected to a JBOD enclosure. Each enclosure houses 28 HDDs of 18 TB each,
for a total of 168 OSDs. The pool that holds the majority of the data is
erasure-coded (4+2). We recently had a disk failure, which brought one
OSD down:


# ceph osd tree | grep down
  2    hdd    16.49579          osd.2         down         0  1.00000

This OSD is out of the cluster, but we haven't replaced it physically yet. The 
problem that we are facing is that the cluster was not in the best shape when 
this OSD failed. Currently we have the following:

################
  cluster:
    id:     <redacted>
    health: HEALTH_ERR
            1026 scrub errors
            Possible data damage: 18 pgs inconsistent
            2122 pgs not deep-scrubbed in time
            2122 pgs not scrubbed in time

  services:
    mon: 5 daemons, quorum xyz-admin1,xyz-admin2,xyz-osd1,xyz-osd2,xyz-osd3 (age 17M)
    mgr: xyz-admin2.sipadf(active, since 17M), standbys: xyz-admin1.nwaovh
    mds: 2/2 daemons up, 2 standby
    osd: 168 osds: 167 up (since 40h), 167 in (since 6w); 226 remapped pgs

  data:
    volumes: 2/2 healthy
    pools:   9 pools, 2122 pgs
    objects: 448.54M objects, 1.0 PiB
    usage:   1.6 PiB used, 1.1 PiB / 2.7 PiB avail
    pgs:     133905796/2676514497 objects misplaced (5.003%)
             1880 active+clean
             201  active+remapped+backfilling
             23   active+remapped+backfill_wait
             16   active+clean+inconsistent
             1    active+remapped+inconsistent+backfill_wait
             1    active+remapped+inconsistent+backfilling

  io:
    recovery: 703 MiB/s, 281 objects/s

  progress:
    Global Recovery Event (6w)
      [=========================...] (remaining: 5d)
################
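
As a rough sanity check of the figures above, here is a small sketch. It assumes the recovery rate stays constant and that the misplaced count and the objects/s rate count the same kind of object, which may not hold exactly. The variable and function names are only illustrative.

```python
# Figures copied from the status output above.
misplaced_copies = 133_905_796   # "objects misplaced" numerator
total_copies = 2_676_514_497     # "objects misplaced" denominator
logical_objects = 448.54e6       # "objects: 448.54M"
recovery_rate = 281              # objects/s, assumed constant (it will fluctuate)


def misplaced_percent(misplaced: int, total: int) -> float:
    """Share of object copies that are currently misplaced, in percent."""
    return 100.0 * misplaced / total


def copies_per_object(total: int, objects: float) -> float:
    """Average number of stored copies/shards per logical object."""
    return total / objects


def days_remaining(misplaced: int, rate: float) -> float:
    """Naive time to move all misplaced copies at a constant rate, in days."""
    return misplaced / rate / 86_400


if __name__ == "__main__":
    print(f"misplaced:      {misplaced_percent(misplaced_copies, total_copies):.3f} %")
    print(f"copies/object:  {copies_per_object(total_copies, logical_objects):.2f}")
    print(f"days remaining: {days_remaining(misplaced_copies, recovery_rate):.1f}")
```

The copies-per-object figure comes out close to 6, which is what I would expect if most objects live in the 4+2 pool, and the naive estimate lands in the same range as the "remaining: 5d" shown by the progress module.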

I have noticed the number of active+clean PGs increasing and the number of
misplaced objects decreasing very slowly. My question is: should I wait until
recovery is complete, repair the 18 inconsistent PGs, and then replace the disk?
My thinking is that replacing the disk will trigger more backfilling, which
will slow the recovery down even more.

Another question: should I disable scrubbing until the recovery has
finished?

Thank you for any insights you may be able to provide!
-
Gustavo
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
