Hi,
First of all, I would suggest upgrading your cluster to one of the supported
releases.
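You can see which release each daemon in the cluster is currently running
with, for example:
# ceph versions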
I think a full recovery is recommended to get the MDS back.
1. Stop the MDSes and all the clients (see the sketch after the steps for one
way to do this).
2. Fail the fs:
# ceph fs fail <fsname>
3. Back up the journal. (If the command below fails, make a RADOS-level copy
of the journal objects instead; see http://tracker.ceph.com/issues/9902 and
the sketch after the steps.) Since the journal is already corrupted, this step
could arguably be skipped.
# cephfs-journal-tool --rank <fsname>:0 journal export backup.bin
4. Clean up ancillary data generated by any previous recovery attempt, if
applicable:
# cephfs-data-scan cleanup [<data pool>]
5. Recover dentries, reset the session table, and reset the journal:
# cephfs-journal-tool --rank <fsname>:0 event recover_dentries list
# cephfs-table-tool <fsname>:all reset session
# cephfs-journal-tool --rank <fsname>:0 journal reset
6. Execute scan_extents on each of the 4 tools pods in parallel:
# cephfs-data-scan scan_extents --worker_n 0 --worker_m 4 --filesystem <fsname> <data-pool>
# cephfs-data-scan scan_extents --worker_n 1 --worker_m 4 --filesystem <fsname> <data-pool>
# cephfs-data-scan scan_extents --worker_n 2 --worker_m 4 --filesystem <fsname> <data-pool>
# cephfs-data-scan scan_extents --worker_n 3 --worker_m 4 --filesystem <fsname> <data-pool>
7. Execute scan_inodes on each of the 4 tools pods in parallel:
# cephfs-data-scan scan_inodes --worker_n 0 --worker_m 4 --filesystem <fsname> <data-pool>
# cephfs-data-scan scan_inodes --worker_n 1 --worker_m 4 --filesystem <fsname> <data-pool>
# cephfs-data-scan scan_inodes --worker_n 2 --worker_m 4 --filesystem <fsname> <data-pool>
# cephfs-data-scan scan_inodes --worker_n 3 --worker_m 4 --filesystem <fsname> <data-pool>
8. scan_links:
# cephfs-data-scan scan_links --filesystem <fsname>
9. Mark the filesystem joinable from pod/rook-ceph-tools:
# ceph fs set <fsname> joinable true
10. Start up the MDSs (see the sketch after the steps).
11. Scrub the online fs:
# ceph tell mds.<fsname>-<active-mds[a|b]> scrub start / recursive repair
12. Check the scrub status:
# ceph tell mds.<fsname>-<active-mds[a|b]> scrub status
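For the steps above that do not list explicit commands (1, 3, and 10), here is
a minimal sketch. It assumes systemd-managed (non-containerized) daemons and
uses <metadata pool> as a placeholder; on a Rook cluster you would typically
scale the rook-ceph-mds-<fsname>-* deployments down to 0 and back up instead.

Step 1, on each MDS host (and unmount the filesystem on every client):
# systemctl stop ceph-mds.target

Step 3 fallback, if the journal export fails: copy the rank-0 journal objects
(names starting with 200.) out of the metadata pool before resetting anything:
# rados -p <metadata pool> ls | grep '^200\.' | while read obj; do rados -p <metadata pool> get "$obj" "journal-backup.$obj"; done

Step 10, once the filesystem has been marked joinable:
# systemctl start ceph-mds.target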
For more information, please see
https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/
Thanks,
Kotresh H R
On Wed, Apr 26, 2023 at 3:08 AM <[email protected]> wrote:
> Hi All,
>
> We have a CephFS cluster running Octopus with three control nodes each
> running an MDS, Monitor, and Manager on Ubuntu 20.04. The OS drive on one
> of these nodes failed recently and we had to do a fresh install, but made
> the mistake of installing Ubuntu 22.04 where Octopus is not available. We
> tried to force apt to use the Ubuntu 20.04 repo when installing Ceph so
> that it would install Octopus, but for some reason Quincy was still
> installed. We re-integrated this node and it seemed to work fine for about
> a week until our cluster reported damage to an MDS daemon and placed our
> filesystem into a degraded state.
>
> cluster:
> id: 692905c0-f271-4cd8-9e43-1c32ef8abd13
> health: HEALTH_ERR
> mons are allowing insecure global_id reclaim
> 1 filesystem is degraded
> 1 filesystem is offline
> 1 mds daemon damaged
> noout flag(s) set
> 161 scrub errors
> Possible data damage: 24 pgs inconsistent
> 8 pgs not deep-scrubbed in time
> 4 pgs not scrubbed in time
> 6 daemons have recently crashed
>
> services:
> mon: 3 daemons, quorum database-0,file-server,webhost (age 12d)
> mgr: database-0(active, since 4w), standbys: webhost, file-server
> mds: cephfs:0/1 3 up:standby, 1 damaged
> osd: 91 osds: 90 up (since 32h), 90 in (since 5M)
> flags noout
>
> task status:
>
> data:
> pools: 7 pools, 633 pgs
> objects: 169.18M objects, 640 TiB
> usage: 883 TiB used, 251 TiB / 1.1 PiB avail
> pgs: 605 active+clean
> 23 active+clean+inconsistent
> 4 active+clean+scrubbing+deep
> 1 active+clean+scrubbing+deep+inconsistent
>
> We are not sure if the Quincy/Octopus version mismatch is the problem, but
> we are in the process of downgrading this node now to ensure all nodes are
> running Octopus. Before doing that, we ran the following commands to try
> and recover:
>
> $ cephfs-journal-tool --rank=cephfs:all journal export backup.bin
>
> $ sudo cephfs-journal-tool --rank=cephfs:all event recover_dentries
> summary:
>
> Events by type:
> OPEN: 29589
> PURGED: 1
> SESSION: 16
> SESSIONS: 4
> SUBTREEMAP: 127
> UPDATE: 70438
> Errors: 0
>
> $ cephfs-journal-tool --rank=cephfs:0 journal reset:
>
> old journal was 170234219175~232148677
> new journal start will be 170469097472 (2729620 bytes past old end)
> writing journal head
> writing EResetJournal entry
> done
>
> $ cephfs-table-tool all reset session
>
> All of our MDS daemons are down and fail to restart with the following
> errors:
>
> -3> 2023-04-20T10:25:15.072-0700 7f0465069700 -1 log_channel(cluster) log
> [ERR] : journal replay alloc 0x1000053af79 not in free
> [0x1000053af7d~0x1e8,0x1000053b35c~0x1f7,0x1000053b555~0x2,0x1000053b559~0x2,0x1000053b55d~0x2,0x1000053b561~0x2,0x1000053b565~0x1de,0x1000053b938~0x1fd,0x1000053bd2a~0x4,0x1000053bf23~0x4,0x1000053c11c~0x4,0x1000053cd7b~0x158,0x1000053ced8~0xffffac3128]
> -2> 2023-04-20T10:25:15.072-0700 7f0465069700 -1 log_channel(cluster)
> log [ERR] : journal replay alloc
> [0x1000053af7a~0x1eb,0x1000053b35c~0x1f7,0x1000053b555~0x2,0x1000053b559~0x2,0x1000053b55d~0x2],
> only
> [0x1000053af7d~0x1e8,0x1000053b35c~0x1f7,0x1000053b555~0x2,0x1000053b559~0x2,0x1000053b55d~0x2]
> is in free
> [0x1000053af7d~0x1e8,0x1000053b35c~0x1f7,0x1000053b555~0x2,0x1000053b559~0x2,0x1000053b55d~0x2,0x1000053b561~0x2,0x1000053b565~0x1de,0x1000053b938~0x1fd,0x1000053bd2a~0x4,0x1000053bf23~0x4,0x1000053c11c~0x4,0x1000053cd7b~0x158,0x1000053ced8~0xffffac3128]
> -1> 2023-04-20T10:25:15.072-0700 7f0465069700 -1
> /build/ceph-15.2.15/src/mds/journal.cc: In function 'void
> EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)' thread
> 7f0465069700 time 2023-04-20T10:25:15.076784-0700
> /build/ceph-15.2.15/src/mds/journal.cc: 1513: FAILED ceph_assert(inotablev
> == mds->inotable->get_version())
>
> ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus
> (stable)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x155) [0x7f04717a3be1]
> 2: (()+0x26ade9) [0x7f04717a3de9]
> 3: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x67e2)
> [0x560feaca36f2]
> 4: (EUpdate::replay(MDSRank*)+0x42) [0x560feaca5bd2]
> 5: (MDLog::_replay_thread()+0x90c) [0x560feac393ac]
> 6: (MDLog::ReplayThread::entry()+0x11) [0x560fea920821]
> 7: (()+0x8609) [0x7f0471318609]
> 8: (clone()+0x43) [0x7f0470ee9163]
>
> 0> 2023-04-20T10:25:15.076-0700 7f0465069700 -1 *** Caught signal
> (Aborted) **
> in thread 7f0465069700 thread_name:md_log_replay
>
> ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus
> (stable)
> 1: (()+0x143c0) [0x7f04713243c0]
> 2: (gsignal()+0xcb) [0x7f0470e0d03b]
> 3: (abort()+0x12b) [0x7f0470dec859]
> 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x1b0) [0x7f04717a3c3c]
> 5: (()+0x26ade9) [0x7f04717a3de9]
> 6: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x67e2)
> [0x560feaca36f2]
> 7: (EUpdate::replay(MDSRank*)+0x42) [0x560feaca5bd2]
> 8: (MDLog::_replay_thread()+0x90c) [0x560feac393ac]
> 9: (MDLog::ReplayThread::entry()+0x11) [0x560fea920821]
> 10: (()+0x8609) [0x7f0471318609]
> 11: (clone()+0x43) [0x7f0470ee9163]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
> At this point, we decided it's best to ask for some guidance before
> issuing any other recovery commands.
>
> Can anyone advise what we should do?
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]