I can also add the following -

It's not an OOM issue.
The MDS host has 256 GB RAM
mds_cache_memory_limit is just below 39 GB

No oom-killer in the logs.
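
For reference, this is roughly how I checked (a sketch - the grep pattern and journalctl invocation are illustrative, and some kernels log "Out of memory:" instead):

```shell
# Look for oom-killer events in the kernel log on the MDS host
# (illustrative pattern; some kernels log "Out of memory:" instead)
journalctl -k | grep -i "oom-killer" || echo "no oom-killer events found"

# Confirm the effective MDS cache limit (just below 39 GB here)
ceph config get mds mds_cache_memory_limit
```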

ceph fs dump returns:

Filesystem '<NAME-REDACTED>' (2)
fs_name     <NAME-REDACTED>
epoch 4402521
flags 12 joinable allow_snaps allow_multimds_snaps
created     2022-03-21T08:22:52.262710+0000
modified    2025-05-19T16:05:17.384954+0000
tableserver 0
root  0
session_timeout   60
session_autoclose 600
max_file_size     4398046511104
max_xattr_size    65536
required_client_features      {}
last_failure      0
last_failure_osd_epoch  2046148
compat      compat={},rocompat={},incompat={1=base v0.20,2=client writeable 
ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses 
versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no 
anchor table,9=file layout v2,10=snaprealm v2}
max_mds     1
in    0
up    {0=2125835186}
failed
damaged
stopped
data_pools  [45]
metadata_pool     44
inline_data disabled
balancer
bal_rank_mask     -1
standby_count_wanted    1


So based on the Failed States description - 
https://docs.ceph.com/en/reef/cephfs/mds-states/#failed-states

I understand that the MDS at least did not report damaged/failed metadata - 
but that might be an optimistic interpretation?
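
For what it's worth, the relevant lines can be pulled out of the dump directly (a sketch - the grep just isolates the failed/damaged sets, and the `damage ls` query only works against a rank that is actually up):

```shell
# Isolate the failed/damaged lines from the dump output; when they print
# with nothing after them (as in the dump above), no ranks are in either set
ceph fs dump 2>/dev/null | grep -E '^(failed|damaged)'

# The per-rank damage table could also be queried, but only against a rank
# that is actually running, which is not the case here:
#   ceph tell mds.<NAME-REDACTED>:0 damage ls
```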

________________________________
From: Kasper Rasmussen <kasper_steenga...@hotmail.com>
Sent: Tuesday, May 20, 2025 09:02
To: Eugen Block <ebl...@nde.ag>; Alexander Patrakov <patra...@gmail.com>
Cc: ceph-users@ceph.io <ceph-users@ceph.io>
Subject: Re: [ceph-users] Re: MDS Repeatedly Crashing/Restarting - Unable to 
get CephFS Active

I haven't tried any disaster recovery yet. However, I've found this bug, which 
looks like the issue:

https://tracker.ceph.com/issues/61009

It seems it's still open and might have gone stale - can anyone comment on 
that in this channel?

________________________________
From: Eugen Block <ebl...@nde.ag>
Sent: Tuesday, May 20, 2025 08:01
To: Alexander Patrakov <patra...@gmail.com>
Cc: ceph-users@ceph.io <ceph-users@ceph.io>
Subject: [ceph-users] Re: MDS Repeatedly Crashing/Restarting - Unable to get 
CephFS Active

Hi,

I don't think I've ever had to use a journal backup yet. Either the backup
of the journal failed because it was corrupted, or the disaster
recovery procedure worked out.
But assuming you would need to import the backup:

cephfs-journal-tool [options] journal import <path> [--force]

and then retry to recover the FS. But I can't remember whether
anyone on this list has reported successfully restoring the journal
from a backup and then successfully recovering the FS in a second attempt.
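
A full round trip might look roughly like this (untested sketch - the filesystem name is a placeholder, rank 0 is assumed since max_mds is 1, and the checksum step is my own habit, not from the docs):

```shell
FS="<cephfs-name>"                    # placeholder for the affected filesystem
BACKUP=/root/mds0-journal-backup.bin

# Export the current journal before any recovery attempt
cephfs-journal-tool --rank="$FS":0 journal export "$BACKUP"
sha256sum "$BACKUP" > "$BACKUP.sha256"   # record a checksum for later verification

# ... run the disaster recovery steps from the docs here ...

# If the recovery attempt fails, verify the backup and import it back
sha256sum -c "$BACKUP.sha256" &&
  cephfs-journal-tool --rank="$FS":0 journal import "$BACKUP" --force
```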


Zitat von Alexander Patrakov <patra...@gmail.com>:

> Hi Eugen,
>
> I have never seen any instructions on how to use such a backup if
> disaster recovery fails. Do you know the procedure?
>
> On Tue, May 20, 2025 at 1:23 AM Eugen Block <ebl...@nde.ag> wrote:
>>
>> Hi,
>>
>> not sure if it was related to journal replay, but have you checked for
>> memory issues? What's the mds memory target? Any traces of an oom
>> killer?
>>
>> Next I would do is inspect the journals for both purge_queue and md_log:
>>
>> cephfs-journal-tool --rank=<cephfs>:0 --journal=mdlog journal inspect
>> cephfs-journal-tool --rank=<cephfs>:0 --journal=purge_queue journal inspect
>>
>> The --rank and --journal parameters might be in the wrong place here,
>> I'm writing this without immediate access to a cephfs-journal-tool.
>>
>> In case the journals are okay, create a backup as described in the
>> docs [0]. Then you might have to go through the disaster recovery
>> steps (for this cephfs only).
>>
>> [0] https://docs.ceph.com/en/latest/cephfs/disaster-recovery/
>>
>> Zitat von Kasper Rasmussen <kasper_steenga...@hotmail.com>:
>>
>> > Ceph Version: 18.2.7
>> >
>> > I've just migrated to cephadm, and upgraded from Pacific to Reef
>> > 18.2.7 last week.
>> > All successful except some minor issues with BlueFS Spillover
>> >
>> >
>> > Today the MDS of a specific fs refuses to start, and ceph orch ps
>> > shows the daemons with status "error".
>> > I have three other CephFS filesystems that still work (though I haven't
>> > tested if they can fail over.)
>> >
>> > I've restarted the MDSs - no luck (the selected MDS just starts/crashes
>> > in a loop until it gives up).
>> > I've deployed 2 new MDSs - no luck, same issue.
>> >
>> > In all scenarios I see in ceph fs status that an MDS is chosen. FS
>> > status goes to "replay" or "replay(laggy)".
>> > On the host with the MDS I see the MDS container crash after
>> > well under 5 minutes, and the status reported by ceph orch ps is error.
>> >
>> > (btw - mds_beacon_grace has been set to 360)
>> >
>> > I've managed to get a good 500 lines of log out with info like this:
>> >
>> > << ----------------- LOG EXAMPLE START ----------------- >>
>> >     -7> 2025-05-19T16:05:02.840+0000 7f6739bb8640 10 monclient:
>> > _check_auth_tickets
>> >     -6> 2025-05-19T16:05:02.840+0000 7f6739bb8640 10 monclient:
>> > _check_auth_rotating have uptodate secrets (they expire after
>> > 2025-05-19T16:04:32.845551+0000)
>> >     -5> 2025-05-19T16:05:02.860+0000 7f673e3c1640 10 monclient:
>> > get_auth_request con 0x5616e9616c00 auth_method 0
>> >     -4> 2025-05-19T16:05:02.916+0000 7f673dbc0640 10 monclient:
>> > get_auth_request con 0x5616e7422800 auth_method 0
>> >     -3> 2025-05-19T16:05:02.968+0000 7f673d3bf640 10 monclient:
>> > get_auth_request con 0x5616f5eac800 auth_method 0
>> >     -2> 2025-05-19T16:05:02.972+0000 7f6736bb2640  2 mds.0.cache
>> > Memory usage:  total 574800, rss 343772, heap 207124, baseline
>> > 182548, 0 / 7535 inodes have caps, 0 caps, 0 caps per inode
>> >     -1> 2025-05-19T16:05:03.676+0000 7f67333ab640 -1
>> >
>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.7/rpm/el9/BUILD/ceph-18.2.7/src/include/interval_set.h:
>>  In function 'void interval_set<T, C>::erase(T, T, std::function<bool(T, 
>> T)>) [with T = inodeno_t; C = std::map]' thread 7f67333ab640
>> time
>> > 2025-05-19T16:05:03.680495+0000
>> >
>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.7/rpm/el9/BUILD/ceph-18.2.7/src/include/interval_set.h:
>>  568: FAILED ceph_assert(p->first
>> <=
>> > start)
>> >
>> >  ceph version 18.2.7 (6b0e988052ec84cf2d4a54ff9bbbc5e720b621ad)
>> reef (stable)
>> >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> > const*)+0x11e) [0x7f67406e6d2c]
>> >  2: /usr/lib64/ceph/libceph-common.so.2(+0x16beeb) [0x7f67406e6eeb]
>> >  3: /usr/bin/ceph-mds(+0x1f16fe) [0x5616e04d46fe]
>> >  4: /usr/bin/ceph-mds(+0x1f1745) [0x5616e04d4745]
>> >  5: (EMetaBlob::replay(MDSRank*, LogSegment*, int,
>> > MDPeerUpdate*)+0x4bdc) [0x5616e0709a4c]
>> >  6: (EUpdate::replay(MDSRank*)+0x5d) [0x5616e0711afd]
>> >  7: (MDLog::_replay_thread()+0x75e) [0x5616e06bc02e]
>> >  8: /usr/bin/ceph-mds(+0x1404b1) [0x5616e04234b1]
>> >  9: /lib64/libc.so.6(+0x8a21a) [0x7f674009721a]
>> >  10: clone()
>> >
>> >      0> 2025-05-19T16:05:03.676+0000 7f67333ab640 -1 *** Caught
>> > signal (Aborted) **
>> >  in thread 7f67333ab640 thread_name:mds-log-replay
>> >
>> >  ceph version 18.2.7 (6b0e988052ec84cf2d4a54ff9bbbc5e720b621ad)
>> reef (stable)
>> >  1: /lib64/libc.so.6(+0x3ebf0) [0x7f674004bbf0]
>> >  2: /lib64/libc.so.6(+0x8bf5c) [0x7f6740098f5c]
>> >  3: raise()
>> >  4: abort()
>> >  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> > const*)+0x178) [0x7f67406e6d86]
>> >  6: /usr/lib64/ceph/libceph-common.so.2(+0x16beeb) [0x7f67406e6eeb]
>> >  7: /usr/bin/ceph-mds(+0x1f16fe) [0x5616e04d46fe]
>> >  8: /usr/bin/ceph-mds(+0x1f1745) [0x5616e04d4745]
>> >  9: (EMetaBlob::replay(MDSRank*, LogSegment*, int,
>> > MDPeerUpdate*)+0x4bdc) [0x5616e0709a4c]
>> >  10: (EUpdate::replay(MDSRank*)+0x5d) [0x5616e0711afd]
>> >  11: (MDLog::_replay_thread()+0x75e) [0x5616e06bc02e]
>> >  12: /usr/bin/ceph-mds(+0x1404b1) [0x5616e04234b1]
>> >  13: /lib64/libc.so.6(+0x8a21a) [0x7f674009721a]
>> >  14: clone()
>> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> > needed to interpret this.
>> > << ----------------- LOG EXAMPLE END ----------------- >>
>> >
>> >
>> > But to be honest, out of all those lines, I don't know what to
>> > provide (all 500+ might be a bit too much).
>> >
>> >
>> > I really need this FS back online, so help will be very much appreciated
>> >
>> >
>> >
>> >
>> > _______________________________________________
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
>
>
>
> --
> Alexander Patrakov


