Hi all,

We have a serious problem with CephFS. A few days ago, the CephFS file
systems became inaccessible with the health error "MDS_DAMAGE: 1 mds
daemon damaged".
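
For context, that is the error shown by "ceph health detail"; we checked it roughly like this:

    ceph health detail | grep -A 2 MDS_DAMAGE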

The cephfs-journal-tool tells us: "Overall journal integrity: OK"
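
For reference, the journal check was along these lines ("cephfs" is a placeholder for our actual file system name):

    cephfs-journal-tool --rank=cephfs:0 journal inspect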

The usual attempts to redeploy the MDS daemons were unfortunately not successful.

After many attempts to get anywhere with the orchestrator, we marked the
MDS as "failed" and forced the creation of a new MDS with "ceph fs reset".
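
Concretely, we ran roughly the following (again, "cephfs" and the daemon name are placeholders):

    ceph mds fail <mds-daemon-name>
    ceph fs reset cephfs --yes-i-really-mean-it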

But this MDS crashes:
ceph-17.2.7/src/mds/MDCache.cc: In function 'void
MDCache::rejoin_send_rejoins()'
ceph-17.2.7/src/mds/MDCache.cc: 4086: FAILED ceph_assert(auth >= 0)

(The full trace is included below.)

What can we do now? We are grateful for any help!
May 05 22:42:43 ceph06 bash[707251]: debug     -1> 2024-05-05T20:42:43.006+0000 7f6892752700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.7/rpm/el8/BUILD/ceph-17.2.7/src/mds/MDCache.cc: In function 'void MDCache::rejoin_send_rejoins()' thread 7f6892752700 time 2024-05-05T20:42:43.008448+0000
May 05 22:42:43 ceph06 bash[707251]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.7/rpm/el8/BUILD/ceph-17.2.7/src/mds/MDCache.cc: 4086: FAILED ceph_assert(auth >= 0)
May 05 22:42:43 ceph06 bash[707251]:  ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
May 05 22:42:43 ceph06 bash[707251]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x135) [0x7f689fb974a3]
May 05 22:42:43 ceph06 bash[707251]:  2: /usr/lib64/ceph/libceph-common.so.2(+0x269669) [0x7f689fb97669]
May 05 22:42:43 ceph06 bash[707251]:  3: (MDCache::rejoin_send_rejoins()+0x216b) [0x5605d03da7eb]
May 05 22:42:43 ceph06 bash[707251]:  4: (MDCache::process_imported_caps()+0x1993) [0x5605d03d8353]
May 05 22:42:43 ceph06 bash[707251]:  5: (MDCache::rejoin_open_ino_finish(inodeno_t, int)+0x217) [0x5605d03e5837]
May 05 22:42:43 ceph06 bash[707251]:  6: (MDSContext::complete(int)+0x5f) [0x5605d05a7f4f]
May 05 22:42:43 ceph06 bash[707251]:  7: (void finish_contexts<std::vector<MDSContext*, std::allocator<MDSContext*> > >(ceph::common::CephContext*, std::vector<MDSContext*, std::allocator<MDSContext*> >&, int)+0x8d) [0x5605d024cf5d]
May 05 22:42:43 ceph06 bash[707251]:  8: (MDCache::open_ino_finish(inodeno_t, MDCache::open_ino_info_t&, int)+0x138) [0x5605d03cd168]
May 05 22:42:43 ceph06 bash[707251]:  9: (MDCache::_open_ino_traverse_dir(inodeno_t, MDCache::open_ino_info_t&, int)+0xbb) [0x5605d03cd4bb]
May 05 22:42:43 ceph06 bash[707251]:  10: (MDSContext::complete(int)+0x5f) [0x5605d05a7f4f]
May 05 22:42:43 ceph06 bash[707251]:  11: (MDSRank::_advance_queues()+0xaa) [0x5605d025b34a]
May 05 22:42:43 ceph06 bash[707251]:  12: (MDSRank::ProgressThread::entry()+0xb8) [0x5605d025b918]
May 05 22:42:43 ceph06 bash[707251]:  13: /lib64/libpthread.so.0(+0x81ca) [0x7f689eb861ca]
May 05 22:42:43 ceph06 bash[707251]:  14: clone()
May 05 22:42:43 ceph06 bash[707251]: debug      0> 2024-05-05T20:42:43.010+0000 7f6892752700 -1 *** Caught signal (Aborted) **
May 05 22:42:43 ceph06 bash[707251]:  in thread 7f6892752700 thread_name:mds_rank_progr
May 05 22:42:43 ceph06 bash[707251]:  ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
May 05 22:42:43 ceph06 bash[707251]:  1: /lib64/libpthread.so.0(+0x12cf0) [0x7f689eb90cf0]
May 05 22:42:43 ceph06 bash[707251]:  2: gsignal()
May 05 22:42:43 ceph06 bash[707251]:  3: abort()
May 05 22:42:43 ceph06 bash[707251]:  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x18f) [0x7f689fb974fd]
May 05 22:42:43 ceph06 bash[707251]:  5: /usr/lib64/ceph/libceph-common.so.2(+0x269669) [0x7f689fb97669]
May 05 22:42:43 ceph06 bash[707251]:  6: (MDCache::rejoin_send_rejoins()+0x216b) [0x5605d03da7eb]
May 05 22:42:43 ceph06 bash[707251]:  7: (MDCache::process_imported_caps()+0x1993) [0x5605d03d8353]
May 05 22:42:43 ceph06 bash[707251]:  8: (MDCache::rejoin_open_ino_finish(inodeno_t, int)+0x217) [0x5605d03e5837]
May 05 22:42:43 ceph06 bash[707251]:  9: (MDSContext::complete(int)+0x5f) [0x5605d05a7f4f]
May 05 22:42:43 ceph06 bash[707251]:  10: (void finish_contexts<std::vector<MDSContext*, std::allocator<MDSContext*> > >(ceph::common::CephContext*, std::vector<MDSContext*, std::allocator<MDSContext*> >&, int)+0x8d) [0x5605d024cf5d]
May 05 22:42:43 ceph06 bash[707251]:  11: (MDCache::open_ino_finish(inodeno_t, MDCache::open_ino_info_t&, int)+0x138) [0x5605d03cd168]
May 05 22:42:43 ceph06 bash[707251]:  12: (MDCache::_open_ino_traverse_dir(inodeno_t, MDCache::open_ino_info_t&, int)+0xbb) [0x5605d03cd4bb]
May 05 22:42:43 ceph06 bash[707251]:  13: (MDSContext::complete(int)+0x5f) [0x5605d05a7f4f]
May 05 22:42:43 ceph06 bash[707251]:  14: (MDSRank::_advance_queues()+0xaa) [0x5605d025b34a]
May 05 22:42:43 ceph06 bash[707251]:  15: (MDSRank::ProgressThread::entry()+0xb8) [0x5605d025b918]
May 05 22:42:43 ceph06 bash[707251]:  16: /lib64/libpthread.so.0(+0x81ca) [0x7f689eb861ca]
May 05 22:42:43 ceph06 bash[707251]:  17: clone()
May 05 22:42:43 ceph06 bash[707251]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
May 05 22:42:43 ceph06 bash[707251]: --- logging levels ---
May 05 22:42:43 ceph06 bash[707251]:    0/ 5 none
May 05 22:42:43 ceph06 bash[707251]:    0/ 1 lockdep
May 05 22:42:43 ceph06 bash[707251]:    0/ 1 context
May 05 22:42:43 ceph06 bash[707251]:    1/ 1 crush
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 mds
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 mds_balancer
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 mds_locker
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 mds_log
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 mds_log_expire
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 mds_migrator
May 05 22:42:43 ceph06 bash[707251]:    0/ 1 buffer
May 05 22:42:43 ceph06 bash[707251]:    0/ 1 timer
May 05 22:42:43 ceph06 bash[707251]:    0/ 1 filer
May 05 22:42:43 ceph06 bash[707251]:    0/ 1 striper
May 05 22:42:43 ceph06 bash[707251]:    0/ 1 objecter
May 05 22:42:43 ceph06 bash[707251]:    0/ 5 rados
May 05 22:42:43 ceph06 bash[707251]:    0/ 5 rbd
May 05 22:42:43 ceph06 bash[707251]:    0/ 5 rbd_mirror
May 05 22:42:43 ceph06 bash[707251]:    0/ 5 rbd_replay
May 05 22:42:43 ceph06 bash[707251]:    0/ 5 rbd_pwl
May 05 22:42:43 ceph06 bash[707251]:    0/ 5 journaler
May 05 22:42:43 ceph06 bash[707251]:    0/ 5 objectcacher
May 05 22:42:43 ceph06 bash[707251]:    0/ 5 immutable_obj_cache
May 05 22:42:43 ceph06 bash[707251]:    0/ 5 client
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 osd
May 05 22:42:43 ceph06 bash[707251]:    0/ 5 optracker
May 05 22:42:43 ceph06 bash[707251]:    0/ 5 objclass
May 05 22:42:43 ceph06 bash[707251]:    1/ 3 filestore
May 05 22:42:43 ceph06 bash[707251]:    1/ 3 journal
May 05 22:42:43 ceph06 bash[707251]:    0/ 0 ms
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 mon
May 05 22:42:43 ceph06 bash[707251]:    0/10 monc
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 paxos
May 05 22:42:43 ceph06 bash[707251]:    0/ 5 tp
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 auth
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 crypto
May 05 22:42:43 ceph06 bash[707251]:    1/ 1 finisher
May 05 22:42:43 ceph06 bash[707251]:    1/ 1 reserver
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 heartbeatmap
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 perfcounter
May 05 22:42:43 ceph06 bash[707251]:    1/ 2 rgw
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 rgw_sync
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 rgw_datacache
May 05 22:42:43 ceph06 bash[707251]:    1/10 civetweb
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 javaclient
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 asok
May 05 22:42:43 ceph06 bash[707251]:    1/ 1 throttle
May 05 22:42:43 ceph06 bash[707251]:    0/ 0 refs
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 compressor
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 bluestore
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 bluefs
May 05 22:42:43 ceph06 bash[707251]:    1/ 3 bdev
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 kstore
May 05 22:42:43 ceph06 bash[707251]:    4/ 5 rocksdb
May 05 22:42:43 ceph06 bash[707251]:    4/ 5 leveldb
May 05 22:42:43 ceph06 bash[707251]:    4/ 5 memdb
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 fuse
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 mgr
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 mgrc
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 dpdk
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 eventtrace
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 prioritycache
May 05 22:42:43 ceph06 bash[707251]:    0/ 5 test
May 05 22:42:43 ceph06 bash[707251]:    0/ 5 cephfs_mirror
May 05 22:42:43 ceph06 bash[707251]:    0/ 5 cephsqlite
May 05 22:42:43 ceph06 bash[707251]:    0/ 5 seastore
May 05 22:42:43 ceph06 bash[707251]:    0/ 5 seastore_onode
May 05 22:42:43 ceph06 bash[707251]:    0/ 5 seastore_odata
May 05 22:42:43 ceph06 bash[707251]:    0/ 5 seastore_omap
May 05 22:42:43 ceph06 bash[707251]:    0/ 5 seastore_tm
May 05 22:42:43 ceph06 bash[707251]:    0/ 5 seastore_cleaner
May 05 22:42:43 ceph06 bash[707251]:    0/ 5 seastore_lba
May 05 22:42:43 ceph06 bash[707251]:    0/ 5 seastore_cache
May 05 22:42:43 ceph06 bash[707251]:    0/ 5 seastore_journal
May 05 22:42:43 ceph06 bash[707251]:    0/ 5 seastore_device
May 05 22:42:43 ceph06 bash[707251]:    0/ 5 alienstore
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 mclock
May 05 22:42:43 ceph06 bash[707251]:    1/ 5 ceph_exporter
May 05 22:42:43 ceph06 bash[707251]:   -2/-2 (syslog threshold)
May 05 22:42:43 ceph06 bash[707251]:   99/99 (stderr threshold)
May 05 22:42:43 ceph06 bash[707251]: --- pthread ID / name mapping for recent threads ---
May 05 22:42:43 ceph06 bash[707251]:   7f688f74c700 /
May 05 22:42:43 ceph06 bash[707251]:   7f689074e700 /
May 05 22:42:43 ceph06 bash[707251]:   7f6890f4f700 / MR_Finisher
May 05 22:42:43 ceph06 bash[707251]:   7f6891f51700 / PQ_Finisher
May 05 22:42:43 ceph06 bash[707251]:   7f6892752700 / mds_rank_progr
May 05 22:42:43 ceph06 bash[707251]:   7f6892f53700 / ms_dispatch
May 05 22:42:43 ceph06 bash[707251]:   7f6894f57700 / ceph-mds
May 05 22:42:43 ceph06 bash[707251]:   7f6895758700 / safe_timer
May 05 22:42:43 ceph06 bash[707251]:   7f6895f59700 / safe_timer
May 05 22:42:43 ceph06 bash[707251]:   7f6896f5b700 / ms_dispatch
May 05 22:42:43 ceph06 bash[707251]:   7f6897f5d700 / io_context_pool
May 05 22:42:43 ceph06 bash[707251]:   7f6898f5f700 / admin_socket
May 05 22:42:43 ceph06 bash[707251]:   7f6899760700 / msgr-worker-2
May 05 22:42:43 ceph06 bash[707251]:   7f6899f61700 / msgr-worker-1
May 05 22:42:43 ceph06 bash[707251]:   7f689a762700 / msgr-worker-0
May 05 22:42:43 ceph06 bash[707251]:   7f68a0cd4ac0 / ceph-mds
May 05 22:42:43 ceph06 bash[707251]:   max_recent     10000
May 05 22:42:43 ceph06 bash[707251]:   max_new        10000
May 05 22:42:43 ceph06 bash[707251]:   log_file /var/lib/ceph/crash/2024-05-05T20:42:43.014159Z_b6ad6bc1-0faa-4a78-8cb0-c004f051c7d6/log
May 05 22:42:43 ceph06 bash[707251]: --- end dump of recent events ---