Hi,
The MDS daemons on my production cluster crash when they try to unlink certain
files, and I need help :-).
From the log files I identified some of the files involved and ran a scrub on
their parent directory with the force,repair,recursive options. No errors were
detected, but the problem persists.
"ceph -s" and "ceph health detail" show no errors or warnings, so my main
question is: what are my next steps?
The relevant part of the MDS log follows.
-3> 2022-02-11T08:36:20.647+0000 7fa372dba700 4 mds.0.server handle_client_request client_request(client.3422129:6687 unlink #0x10002191acc/gpt2_L-4_H-768_trained_pre-20_1_checkpoint_24_norm-2_norm-None_temporal-shifting-0_84_hidden-layer-0-1-2-3-4.o459077 2022-02-11T08:36:20.647472+0000 caller_uid=0, caller_gid=0{0,1001,90590,90596,90602,90610,90619,90620,90627,90636,}) v4
-2> 2022-02-11T08:36:20.647+0000 7fa36bdac700 5 mds.0.log _submit_thread 9994621415698~1111 : EOpen [metablob 0x10002191acc, 1 dirs], 1 open files
-1> 2022-02-11T08:36:20.654+0000 7fa372dba700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.6/rpm/el8/BUILD/ceph-16.2.6/src/mds/Server.cc: In function 'void Server::_unlink_local(MDRequestRef&, CDentry*, CDentry*)' thread 7fa372dba700 time 2022-02-11T08:36:20.649556+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.6/rpm/el8/BUILD/ceph-16.2.6/src/mds/Server.cc: 7503: FAILED ceph_assert(in->first <= straydn->first)
ceph version 16.2.6 (ee28fb57e47e9f88813e24bbf4c14496ca299d31) pacific (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7fa37b7decce]
2: /usr/lib64/ceph/libceph-common.so.2(+0x276ee8) [0x7fa37b7deee8]
3: (Server::_unlink_local(boost::intrusive_ptr<MDRequestImpl>&, CDentry*, CDentry*)+0x106a) [0x55e4bf43331a]
4: (Server::handle_client_unlink(boost::intrusive_ptr<MDRequestImpl>&)+0x4d9) [0x55e4bf437fe9]
5: (Server::dispatch_client_request(boost::intrusive_ptr<MDRequestImpl>&)+0xefb) [0x55e4bf44e82b]
6: (MDCache::dispatch_request(boost::intrusive_ptr<MDRequestImpl>&)+0x33) [0x55e4bf5044b3]
7: (MDSContext::complete(int)+0x56) [0x55e4bf6c0906]
8: (MDSCacheObject::finish_waiting(unsigned long, int)+0xce) [0x55e4bf6e26be]
9: (Locker::eval_gather(SimpleLock*, bool, bool*, std::vector<MDSContext*, std::allocator<MDSContext*> >*)+0x13d6) [0x55e4bf594f66]
10: (Locker::handle_file_lock(ScatterLock*, boost::intrusive_ptr<MLock const> const&)+0xed1) [0x55e4bf5a3241]
11: (Locker::handle_lock(boost::intrusive_ptr<MLock const> const&)+0x1b3) [0x55e4bf5a3db3]
12: (Locker::dispatch(boost::intrusive_ptr<Message const> const&)+0xb4) [0x55e4bf5a7fe4]
13: (MDSRank::handle_message(boost::intrusive_ptr<Message const> const&)+0xbcc) [0x55e4bf3bf38c]
14: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, bool)+0x7bb) [0x55e4bf3c19eb]
15: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const> const&)+0x55) [0x55e4bf3c1fe5]
16: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x128) [0x55e4bf3b1f28]
17: (DispatchQueue::entry()+0x126a) [0x7fa37ba1c4da]
18: (DispatchQueue::DispatchThread::entry()+0x11) [0x7fa37bacce21]
19: /lib64/libpthread.so.0(+0x814a) [0x7fa37a7c514a]
20: clone()
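
In case it helps anyone hitting the same assert, here is a rough sketch of how the files associated with the crash can be pulled out of an MDS log. It is only an illustration: the regular expression is an assumption based on the single "unlink #<inode>/<name>" request shown above, and unlink_targets is just a name picked for this example, not anything provided by Ceph.

```python
import re

# Assumed pattern: "unlink #<parent dir inode>/<dentry name>", as seen in the
# handle_client_request line above. \s+ also tolerates a line wrap after
# "unlink". Names containing whitespace would be cut at the first space.
UNLINK_RE = re.compile(r"unlink\s+#(0x[0-9a-fA-F]+)/(\S+)")


def unlink_targets(log_text):
    """Return unique (parent_inode, dentry_name) pairs in order of appearance."""
    targets = []
    for ino, name in UNLINK_RE.findall(log_text):
        if (ino, name) not in targets:
            targets.append((ino, name))
    return targets
```

Feeding it the contents of the MDS log file gives the parent directory inode and the file name for every unlink request that was logged, which is the list I used to pick the directory to scrub.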
Arnaud