Hi, our mon has suddenly started acting up and is dying in a crash loop with the following:
2019-10-04 14:00:24.339583 lease_expire=0.000000 has v0 lc 4549352
    -3> 2019-10-04 14:00:24.335 7f6e5d461700  5 mon.km-fsn-1-dc4-m1-797678@0(leader).paxos(paxos active c 4548623..4549352) is_readable = 1 - now=2019-10-04 14:00:24.339620 lease_expire=0.000000 has v0 lc 4549352
    -2> 2019-10-04 14:00:24.343 7f6e5d461700 -1 mon.km-fsn-1-dc4-m1-797678@0(leader).osd e257349 get_full_from_pinned_map closest pinned map ver 252615 not available! error: (2) No such file or directory
    -1> 2019-10-04 14:00:24.343 7f6e5d461700 -1 /build/ceph-14.2.4/src/mon/OSDMonitor.cc: In function 'int OSDMonitor::get_full_from_pinned_map(version_t, ceph::bufferlist&)' thread 7f6e5d461700 time 2019-10-04 14:00:24.347580
/build/ceph-14.2.4/src/mon/OSDMonitor.cc: 3932: FAILED ceph_assert(err == 0)

 ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7f6e68eb064e]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f6e68eb0829]
 3: (OSDMonitor::get_full_from_pinned_map(unsigned long, ceph::buffer::v14_2_0::list&)+0x80b) [0x72802b]
 4: (OSDMonitor::get_version_full(unsigned long, unsigned long, ceph::buffer::v14_2_0::list&)+0x3d2) [0x728c82]
 5: (OSDMonitor::encode_trim_extra(std::shared_ptr<MonitorDBStore::Transaction>, unsigned long)+0x8c) [0x717c3c]
 6: (PaxosService::maybe_trim()+0x473) [0x707443]
 7: (Monitor::tick()+0xa9) [0x5ecf39]
 8: (C_MonContext::finish(int)+0x39) [0x5c3f29]
 9: (Context::complete(int)+0x9) [0x6070d9]
 10: (SafeTimer::timer_thread()+0x190) [0x7f6e68f45580]
 11: (SafeTimerThread::entry()+0xd) [0x7f6e68f46e4d]
 12: (()+0x76ba) [0x7f6e67cab6ba]
 13: (clone()+0x6d) [0x7f6e674d441d]

     0> 2019-10-04 14:00:24.347 7f6e5d461700 -1 *** Caught signal (Aborted) **
 in thread 7f6e5d461700 thread_name:safe_timer

 ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
 1: (()+0x11390) [0x7f6e67cb5390]
 2: (gsignal()+0x38) [0x7f6e67402428]
 3: (abort()+0x16a) [0x7f6e6740402a]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x7f6e68eb069f]
 5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f6e68eb0829]
 6: (OSDMonitor::get_full_from_pinned_map(unsigned long, ceph::buffer::v14_2_0::list&)+0x80b) [0x72802b]
 7: (OSDMonitor::get_version_full(unsigned long, unsigned long, ceph::buffer::v14_2_0::list&)+0x3d2) [0x728c82]
 8: (OSDMonitor::encode_trim_extra(std::shared_ptr<MonitorDBStore::Transaction>, unsigned long)+0x8c) [0x717c3c]
 9: (PaxosService::maybe_trim()+0x473) [0x707443]
 10: (Monitor::tick()+0xa9) [0x5ecf39]
 11: (C_MonContext::finish(int)+0x39) [0x5c3f29]
 12: (Context::complete(int)+0x9) [0x6070d9]
 13: (SafeTimer::timer_thread()+0x190) [0x7f6e68f45580]
 14: (SafeTimerThread::entry()+0xd) [0x7f6e68f46e4d]
 15: (()+0x76ba) [0x7f6e67cab6ba]
 16: (clone()+0x6d) [0x7f6e674d441d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

The mon had been running fine for 2 months. This is a crashed cluster that is currently in recovery. Any suggestions?