Hi,

We upgraded a five-node Ceph cluster from Luminous to Nautilus, and the
cluster was running fine afterwards. Yesterday, when we tried to add one
more OSD to the cluster, the OSD was created, but some of the other OSDs
on that node suddenly started to crash, and we are now unable to restart
any of the OSDs on that node. Because of this, we cannot add OSDs on the
other nodes and cannot bring the cluster back up.

The log output from the crash is below.


Nov 13 16:26:13 cn5 numactl: ceph version 14.2.2
(4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)
Nov 13 16:26:13 cn5 numactl: 1: (()+0xf5d0) [0x7f488bb0f5d0]
Nov 13 16:26:13 cn5 numactl: 2: (gsignal()+0x37) [0x7f488a8ff207]
Nov 13 16:26:13 cn5 numactl: 3: (abort()+0x148) [0x7f488a9008f8]
Nov 13 16:26:13 cn5 numactl: 4: (ceph::__ceph_assert_fail(char const*, char
const*, int, char const*)+0x199) [0x5649f7348d43]
Nov 13 16:26:13 cn5 numactl: 5: (ceph::__ceph_assertf_fail(char const*,
char const*, int, char const*, char const*, ...)+0) [0x5649f7348ec2]
Nov 13 16:26:13 cn5 numactl: 6: (()+0x8e7e60) [0x5649f77c3e60]
Nov 13 16:26:13 cn5 numactl: 7:
(CallClientContexts::finish(std::pair<RecoveryMessages*,
ECBackend::read_result_t&>&)+0x6b9) [0x5649f77d5bf9]
Nov 13 16:26:13 cn5 numactl: 8:
(ECBackend::complete_read_op(ECBackend::ReadOp&, RecoveryMessages*)+0x8c)
[0x5649f77ab02c]
Nov 13 16:26:13 cn5 numactl: 9:
(ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&,
RecoveryMessages*, ZTracer::Trace const&)+0xd57) [0x5649f77c5627]
Nov 13 16:26:13 cn5 numactl: 10:
(ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x9f)
[0x5649f77c60af]
Nov 13 16:26:13 cn5 numactl: 11:
(PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x87)
[0x5649f76a3467]
Nov 13 16:26:13 cn5 numactl: 12:
(PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
ThreadPool::TPHandle&)+0x695) [0x5649f764f365]
Nov 13 16:26:13 cn5 numactl: 13: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1a9)
[0x5649f7489ea9]
Nov 13 16:26:13 cn5 numactl: 14: (PGOpItem::run(OSD*, OSDShard*,
boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x5649f77275d2]
Nov 13 16:26:13 cn5 numactl: 15: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x9f4) [0x5649f74a6ef4]
Nov 13 16:26:13 cn5 numactl: 16:
(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433)
[0x5649f7aa5ce3]
Nov 13 16:26:13 cn5 numactl: 17:
(ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5649f7aa8d80]
Nov 13 16:26:13 cn5 numactl: 18: (()+0x7dd5) [0x7f488bb07dd5]
Nov 13 16:26:13 cn5 numactl: 19: (clone()+0x6d) [0x7f488a9c6ead]
Nov 13 16:26:13 cn5 numactl: NOTE: a copy of the executable, or `objdump
-rdS <executable>` is needed to interpret this.
Nov 13 16:26:13 cn5 systemd: ceph-osd@279.service: main process exited,
code=killed, status=6/ABRT


Could you please let us know what might be the issue and how to debug this?
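In case it helps narrow this down, one way to capture the full assert message that precedes the backtrace (the excerpt above starts at frame 1, so the assert text itself is cut off) would be to raise the OSD debug levels before attempting another restart. A sketch, assuming the default cluster name "ceph" and default log paths; these settings are verbose and should be reverted after collecting logs:

```
# /etc/ceph/ceph.conf -- temporary debug settings on the affected node
[osd]
debug osd = 20
debug bluestore = 20
debug ms = 1
```

After restarting one of the failing OSDs (e.g. osd.279), the full assert condition and surrounding context should then appear in /var/log/ceph/ceph-osd.279.log.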
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
