Hi all,
One of my OSD hosts recently ran into RAM contention (it was swapping heavily),
and after rebooting it, I'm seeing this error on random OSDs in the cluster:
---
Jan 08 03:34:36 prod1 ceph-osd[3357939]: ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
Jan 08 03:34:36 prod1 ceph-osd[3357939]: 1: /usr/bin/ceph-osd() [0xcac700]
Jan 08 03:34:36 prod1 ceph-osd[3357939]: 2: (()+0x11390) [0x7f8fa5d0e390]
Jan 08 03:34:36 prod1 ceph-osd[3357939]: 3: (gsignal()+0x38) [0x7f8fa5241428]
Jan 08 03:34:36 prod1 ceph-osd[3357939]: 4: (abort()+0x16a) [0x7f8fa524302a]
Jan 08 03:34:36 prod1 ceph-osd[3357939]: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x250) [0x7f8fa767c510]
Jan 08 03:34:36 prod1 ceph-osd[3357939]: 6: (()+0x2e5587) [0x7f8fa767c587]
Jan 08 03:34:36 prod1 ceph-osd[3357939]: 7: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x923) [0xbab5e3]
Jan 08 03:34:36 prod1 ceph-osd[3357939]: 8: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x5c3) [0xbade03]
Jan 08 03:34:36 prod1 ceph-osd[3357939]: 9: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x82) [0x79c812]
Jan 08 03:34:36 prod1 ceph-osd[3357939]: 10: (OSD::dispatch_context_transaction(PG::RecoveryCtx&, PG*, ThreadPool::TPHandle*)+0x58) [0x730ff8]
Jan 08 03:34:36 prod1 ceph-osd[3357939]: 11: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0xfe) [0x759aae]
Jan 08 03:34:36 prod1 ceph-osd[3357939]: 12: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x50) [0x9c5720]
Jan 08 03:34:36 prod1 ceph-osd[3357939]: 13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x590) [0x769760]
Jan 08 03:34:36 prod1 ceph-osd[3357939]: 14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x476) [0x7f8fa76824f6]
Jan 08 03:34:36 prod1 ceph-osd[3357939]: 15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f8fa76836b0]
Jan 08 03:34:36 prod1 ceph-osd[3357939]: 16: (()+0x76ba) [0x7f8fa5d046ba]
Jan 08 03:34:36 prod1 ceph-osd[3357939]: 17: (clone()+0x6d) [0x7f8fa531341d]
Jan 08 03:34:36 prod1 ceph-osd[3357939]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Jan 08 03:34:36 prod1 systemd[1]: [email protected]: Main process exited, code=killed, status=6/ABRT
---
I've restarted all the OSDs and the mons, but I'm still encountering the above.
Any ideas / suggestions?
Thanks!
D