I've seen this on luminous, but not on mimic. Can you generate a log with debug osd = 20 leading up to the crash?
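For example (osd.NN below is just a placeholder for whichever OSD is crashing; adjust to your deployment), on the affected host add

    [osd]
        debug osd = 20

to ceph.conf, restart the daemon (systemctl restart ceph-osd@NN), and capture /var/log/ceph/ceph-osd.NN.log covering the crash. If the daemon stays up long enough, you can instead bump the level at runtime with

    ceph tell osd.NN injectargs '--debug_osd 20/20'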
Thanks!
sage

On Tue, 8 Jan 2019, Paul Emmerich wrote:
> I've seen this before a few times, but unfortunately there doesn't seem
> to be a good solution at the moment :(
>
> See also: http://tracker.ceph.com/issues/23145
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Tue, Jan 8, 2019 at 9:37 AM David Young <[email protected]> wrote:
> >
> > Hi all,
> >
> > One of my OSD hosts recently ran into RAM contention (it was swapping
> > heavily), and after rebooting, I'm seeing this error on random OSDs in
> > the cluster:
> >
> > ---
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]: ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]: 1: /usr/bin/ceph-osd() [0xcac700]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]: 2: (()+0x11390) [0x7f8fa5d0e390]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]: 3: (gsignal()+0x38) [0x7f8fa5241428]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]: 4: (abort()+0x16a) [0x7f8fa524302a]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x250) [0x7f8fa767c510]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]: 6: (()+0x2e5587) [0x7f8fa767c587]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]: 7: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x923) [0xbab5e3]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]: 8: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x5c3) [0xbade03]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]: 9: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x82) [0x79c812]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]: 10: (OSD::dispatch_context_transaction(PG::RecoveryCtx&, PG*, ThreadPool::TPHandle*)+0x58) [0x730ff8]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]: 11: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0xfe) [0x759aae]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]: 12: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x50) [0x9c5720]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]: 13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x590) [0x769760]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]: 14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x476) [0x7f8fa76824f6]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]: 15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f8fa76836b0]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]: 16: (()+0x76ba) [0x7f8fa5d046ba]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]: 17: (clone()+0x6d) [0x7f8fa531341d]
> > Jan 08 03:34:36 prod1 ceph-osd[3357939]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> > Jan 08 03:34:36 prod1 systemd[1]: [email protected]: Main process exited, code=killed, status=6/ABRT
> > ---
> >
> > I've restarted all the OSDs and the mons, but I'm still encountering
> > the above.
> >
> > Any ideas / suggestions?
> >
> > Thanks!
> > D
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
