At the risk of hijacking this thread: as I mentioned, I've run into this
problem again, and have captured a log with debug_osd=20, viewable at
https://www.dropbox.com/s/8zoos5hhvakcpc4/ceph-osd.3.log?dl=0 - any
pointers?
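For reference, this is roughly how the debug log was captured; osd.3 is
the affected OSD in my case, so adjust the ID and paths for your own
cluster:

```shell
# Raise the OSD debug level at runtime on the affected daemon
# (osd.3 here; substitute your own OSD id):
ceph tell osd.3 injectargs '--debug-osd 20'

# Or set it persistently in /etc/ceph/ceph.conf so it survives the
# crash/restart cycle:
#   [osd]
#   debug osd = 20
#
# The resulting log ends up in the usual place:
#   /var/log/ceph/ceph-osd.3.log
```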

On Tue, Jan 8, 2019 at 11:31 AM Peter Woodman <[email protected]> wrote:
>
> For the record, in the linked issue, it was thought that this might be
> due to write caching. This seems not to be the case, as it happened
> again to me with write caching disabled.
>
> On Tue, Jan 8, 2019 at 11:15 AM Sage Weil <[email protected]> wrote:
> >
> > I've seen this on luminous, but not on mimic.  Can you generate a log with
> > debug osd = 20 leading up to the crash?
> >
> > Thanks!
> > sage
> >
> >
> > On Tue, 8 Jan 2019, Paul Emmerich wrote:
> >
> > > I've seen this before a few times but unfortunately there doesn't seem
> > > to be a good solution at the moment :(
> > >
> > > See also: http://tracker.ceph.com/issues/23145
> > >
> > > Paul
> > >
> > > --
> > > Paul Emmerich
> > >
> > > Looking for help with your Ceph cluster? Contact us at https://croit.io
> > >
> > > croit GmbH
> > > Freseniusstr. 31h
> > > 81247 München
> > > www.croit.io
> > > Tel: +49 89 1896585 90
> > >
> > > On Tue, Jan 8, 2019 at 9:37 AM David Young <[email protected]> 
> > > wrote:
> > > >
> > > > Hi all,
> > > >
> > > > One of my OSD hosts recently ran into RAM contention (was swapping 
> > > > heavily), and after rebooting, I'm seeing this error on random OSDs in 
> > > > the cluster:
> > > >
> > > > ---
> > > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  ceph version 13.2.4 
> > > > (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
> > > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  1: /usr/bin/ceph-osd() 
> > > > [0xcac700]
> > > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  2: (()+0x11390) 
> > > > [0x7f8fa5d0e390]
> > > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  3: (gsignal()+0x38) 
> > > > [0x7f8fa5241428]
> > > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  4: (abort()+0x16a) 
> > > > [0x7f8fa524302a]
> > > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  5: 
> > > > (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> > > > const*)+0x250) [0x7f8fa767c510]
> > > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  6: (()+0x2e5587) 
> > > > [0x7f8fa767c587]
> > > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  7: 
> > > > (BlueStore::_txc_add_transaction(BlueStore::TransContext*, 
> > > > ObjectStore::Transaction*)+0x923) [0xbab5e3]
> > > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  8: 
> > > > (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&,
> > > >  std::vector<ObjectStore::Transaction, 
> > > > std::allocator<ObjectStore::Transaction> >&, 
> > > > boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x5c3) 
> > > > [0xbade03]
> > > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  9: 
> > > > (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&,
> > > >  ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, 
> > > > ThreadPool::TPHandle*)+0x82) [0x79c812]
> > > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  10: 
> > > > (OSD::dispatch_context_transaction(PG::RecoveryCtx&, PG*, 
> > > > ThreadPool::TPHandle*)+0x58) [0x730ff8]
> > > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  11: 
> > > > (OSD::dequeue_peering_evt(OSDShard*, PG*, 
> > > > std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0xfe) [0x759aae]
> > > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  12: (PGPeeringItem::run(OSD*, 
> > > > OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x50) 
> > > > [0x9c5720]
> > > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  13: 
> > > > (OSD::ShardedOpWQ::_process(unsigned int, 
> > > > ceph::heartbeat_handle_d*)+0x590) [0x769760]
> > > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  14: 
> > > > (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x476) 
> > > > [0x7f8fa76824f6]
> > > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  15: 
> > > > (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f8fa76836b0]
> > > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  16: (()+0x76ba) 
> > > > [0x7f8fa5d046ba]
> > > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  17: (clone()+0x6d) 
> > > > [0x7f8fa531341d]
> > > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  NOTE: a copy of the 
> > > > executable, or `objdump -rdS <executable>` is needed to interpret this.
> > > > Jan 08 03:34:36 prod1 systemd[1]: [email protected]: Main process 
> > > > exited, code=killed, status=6/ABRT
> > > > ---
> > > >
> > > > I've restarted all the OSDs and the mons, but I'm still encountering 
> > > > the above.
> > > >
> > > > Any ideas / suggestions?
> > > >
> > > > Thanks!
> > > > D
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
