For the record: in the linked issue it was suspected that this might be
caused by write caching. That does not appear to be the case, as it
happened to me again with write caching disabled.
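
For anyone trying to reproduce this, a minimal sketch of the two settings
discussed in this thread (device name and config location are placeholders
for your environment): write caching can be disabled on each OSD's backing
device with `hdparm -W 0 /dev/sdX`, and the verbose OSD log Sage asked for
can be enabled ahead of the next crash with a ceph.conf fragment like:

---
# /etc/ceph/ceph.conf -- raise OSD debug logging before the crash recurs
[osd]
debug osd = 20
---

The same effect can be had at runtime, without a restart, via
`ceph tell osd.* injectargs '--debug-osd 20'`; note the resulting logs
grow quickly at this level.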

On Tue, Jan 8, 2019 at 11:15 AM Sage Weil <[email protected]> wrote:
>
> I've seen this on luminous, but not on mimic.  Can you generate a log with
> debug osd = 20 leading up to the crash?
>
> Thanks!
> sage
>
>
> On Tue, 8 Jan 2019, Paul Emmerich wrote:
>
> > I've seen this before a few times but unfortunately there doesn't seem
> > to be a good solution at the moment :(
> >
> > See also: http://tracker.ceph.com/issues/23145
> >
> > Paul
> >
> > --
> > Paul Emmerich
> >
> > Looking for help with your Ceph cluster? Contact us at https://croit.io
> >
> > croit GmbH
> > Freseniusstr. 31h
> > 81247 München
> > www.croit.io
> > Tel: +49 89 1896585 90
> >
> > On Tue, Jan 8, 2019 at 9:37 AM David Young <[email protected]> 
> > wrote:
> > >
> > > Hi all,
> > >
> > > One of my OSD hosts recently ran into RAM contention (was swapping 
> > > heavily), and after rebooting, I'm seeing this error on random OSDs in 
> > > the cluster:
> > >
> > > ---
> > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  ceph version 13.2.4 
> > > (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
> > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  1: /usr/bin/ceph-osd() 
> > > [0xcac700]
> > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  2: (()+0x11390) [0x7f8fa5d0e390]
> > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  3: (gsignal()+0x38) 
> > > [0x7f8fa5241428]
> > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  4: (abort()+0x16a) 
> > > [0x7f8fa524302a]
> > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  5: 
> > > (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> > > const*)+0x250) [0x7f8fa767c510]
> > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  6: (()+0x2e5587) 
> > > [0x7f8fa767c587]
> > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  7: 
> > > (BlueStore::_txc_add_transaction(BlueStore::TransContext*, 
> > > ObjectStore::Transaction*)+0x923) [0xbab5e3]
> > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  8: 
> > > (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&,
> > >  std::vector<ObjectStore::Transaction, 
> > > std::allocator<ObjectStore::Transaction> >&, 
> > > boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x5c3) [0xbade03]
> > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  9: 
> > > (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&,
> > >  ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, 
> > > ThreadPool::TPHandle*)+0x82) [0x79c812]
> > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  10: 
> > > (OSD::dispatch_context_transaction(PG::RecoveryCtx&, PG*, 
> > > ThreadPool::TPHandle*)+0x58) [0x730ff8]
> > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  11: 
> > > (OSD::dequeue_peering_evt(OSDShard*, PG*, 
> > > std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0xfe) [0x759aae]
> > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  12: (PGPeeringItem::run(OSD*, 
> > > OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x50) 
> > > [0x9c5720]
> > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  13: 
> > > (OSD::ShardedOpWQ::_process(unsigned int, 
> > > ceph::heartbeat_handle_d*)+0x590) [0x769760]
> > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  14: 
> > > (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x476) 
> > > [0x7f8fa76824f6]
> > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  15: 
> > > (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f8fa76836b0]
> > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  16: (()+0x76ba) [0x7f8fa5d046ba]
> > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  17: (clone()+0x6d) 
> > > [0x7f8fa531341d]
> > > Jan 08 03:34:36 prod1 ceph-osd[3357939]:  NOTE: a copy of the executable, 
> > > or `objdump -rdS <executable>` is needed to interpret this.
> > > Jan 08 03:34:36 prod1 systemd[1]: [email protected]: Main process 
> > > exited, code=killed, status=6/ABRT
> > > ---
> > >
> > > I've restarted all the OSDs and the mons, but still encountering the 
> > > above.
> > >
> > > Any ideas / suggestions?
> > >
> > > Thanks!
> > > D
> > > _______________________________________________
> > > ceph-users mailing list
> > > [email protected]
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
