You'll probably want to generate a log with "debug osd = 20" and
"debug bluestore = 20", then share that or upload it with
ceph-post-file, to get more useful info about which PGs are breaking
(is it actually the ones that were supposed to be deleted?).
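
A minimal sketch of doing that on the OSD host (paths assume the
default log location, and osd.2 is taken from the traceback below;
adjust per dead OSD):

    # /etc/ceph/ceph.conf on the OSD host, under [osd]:
    #   debug osd = 20
    #   debug bluestore = 20

    # let the OSD hit the crash once more with logging turned up,
    # then upload the log for that OSD:
    systemctl restart ceph-osd@2
    ceph-post-file -d "nautilus OSD crash in _txc_add_transaction" /var/log/ceph/ceph-osd.2.log

Since the daemons die at startup, setting the debug levels in
ceph.conf (rather than injecting them into a running daemon) is the
reliable way to catch the crash.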

If there's a particular set of PGs you need to rescue, you can also
look at using the ceph-objectstore-tool to export them off the busted
OSD stores and import them into OSDs that still work.
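
A rough sketch, assuming the broken OSD is osd.2, a healthy one is
osd.5, and "1.2a" stands in for a real pgid from list-pgs (both
daemons must be stopped while the tool runs):

    # on the broken OSD: see which PGs are present
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 --op list-pgs

    # export one PG to a file
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 \
        --pgid 1.2a --op export --file /root/1.2a.export

    # on a healthy (stopped) OSD: import it
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 \
        --op import --file /root/1.2a.export

Since these are ceph-volume LVM OSDs, the --data-path directory is a
tmpfs mount that only exists once the OSD has been activated
(ceph-volume lvm activate does this); given that yours crash after
startup rather than failing to mount, it should already be in place.
If the crashing PGs turn out to be the leftovers of the deleted pool,
ceph-objectstore-tool also has an --op remove that can discard a PG
outright, but only run it against PGs you're sure you don't need.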


On Fri, Apr 26, 2019 at 12:01 PM Elise Burke <[email protected]> wrote:
>
> Hi,
>
> I upgraded to Nautilus a week or two ago and things had been mostly fine. I 
> was interested in trying the device health stats feature and enabled it. In 
> doing so it created a pool, device_health_metrics, which contained zero bytes.
>
> Unfortunately this pool developed a PG that could not be repaired with `ceph 
> pg repair`. That's okay, I thought, this pool is empty (zero bytes), so I'll 
> just remove it and discard the PG entirely.
>
> So I did: `ceph osd pool rm device_health_metrics device_health_metrics 
> --yes-i-really-really-mean-it`
>
> Within a few seconds three OSDs had gone missing (this pool was size=3) and 
> they now crash-loop at startup.
>
> Any assistance in getting these OSDs up (such as by discarding the errant PG) 
> would be appreciated. I'm most concerned about the other pools in the system, 
> as losing three OSDs at once has not been ideal.
>
> This is made more difficult because these OSDs use BlueStore and were 
> set up with ceph-deploy on bare metal (using LVM mode).
>
> Here's the traceback as noted in journalctl:
>
> Apr 26 11:01:43 databox ceph-osd[1878533]: -5381> 2019-04-26 11:01:08.902 7f8a00866d80 -1 Falling back to public interface
> Apr 26 11:01:43 databox ceph-osd[1878533]: -4241> 2019-04-26 11:01:41.835 7f8a00866d80 -1 osd.2 7630 log_to_monitors {default=true}
> Apr 26 11:01:43 databox ceph-osd[1878533]: -3> 2019-04-26 11:01:43.203 7f89dee53700 -1 bluestore(/var/lib/ceph/osd/ceph-2) _txc_add_transaction error (39) Directory not empty not handled on operation 21 (op 1, counting from 0)
> Apr 26 11:01:43 databox ceph-osd[1878533]: -1> 2019-04-26 11:01:43.209 7f89dee53700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14
> Apr 26 11:01:43 databox ceph-osd[1878533]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.0/rpm/el7/BUILD/ceph-14.2.0/src/os/bluest
> Apr 26 11:01:43 databox ceph-osd[1878533]: ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
> Apr 26 11:01:43 databox ceph-osd[1878533]: 1: (ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0xd8) [0xfc63afe40]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x2a85) [0xfc698e5f5]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 3: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr
> Apr 26 11:01:43 databox ceph-osd[1878533]: 4: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x7f) [0xfc656b81f
> Apr 26 11:01:43 databox ceph-osd[1878533]: 5: (PG::_delete_some(ObjectStore::Transaction*)+0x83d) [0xfc65ce70d]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 6: (PG::RecoveryState::Deleting::react(PG::DeleteSome const&)+0x38) [0xfc65cf528]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 7: (boost::statechart::simple_state<PG::RecoveryState::Deleting, PG::RecoveryState::ToDelete, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na
> Apr 26 11:01:43 databox ceph-osd[1878533]: 8: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost
> Apr 26 11:01:43 databox ceph-osd[1878533]: 9: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x119) [0xfc65dac99]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 10: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4) [0xfc6515494]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 11: (OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&)+0x234) [0xfc65158d4]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x9f4) [0xfc6509c14]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433) [0xfc6b01f43]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xfc6b04fe0]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 15: (()+0x7dd5) [0x7f89fd4b0dd5]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 16: (clone()+0x6d) [0x7f89fc376ead]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 0> 2019-04-26 11:01:43.217 7f89dee53700 -1 *** Caught signal (Aborted) **
> Apr 26 11:01:43 databox ceph-osd[1878533]: in thread 7f89dee53700 thread_name:tp_osd_tp
> Apr 26 11:01:43 databox ceph-osd[1878533]: ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
> Apr 26 11:01:43 databox ceph-osd[1878533]: 1: (()+0xf5d0) [0x7f89fd4b85d0]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 2: (gsignal()+0x37) [0x7f89fc2af207]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 3: (abort()+0x148) [0x7f89fc2b08f8]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 4: (ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0x19c) [0xfc63aff04]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 5: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x2a85) [0xfc698e5f5]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 6: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr
> Apr 26 11:01:43 databox ceph-osd[1878533]: 7: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x7f) [0xfc656b81f
> Apr 26 11:01:43 databox ceph-osd[1878533]: 8: (PG::_delete_some(ObjectStore::Transaction*)+0x83d) [0xfc65ce70d]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 9: (PG::RecoveryState::Deleting::react(PG::DeleteSome const&)+0x38) [0xfc65cf528]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 10: (boost::statechart::simple_state<PG::RecoveryState::Deleting, PG::RecoveryState::ToDelete, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::n
> Apr 26 11:01:43 databox ceph-osd[1878533]: 11: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boos
> Apr 26 11:01:43 databox ceph-osd[1878533]: 12: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x119) [0xfc65dac99]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 13: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4) [0xfc6515494]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 14: (OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&)+0x234) [0xfc65158d4]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 15: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x9f4) [0xfc6509c14]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433) [0xfc6b01f43]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xfc6b04fe0]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 18: (()+0x7dd5) [0x7f89fd4b0dd5]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 19: (clone()+0x6d) [0x7f89fc376ead]
> Apr 26 11:01:43 databox ceph-osd[1878533]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> Thanks!
>
> -Elise
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
