You'll probably want to generate a log with "debug osd = 20" and "debug bluestore = 20", then share that or upload it with ceph-post-file, to get more useful information about which PGs are breaking (is it actually the ones that were supposed to be deleted?).
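A minimal sketch of turning up that logging, assuming the settings go into ceph.conf on the OSD host (the daemon aborts during startup, so the levels need to be in place before it starts) and that osd.2, named in the traceback below, is one of the affected OSDs:

```shell
# Hypothetical host-side steps; osd.2 and the log path are assumptions.
# 1. In /etc/ceph/ceph.conf on the OSD host, add under the [osd] section:
#      debug osd = 20
#      debug bluestore = 20
# 2. Restart the crashing OSD so the abort is captured at the new level:
systemctl restart ceph-osd@2
# 3. Upload the resulting log so others can inspect it:
ceph-post-file /var/log/ceph/ceph-osd.2.log
```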
If there's a particular set of PGs you need to rescue, you can also look at
using ceph-objectstore-tool to export them off the broken OSD stores and
import them into OSDs that still work.

On Fri, Apr 26, 2019 at 12:01 PM Elise Burke <[email protected]> wrote:
>
> Hi,
>
> I upgraded to Nautilus a week or two ago and things had been mostly fine. I
> was interested in trying the device health stats feature and enabled it. In
> doing so it created a pool, device_health_metrics, which contained zero bytes.
>
> Unfortunately this pool developed a PG that could not be repaired with `ceph
> pg repair`. That's okay, I thought: this pool is empty (zero bytes), so I'll
> just remove it and discard the PG entirely.
>
> So I did: `ceph osd pool rm device_health_metrics device_health_metrics
> --yes-i-really-really-mean-it`
>
> Within a few seconds three OSDs had gone missing (this pool was size=3) and
> they now crashloop at startup.
>
> Any assistance in getting these OSDs up (such as by discarding the errant PG)
> would be appreciated. I'm most concerned about the other pools in the system,
> as losing three OSDs at once has not been ideal.
>
> This is made more difficult as these are in the Bluestore configuration and
> were set up with ceph-deploy to bare metal (using LVM mode).
>
> Here's the traceback as noted in journalctl:
>
> Apr 26 11:01:43 databox ceph-osd[1878533]: -5381> 2019-04-26 11:01:08.902 7f8a00866d80 -1 Falling back to public interface
> Apr 26 11:01:43 databox ceph-osd[1878533]: -4241> 2019-04-26 11:01:41.835 7f8a00866d80 -1 osd.2 7630 log_to_monitors {default=true}
> Apr 26 11:01:43 databox ceph-osd[1878533]: -3> 2019-04-26 11:01:43.203 7f89dee53700 -1 bluestore(/var/lib/ceph/osd/ceph-2) _txc_add_transaction error (39) Directory not empty not handled on operation 21 (op 1, counting from 0)
> Apr 26 11:01:43 databox ceph-osd[1878533]: -1> 2019-04-26 11:01:43.209 7f89dee53700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14
> Apr 26 11:01:43 databox ceph-osd[1878533]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.0/rpm/el7/BUILD/ceph-14.2.0/src/os/bluest
> Apr 26 11:01:43 databox ceph-osd[1878533]: ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
> Apr 26 11:01:43 databox ceph-osd[1878533]: 1: (ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0xd8) [0xfc63afe40]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x2a85) [0xfc698e5f5]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 3: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr
> Apr 26 11:01:43 databox ceph-osd[1878533]: 4: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x7f) [0xfc656b81f
> Apr 26 11:01:43 databox ceph-osd[1878533]: 5: (PG::_delete_some(ObjectStore::Transaction*)+0x83d) [0xfc65ce70d]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 6: (PG::RecoveryState::Deleting::react(PG::DeleteSome const&)+0x38) [0xfc65cf528]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 7: (boost::statechart::simple_state<PG::RecoveryState::Deleting, PG::RecoveryState::ToDelete, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na
> Apr 26 11:01:43 databox ceph-osd[1878533]: 8: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost
> Apr 26 11:01:43 databox ceph-osd[1878533]: 9: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x119) [0xfc65dac99]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 10: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4) [0xfc6515494]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 11: (OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&)+0x234) [0xfc65158d4]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x9f4) [0xfc6509c14]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433) [0xfc6b01f43]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xfc6b04fe0]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 15: (()+0x7dd5) [0x7f89fd4b0dd5]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 16: (clone()+0x6d) [0x7f89fc376ead]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 0> 2019-04-26 11:01:43.217 7f89dee53700 -1 *** Caught signal (Aborted) **
> Apr 26 11:01:43 databox ceph-osd[1878533]: in thread 7f89dee53700 thread_name:tp_osd_tp
> Apr 26 11:01:43 databox ceph-osd[1878533]: ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
> Apr 26 11:01:43 databox ceph-osd[1878533]: 1: (()+0xf5d0) [0x7f89fd4b85d0]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 2: (gsignal()+0x37) [0x7f89fc2af207]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 3: (abort()+0x148) [0x7f89fc2b08f8]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 4: (ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0x19c) [0xfc63aff04]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 5: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x2a85) [0xfc698e5f5]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 6: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr
> Apr 26 11:01:43 databox ceph-osd[1878533]: 7: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x7f) [0xfc656b81f
> Apr 26 11:01:43 databox ceph-osd[1878533]: 8: (PG::_delete_some(ObjectStore::Transaction*)+0x83d) [0xfc65ce70d]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 9: (PG::RecoveryState::Deleting::react(PG::DeleteSome const&)+0x38) [0xfc65cf528]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 10: (boost::statechart::simple_state<PG::RecoveryState::Deleting, PG::RecoveryState::ToDelete, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::n
> Apr 26 11:01:43 databox ceph-osd[1878533]: 11: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boos
> Apr 26 11:01:43 databox ceph-osd[1878533]: 12: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x119) [0xfc65dac99]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 13: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4) [0xfc6515494]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 14: (OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&)+0x234) [0xfc65158d4]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 15: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x9f4) [0xfc6509c14]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433) [0xfc6b01f43]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xfc6b04fe0]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 18: (()+0x7dd5) [0x7f89fd4b0dd5]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 19: (clone()+0x6d) [0x7f89fc376ead]
> Apr 26 11:01:43 databox ceph-osd[1878533]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> Thanks!
>
> -Elise
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
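For the per-PG rescue suggested at the top of the thread, a hedged ceph-objectstore-tool sketch (the PG id 1.0, the OSD ids, and the export file path are placeholders for your own values, and an OSD must be stopped while the tool operates on its store):

```shell
# Export a PG from the broken OSD's (stopped) BlueStore:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 \
  --op export --pgid 1.0 --file /root/pg.1.0.export

# Import it into a healthy OSD (also stopped during the operation),
# then start that OSD again:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 \
  --op import --file /root/pg.1.0.export

# Alternatively, to discard the errant PG from the broken store entirely:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 \
  --op remove --pgid 1.0 --force
```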
