Hi,

I upgraded to Nautilus a week or two ago and things had been mostly fine. I
was interested in trying the device health stats feature, so I enabled it.
Enabling it created a pool, device_health_metrics, which contained zero
bytes.
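
For reference, this is roughly how I enabled it (reconstructed from memory,
so the exact commands may differ slightly):

    # Turn on device health monitoring via the mgr
    ceph device monitoring on

    # Confirm the pool it created was empty
    ceph df | grep device_health_metrics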

Unfortunately, this pool developed a PG that could not be repaired with `ceph
pg repair`. That's okay, I thought: the pool is empty (zero bytes), so I'll
just remove it and discard the PG entirely.
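
For completeness, this is roughly what I had tried before giving up on the
repair (the PG id below is a placeholder; I don't have the exact one handy):

    # Identify the inconsistent PG and ask the OSDs to repair it
    ceph health detail | grep inconsistent
    ceph pg repair <pgid>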

So I did: `ceph osd pool rm device_health_metrics device_health_metrics
--yes-i-really-really-mean-it`

Within a few seconds, three OSDs had gone down (this pool was size=3), and
they now crash-loop at startup.

Any assistance in getting these OSDs back up (for example, by discarding the
errant PG) would be appreciated. I'm most concerned about the other pools in
the system, as losing three OSDs at once has not been ideal.
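
If it helps to frame the question: would something along these lines, run
with an affected OSD stopped, be a sane way to discard the bad PG, or is it
likely to make things worse? This is purely a guess on my part, and the PG id
is a placeholder:

    # Keep a copy of the PG first, then remove it from the offline OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 \
        --pgid <pgid> --op export --file /root/pg-backup
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 \
        --pgid <pgid> --op remove --force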

This is made more difficult by the fact that the OSDs use BlueStore and were
set up with ceph-deploy on bare metal (using LVM mode).
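
I assume that, with the daemons down, I'd need something like the following
to locate and mount the OSD data paths before any offline tooling can be used
(the osd fsid is a placeholder):

    # Show which LVs back each OSD, including their fsids
    ceph-volume lvm list

    # Prime /var/lib/ceph/osd/ceph-2 without starting the daemon
    ceph-volume lvm activate 2 <osd-fsid> --no-systemd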

Here's the traceback as captured by journalctl:

Apr 26 11:01:43 databox ceph-osd[1878533]: -5381> 2019-04-26 11:01:08.902 7f8a00866d80 -1 Falling back to public interface
Apr 26 11:01:43 databox ceph-osd[1878533]: -4241> 2019-04-26 11:01:41.835 7f8a00866d80 -1 osd.2 7630 log_to_monitors {default=true}
Apr 26 11:01:43 databox ceph-osd[1878533]: -3> 2019-04-26 11:01:43.203 7f89dee53700 -1 bluestore(/var/lib/ceph/osd/ceph-2) _txc_add_transaction error (39) Directory not empty not handled on operation 21 (op 1, counting from 0)
Apr 26 11:01:43 databox ceph-osd[1878533]: -1> 2019-04-26 11:01:43.209 7f89dee53700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14
Apr 26 11:01:43 databox ceph-osd[1878533]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.0/rpm/el7/BUILD/ceph-14.2.0/src/os/bluest
Apr 26 11:01:43 databox ceph-osd[1878533]: ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
Apr 26 11:01:43 databox ceph-osd[1878533]: 1: (ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0xd8) [0xfc63afe40]
Apr 26 11:01:43 databox ceph-osd[1878533]: 2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x2a85) [0xfc698e5f5]
Apr 26 11:01:43 databox ceph-osd[1878533]: 3: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr
Apr 26 11:01:43 databox ceph-osd[1878533]: 4: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x7f) [0xfc656b81f
Apr 26 11:01:43 databox ceph-osd[1878533]: 5: (PG::_delete_some(ObjectStore::Transaction*)+0x83d) [0xfc65ce70d]
Apr 26 11:01:43 databox ceph-osd[1878533]: 6: (PG::RecoveryState::Deleting::react(PG::DeleteSome const&)+0x38) [0xfc65cf528]
Apr 26 11:01:43 databox ceph-osd[1878533]: 7: (boost::statechart::simple_state<PG::RecoveryState::Deleting, PG::RecoveryState::ToDelete, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na
Apr 26 11:01:43 databox ceph-osd[1878533]: 8: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost
Apr 26 11:01:43 databox ceph-osd[1878533]: 9: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x119) [0xfc65dac99]
Apr 26 11:01:43 databox ceph-osd[1878533]: 10: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4) [0xfc6515494]
Apr 26 11:01:43 databox ceph-osd[1878533]: 11: (OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&)+0x234) [0xfc65158d4]
Apr 26 11:01:43 databox ceph-osd[1878533]: 12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x9f4) [0xfc6509c14]
Apr 26 11:01:43 databox ceph-osd[1878533]: 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433) [0xfc6b01f43]
Apr 26 11:01:43 databox ceph-osd[1878533]: 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xfc6b04fe0]
Apr 26 11:01:43 databox ceph-osd[1878533]: 15: (()+0x7dd5) [0x7f89fd4b0dd5]
Apr 26 11:01:43 databox ceph-osd[1878533]: 16: (clone()+0x6d) [0x7f89fc376ead]
Apr 26 11:01:43 databox ceph-osd[1878533]: 0> 2019-04-26 11:01:43.217 7f89dee53700 -1 *** Caught signal (Aborted) **
Apr 26 11:01:43 databox ceph-osd[1878533]: in thread 7f89dee53700 thread_name:tp_osd_tp
Apr 26 11:01:43 databox ceph-osd[1878533]: ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
Apr 26 11:01:43 databox ceph-osd[1878533]: 1: (()+0xf5d0) [0x7f89fd4b85d0]
Apr 26 11:01:43 databox ceph-osd[1878533]: 2: (gsignal()+0x37) [0x7f89fc2af207]
Apr 26 11:01:43 databox ceph-osd[1878533]: 3: (abort()+0x148) [0x7f89fc2b08f8]
Apr 26 11:01:43 databox ceph-osd[1878533]: 4: (ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0x19c) [0xfc63aff04]
Apr 26 11:01:43 databox ceph-osd[1878533]: 5: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x2a85) [0xfc698e5f5]
Apr 26 11:01:43 databox ceph-osd[1878533]: 6: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr
Apr 26 11:01:43 databox ceph-osd[1878533]: 7: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x7f) [0xfc656b81f
Apr 26 11:01:43 databox ceph-osd[1878533]: 8: (PG::_delete_some(ObjectStore::Transaction*)+0x83d) [0xfc65ce70d]
Apr 26 11:01:43 databox ceph-osd[1878533]: 9: (PG::RecoveryState::Deleting::react(PG::DeleteSome const&)+0x38) [0xfc65cf528]
Apr 26 11:01:43 databox ceph-osd[1878533]: 10: (boost::statechart::simple_state<PG::RecoveryState::Deleting, PG::RecoveryState::ToDelete, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::n
Apr 26 11:01:43 databox ceph-osd[1878533]: 11: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boos
Apr 26 11:01:43 databox ceph-osd[1878533]: 12: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x119) [0xfc65dac99]
Apr 26 11:01:43 databox ceph-osd[1878533]: 13: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4) [0xfc6515494]
Apr 26 11:01:43 databox ceph-osd[1878533]: 14: (OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&)+0x234) [0xfc65158d4]
Apr 26 11:01:43 databox ceph-osd[1878533]: 15: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x9f4) [0xfc6509c14]
Apr 26 11:01:43 databox ceph-osd[1878533]: 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433) [0xfc6b01f43]
Apr 26 11:01:43 databox ceph-osd[1878533]: 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xfc6b04fe0]
Apr 26 11:01:43 databox ceph-osd[1878533]: 18: (()+0x7dd5) [0x7f89fd4b0dd5]
Apr 26 11:01:43 databox ceph-osd[1878533]: 19: (clone()+0x6d) [0x7f89fc376ead]
Apr 26 11:01:43 databox ceph-osd[1878533]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Thanks!

-Elise
