Thanks guys!

Sage found the issue that was preventing PG removal from working right;
the fix is going through testing now and should make the next Nautilus
release: https://github.com/ceph/ceph/pull/27929

Apparently the device health metrics module is doing something slightly
naughty as well, so hopefully that's easy to clean up (and if not, at
least it won't be able to break things like this in the future!).
-Greg

On Fri, May 3, 2019 at 2:05 PM Dietmar Rieder
<dietmar.rie...@i-med.ac.at> wrote:
>
> Hi,
>
> to answer my question and for the record:
>
> It turned out that the "device_health_metrics" pool was using PG 7.0.
> After the pool was removed, that PG held no objects, but it was somehow
> never deleted.
>
> [root@cephosd-05 ~]# ceph-objectstore-tool --data-path
> /var/lib/ceph/osd/ceph-98 --pgid 7.0 --op list
> This did not show any objects.
>
> Issuing the export-remove command (thanks, Elise, for sharing your
> experience) on all three OSDs successfully removed the PG:
>
> [root@cephosd-05 ~]# ceph-objectstore-tool --data-path
> /var/lib/ceph/osd/ceph-98 --op export-remove --pgid 7.0 --file
> pg_7_0_from_osd_98.bin
> Exporting 7.0 info 7.0( v 9010'82 (0'0,9010'82] lb MIN (bitwise)
> local-lis/les=9008/9009 n=41 ec=9008/9008 lis/c 9008/9008 les/c/f
> 9009/9009/0 9019/9019/9019)
> Export successful
>  marking collection for removal
> setting '_remove' omap key
> finish_remove_pgs 7.0_head removing 7.0
> Remove successful
>
> Now I could start the three OSDs again and the cluster is HEALTHY.
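>
> For anyone who hits the same thing, the whole workaround roughly looks
> like the steps below. This is only a sketch of what I did here: the OSD
> id, data path and PG id (7.0) are specific to this cluster, the
> systemctl unit name is just the usual one, and you should double-check
> which OSDs actually hold the stale PG before touching anything.
>
> # run on each host holding a copy of the stale PG, with the OSD stopped
> # (in my case the affected OSDs were down and crash-looping anyway)
> systemctl stop ceph-osd@98
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-98 \
>     --pgid 7.0 --op list                      # should show no objects
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-98 \
>     --op export-remove --pgid 7.0 --file pg_7_0_from_osd_98.bin
> systemctl start ceph-osd@98                   # then the next affected OSD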
>
> I hope this gets fixed soon. Meanwhile, keep this in mind and be
> careful when trying the ceph device monitoring feature and when
> deleting the device_health_metrics pool.
>
> Best
>   Dietmar
>
> On 5/3/19 10:09 PM, Dietmar Rieder wrote:
> > Hi,
> >
> > I think I just hit the same problem on Nautilus 14.2.1.
> > I tested the ceph device monitoring feature, which created a new pool
> > (device_health_metrics). After looking into the monitoring feature I
> > turned it off again and removed the pool. This resulted in 3 OSDs going
> > down, and they cannot be started again since they keep crashing.
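> >
> > For context, the trigger was roughly the following sequence (the
> > monitoring commands are the Nautilus ones as I recall them, and the
> > last line is the one that took the OSDs down, so treat this as an
> > illustration of the trigger, not a recommendation):
> >
> > ceph device monitoring on
> > # ... ceph device ls, look at the collected metrics ...
> > ceph device monitoring off
> > ceph osd pool rm device_health_metrics device_health_metrics --yes-i-really-really-mean-it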
> >
> > How can I re-enable the OSDs?
> >
> > I think the following is the relevant log.
> >
> > 2019-05-03 21:24:05.265 7f8e96b8a700 -1
> > bluestore(/var/lib/ceph/osd/ceph-206) _txc_add_transaction error (39)
> > Directory not empty not handled on operation 21 (op 1, counting from 0)
> > 2019-05-03 21:24:05.265 7f8e96b8a700  0
> > bluestore(/var/lib/ceph/osd/ceph-206) _dump_transaction transaction dump:
> > {
> >     "ops": [
> >         {
> >             "op_num": 0,
> >             "op_name": "remove",
> >             "collection": "7.0_head",
> >             "oid": "#7:00000000::::head#"
> >         },
> >         {
> >             "op_num": 1,
> >             "op_name": "rmcoll",
> >             "collection": "7.0_head"
> >         }
> >     ]
> > }
> >
> > 2019-05-03 21:24:05.269 7f8e96b8a700 -1
> > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.1/rpm/el7/BUILD/ceph-14.2.1/src/os/bluestore/BlueStore.cc:
> > In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*,
> > ObjectStore::Transaction*)' thread 7f8e96b8a700 time 2019-05-03 21:24:05.266723
> > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.1/rpm/el7/BUILD/ceph-14.2.1/src/os/bluestore/BlueStore.cc:
> > 11089: abort()
> >
> >  ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus
> > (stable)
> >  1: (ceph::__ceph_abort(char const*, int, char const*, std::string
> > const&)+0xd8) [0x55766fbd6cd0]
> >  2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*,
> > ObjectStore::Transaction*)+0x2a85) [0x5576701b6af5]
> >  3:
> > (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&,
> > std::vector<ObjectStore::Transaction,
> > std::allocator<ObjectStore::Transaction> >&, boost::intrusive_
> > ptr<TrackedOp>, ThreadPool::TPHandle*)+0x526) [0x5576701b7866]
> >  4:
> > (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&,
> > ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>,
> > ThreadPool::TPHandle*)+0x7f) [0x55766f
> > d9274f]
> >  5: (PG::_delete_some(ObjectStore::Transaction*)+0x83d) [0x55766fdf577d]
> >  6: (PG::RecoveryState::Deleting::react(PG::DeleteSome const&)+0x38)
> > [0x55766fdf6598]
> >  7: (boost::statechart::simple_state<PG::RecoveryState::Deleting,
> > PG::RecoveryState::ToDelete, boost::mpl::list<mpl_::na, mpl_::na,
> > mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_:
> > :na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> > mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>,
> > (boost::statechart::history_mode)0>::react_impl(boost::statec
> > hart::event_base const&, void const*)+0x16a) [0x55766fe355ca]
> >  8:
> > (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine,
> > PG::RecoveryState::Initial, std::allocator<void>,
> > boost::statechart::null_exception_translator>::process_event(bo
> > ost::statechart::event_base const&)+0x5a) [0x55766fe130ca]
> >  9: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>,
> > PG::RecoveryCtx*)+0x119) [0x55766fe02389]
> >  10: (OSD::dequeue_peering_evt(OSDShard*, PG*,
> > std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4)
> > [0x55766fd3c3c4]
> >  11: (OSD::dequeue_delete(OSDShard*, PG*, unsigned int,
> > ThreadPool::TPHandle&)+0x234) [0x55766fd3c804]
> >  12: (OSD::ShardedOpWQ::_process(unsigned int,
> > ceph::heartbeat_handle_d*)+0x9f4) [0x55766fd30b44]
> >  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433)
> > [0x55767032ae93]
> >  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55767032df30]
> >  15: (()+0x7dd5) [0x7f8eb7162dd5]
> >  16: (clone()+0x6d) [0x7f8eb6028ead]
> >
> > 2019-05-03 21:24:05.274 7f8e96b8a700 -1 *** Caught signal (Aborted) **
> >  in thread 7f8e96b8a700 thread_name:tp_osd_tp
> >
> >  ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus
> > (stable)
> >  1: (()+0xf5d0) [0x7f8eb716a5d0]
> >  2: (gsignal()+0x37) [0x7f8eb5f61207]
> >  3: (abort()+0x148) [0x7f8eb5f628f8]
> >  4: (ceph::__ceph_abort(char const*, int, char const*, std::string
> > const&)+0x19c) [0x55766fbd6d94]
> >  5: (BlueStore::_txc_add_transaction(BlueStore::TransContext*,
> > ObjectStore::Transaction*)+0x2a85) [0x5576701b6af5]
> >  6:
> > (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&,
> > std::vector<ObjectStore::Transaction,
> > std::allocator<ObjectStore::Transaction> >&, boost::intrusive_
> > ptr<TrackedOp>, ThreadPool::TPHandle*)+0x526) [0x5576701b7866]
> >  7:
> > (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&,
> > ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>,
> > ThreadPool::TPHandle*)+0x7f) [0x55766f
> > d9274f]
> >  8: (PG::_delete_some(ObjectStore::Transaction*)+0x83d) [0x55766fdf577d]
> >  9: (PG::RecoveryState::Deleting::react(PG::DeleteSome const&)+0x38)
> > [0x55766fdf6598]
> >  10: (boost::statechart::simple_state<PG::RecoveryState::Deleting,
> > PG::RecoveryState::ToDelete, boost::mpl::list<mpl_::na, mpl_::na,
> > mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_
> > ::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> > mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>,
> > (boost::statechart::history_mode)0>::react_impl(boost::state
> > chart::event_base const&, void const*)+0x16a) [0x55766fe355ca]
> >  11:
> > (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine,
> > PG::RecoveryState::Initial, std::allocator<void>,
> > boost::statechart::null_exception_translator>::process_event(b
> > oost::statechart::event_base const&)+0x5a) [0x55766fe130ca]
> >  12: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>,
> > PG::RecoveryCtx*)+0x119) [0x55766fe02389]
> >  13: (OSD::dequeue_peering_evt(OSDShard*, PG*,
> > std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4)
> > [0x55766fd3c3c4]
> >  14: (OSD::dequeue_delete(OSDShard*, PG*, unsigned int,
> > ThreadPool::TPHandle&)+0x234) [0x55766fd3c804]
> >  15: (OSD::ShardedOpWQ::_process(unsigned int,
> > ceph::heartbeat_handle_d*)+0x9f4) [0x55766fd30b44]
> >  16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433)
> > [0x55767032ae93]
> >  17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55767032df30]
> >  18: (()+0x7dd5) [0x7f8eb7162dd5]
> >  19: (clone()+0x6d) [0x7f8eb6028ead]
> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> > needed to interpret this.
> >
> >
> > ~Best
> >    Dietmar
> >
> > On 4/29/19 11:05 PM, Gregory Farnum wrote:
> >> Glad you got it working, and thanks for the logs! It looks like we've
> >> seen this once or twice before, so I added them to
> >> https://tracker.ceph.com/issues/38724.
> >> -Greg
> >>
> >> On Fri, Apr 26, 2019 at 5:52 PM Elise Burke <elise.n...@gmail.com> wrote:
> >>>
> >>> Thanks for the pointer to ceph-objectstore-tool; it turns out that
> >>> exporting and removing the PG from all three disks was enough to make
> >>> them boot! I've exported the three copies of the bad PG; let me know if
> >>> you'd like me to upload them anywhere for inspection.
> >>>
> >>> All data has been recovered (since I was originally removing the pool 
> >>> that contained pg 25.0 anyway) and all systems are go on my end. Ceph's 
> >>> architecture is very solid.
> >>>
> >>>
> >>> $ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-6 --op 
> >>> export-remove --pgid 25.0 --file pg_25_0_from_osd_6.bin
> >>> Exporting 25.0 info 25.0( v 7592'106 (0'0,7592'106] lb MIN (bitwise) 
> >>> local-lis/les=7488/7489 n=9 ec=5191/5191 lis/c 7488/7488 les/c/f 
> >>> 7489/7489/0 7593/7593/7593)
> >>> Export successful
> >>>  marking collection for removal
> >>> setting '_remove' omap key
> >>> finish_remove_pgs 25.0_head removing 25.0
> >>> Remove successful
> >>>
> >>>
> >>> On Fri, Apr 26, 2019 at 8:33 PM Elise Burke <elise.n...@gmail.com> wrote:
> >>>>
> >>>> Using ceph-objectstore-tool's info op on PG 25.0 (which indeed was the
> >>>> one I remember having the error) shows this (the presumed invocation is
> >>>> sketched after the output):
> >>>>
> >>>> struct_v 10
> >>>> {
> >>>>     "pgid": "25.0",
> >>>>     "last_update": "7592'106",
> >>>>     "last_complete": "7592'106",
> >>>>     "log_tail": "0'0",
> >>>>     "last_user_version": 106,
> >>>>     "last_backfill": "MIN",
> >>>>     "last_backfill_bitwise": 1,
> >>>>     "purged_snaps": [],
> >>>>     "history": {
> >>>>         "epoch_created": 5191,
> >>>>         "epoch_pool_created": 5191,
> >>>>         "last_epoch_started": 7489,
> >>>>         "last_interval_started": 7488,
> >>>>         "last_epoch_clean": 7489,
> >>>>         "last_interval_clean": 7488,
> >>>>         "last_epoch_split": 0,
> >>>>         "last_epoch_marked_full": 0,
> >>>>         "same_up_since": 7593,
> >>>>         "same_interval_since": 7593,
> >>>>         "same_primary_since": 7593,
> >>>>         "last_scrub": "7592'106",
> >>>>         "last_scrub_stamp": "2019-04-25 21:34:52.079721",
> >>>>         "last_deep_scrub": "7485'70",
> >>>>         "last_deep_scrub_stamp": "2019-04-22 10:15:40.532014",
> >>>>         "last_clean_scrub_stamp": "2019-04-19 14:52:44.047548"
> >>>>     },
> >>>>     "stats": {
> >>>>         "version": "7592'105",
> >>>>         "reported_seq": "2621",
> >>>>         "reported_epoch": "7592",
> >>>>         "state": "active+clean+inconsistent",
> >>>>         "last_fresh": "2019-04-25 20:02:55.620028",
> >>>>         "last_change": "2019-04-24 19:52:45.072473",
> >>>>         "last_active": "2019-04-25 20:02:55.620028",
> >>>>         "last_peered": "2019-04-25 20:02:55.620028",
> >>>>         "last_clean": "2019-04-25 20:02:55.620028",
> >>>>         "last_became_active": "2019-04-22 17:55:37.578239",
> >>>>         "last_became_peered": "2019-04-22 17:55:37.578239",
> >>>>         "last_unstale": "2019-04-25 20:02:55.620028",
> >>>>         "last_undegraded": "2019-04-25 20:02:55.620028",
> >>>>         "last_fullsized": "2019-04-25 20:02:55.620028",
> >>>>         "mapping_epoch": 7593,
> >>>>         "log_start": "0'0",
> >>>>         "ondisk_log_start": "0'0",
> >>>>         "created": 5191,
> >>>>         "last_epoch_clean": 7489,
> >>>>         "parent": "0.0",
> >>>>         "parent_split_bits": 0,
> >>>>         "last_scrub": "7592'88",
> >>>>         "last_scrub_stamp": "2019-04-24 19:52:45.072367",
> >>>>         "last_deep_scrub": "7485'70",
> >>>>         "last_deep_scrub_stamp": "2019-04-22 10:15:40.532014",
> >>>>         "last_clean_scrub_stamp": "2019-04-19 14:52:44.047548",
> >>>>         "log_size": 105,
> >>>>         "ondisk_log_size": 105,
> >>>>         "stats_invalid": false,
> >>>>         "dirty_stats_invalid": false,
> >>>>         "omap_stats_invalid": false,
> >>>>         "hitset_stats_invalid": false,
> >>>>         "hitset_bytes_stats_invalid": false,
> >>>>         "pin_stats_invalid": false,
> >>>>         "manifest_stats_invalid": false,
> >>>>         "snaptrimq_len": 0,
> >>>>         "stat_sum": {
> >>>>             "num_bytes": 0,
> >>>>             "num_objects": 9,
> >>>>             "num_object_clones": 0,
> >>>>             "num_object_copies": 27,
> >>>>             "num_objects_missing_on_primary": 0,
> >>>>             "num_objects_missing": 0,
> >>>>             "num_objects_degraded": 0,
> >>>>             "num_objects_misplaced": 0,
> >>>>             "num_objects_unfound": 0,
> >>>>             "num_objects_dirty": 9,
> >>>>             "num_whiteouts": 0,
> >>>>             "num_read": 87,
> >>>>             "num_read_kb": 87,
> >>>>             "num_write": 98,
> >>>>             "num_write_kb": 98,
> >>>>             "num_scrub_errors": 0,
> >>>>             "num_shallow_scrub_errors": 0,
> >>>>             "num_deep_scrub_errors": 0,
> >>>>             "num_objects_recovered": 0,
> >>>>             "num_bytes_recovered": 0,
> >>>>             "num_keys_recovered": 0,
> >>>>             "num_objects_omap": 9,
> >>>>             "num_objects_hit_set_archive": 0,
> >>>>             "num_bytes_hit_set_archive": 0,
> >>>>             "num_flush": 0,
> >>>>             "num_flush_kb": 0,
> >>>>             "num_evict": 0,
> >>>>             "num_evict_kb": 0,
> >>>>             "num_promote": 0,
> >>>>             "num_flush_mode_high": 0,
> >>>>             "num_flush_mode_low": 0,
> >>>>             "num_evict_mode_some": 0,
> >>>>             "num_evict_mode_full": 0,
> >>>>             "num_objects_pinned": 0,
> >>>>             "num_legacy_snapsets": 0,
> >>>>             "num_large_omap_objects": 0,
> >>>>             "num_objects_manifest": 0,
> >>>>             "num_omap_bytes": 0,
> >>>>             "num_omap_keys": 0
> >>>>         },
> >>>>         "up": [],
> >>>>         "acting": [],
> >>>>         "blocked_by": [],
> >>>>         "up_primary": -1,
> >>>>         "acting_primary": -1,
> >>>>         "purged_snaps": []
> >>>>     },
> >>>>     "empty": 0,
> >>>>     "dne": 0,
> >>>>     "incomplete": 1,
> >>>>     "last_epoch_started": 7489,
> >>>>     "hit_set_history": {
> >>>>         "current_last_update": "0'0",
> >>>>         "history": []
> >>>>     }
> >>>> }
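> >>>>
> >>>> (For reference, the info above comes from the tool's info op; the exact
> >>>> invocation below is my reconstruction, with the data path taken from
> >>>> this host.)
> >>>>
> >>>> $ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-6 --pgid 25.0 --op info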
> >>>>
> >>>>
> >>>> On Fri, Apr 26, 2019 at 7:35 PM Elise Burke <elise.n...@gmail.com> wrote:
> >>>>>
> >>>>> Thanks for the suggestions. I've uploaded the surprisingly large 
> >>>>> (1.5G!) log file: ceph-post-file: 2d8d22f4-580b-4b57-a13a-f49dade34ba7
> >>>>>
> >>>>> Looks like these are the relevant lines:
> >>>>>
> >>>>>    -52> 2019-04-26 19:23:05.190 7fb2657dc700 20 osd.2 op_wq(2) _process 
> >>>>> empty q, waiting
> >>>>>    -51> 2019-04-26 19:23:05.190 7fb2667de700 20 osd.2 op_wq(0) _process 
> >>>>> OpQueueItem(25.0 PGDelete(25.0 e7648) prio 5 cost 1048576 e7648) queued
> >>>>>    -50> 2019-04-26 19:23:05.190 7fb2717f4700 20 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) _kv_finalize_thread sleep
> >>>>>    -49> 2019-04-26 19:23:05.190 7fb2667de700 20 osd.2 op_wq(0) _process 
> >>>>> 25.0 to_process <OpQueueItem(25.0 PGDelete(25.0 e7648) prio 5 cost 
> >>>>> 1048576 e7648)> waiting <> waiting_peering {}
> >>>>>    -48> 2019-04-26 19:23:05.190 7fb2667de700 20 osd.2 op_wq(0) _process 
> >>>>> OpQueueItem(25.0 PGDelete(25.0 e7648) prio 5 cost 1048576 e7648) pg 
> >>>>> 0x7c20a00000
> >>>>>    -47> 2019-04-26 19:23:05.190 7fb2667de700 10 osd.2 pg_epoch: 7648 
> >>>>> pg[25.0( v 7592'106 (0'0,7592'106] lb MIN (bitwise) 
> >>>>> local-lis/les=7488/7489 n=9 ec=5191/5191 lis/c 7488/7488 les/c/f 
> >>>>> 7489/7489/0 7593/7593/7593) [] r=-1 lpr=7648 DELETING pi=[7488,7593)/1 crt=7592'106 lcod 0'0 unknown NOTIFY mbc={}]
> >>>>> do_peering_event: epoch_sent: 7648 epoch_requested: 7648 DeleteSome
> >>>>>    -46> 2019-04-26 19:23:05.190 7fb2667de700 10 osd.2 pg_epoch: 7648 
> >>>>> pg[25.0( v 7592'106 (0'0,7592'106] lb MIN (bitwise) 
> >>>>> local-lis/les=7488/7489 n=9 ec=5191/5191 lis/c 7488/7488 les/c/f 
> >>>>> 7489/7489/0 7593/7593/7593) [] r=-1 lpr=7648 DELETING pi=[7488,7593)/1 crt=7592'106 lcod 0'0 unknown NOTIFY mbc={}]
> >>>>> _delete_some
> >>>>>    -45> 2019-04-26 19:23:05.190 7fb2667de700 15 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) collection_list 25.0_head start 
> >>>>> GHMIN end GHMAX max 30
> >>>>>    -44> 2019-04-26 19:23:05.190 7fb2667de700 20 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) _collection_list range 
> >>>>> 0x7f7fffffffffffffe500000000 to 0x7f7fffffffffffffe5ffffffff and 
> >>>>> 0x7f800000000000001900000000 to 0x7f8000000000000019ffffffff
> >>>>>  start GHMIN
> >>>>>    -43> 2019-04-26 19:23:05.191 7fb2667de700 20 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) _collection_list pend 
> >>>>> 0x7f7fffffffffffffe5ffffffff
> >>>>>    -42> 2019-04-26 19:23:05.191 7fb2667de700 20 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) _collection_list key 
> >>>>> 0x7f7fffffffffffffff100020'P!osdmap.7114!='0x0000000000000000ffffffffffffffff'o'
> >>>>>  >= GHMAX
> >>>>>    -41> 2019-04-26 19:23:05.191 7fb2667de700 20 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) _collection_list oid 
> >>>>> #25:00000000::::head# end GHMAX
> >>>>>    -40> 2019-04-26 19:23:05.191 7fb2667de700 20 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) _collection_list oid 
> >>>>> #25:b08b92bd::::head# end GHMAX
> >>>>>    -39> 2019-04-26 19:23:05.191 7fb2667de700 20 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) _collection_list key 
> >>>>> 0x80800000000000000c1c00000021213dfffffffffffffffeffffffffffffffff'o' 
> >>>>> >= GHMAX
> >>>>>    -38> 2019-04-26 19:23:05.191 7fb2667de700 10 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) collection_list 25.0_head start 
> >>>>> GHMAX end GHMAX max 30 = 0, ls.size() = 2, next = GHMAX
> >>>>>    -37> 2019-04-26 19:23:05.191 7fb2667de700 20 osd.2 pg_epoch: 7648 
> >>>>> pg[25.0( v 7592'106 (0'0,7592'106] lb MIN (bitwise) 
> >>>>> local-lis/les=7488/7489 n=9 ec=5191/5191 lis/c 7488/7488 les/c/f 
> >>>>> 7489/7489/0 7593/7593/7593) [] r=-1 lpr=7648 DELETING pi=[7488,7593)/1 crt=7592'106 lcod 0'0 unknown NOTIFY mbc={}]
> >>>>> _delete_some [#25:00000000::::head#,#25:b08b92bd::::head#]
> >>>>>    -36> 2019-04-26 19:23:05.191 7fb2667de700 20 osd.2 pg_epoch: 7648 
> >>>>> pg[25.0( v 7592'106 (0'0,7592'106] lb MIN (bitwise) 
> >>>>> local-lis/les=7488/7489 n=9 ec=5191/5191 lis/c 7488/7488 les/c/f 
> >>>>> 7489/7489/0 7593/7593/7593) [] r=-1 lpr=7648 DELETING pi=[7488,7593)/1 crt=7592'106 lcod 0'0 unknown NOTIFY mbc={}]
> >>>>> _delete_some finished
> >>>>>    -35> 2019-04-26 19:23:05.191 7fb2667de700 10 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) queue_transactions ch 0x7c27d94b40 
> >>>>> 25.0_head
> >>>>>    -34> 2019-04-26 19:23:05.191 7fb2667de700 20 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) _txc_create osr 0x7c274f6c60 = 
> >>>>> 0x7c2ef34c00 seq 3
> >>>>>    -33> 2019-04-26 19:23:05.191 7fb2667de700 15 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) _remove 25.0_head 
> >>>>> #25:00000000::::head# onode 0x7c20aaf200 txc 0x7c2ef34c00
> >>>>>    -32> 2019-04-26 19:23:05.191 7fb2667de700 15 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) _do_truncate 25.0_head 
> >>>>> #25:00000000::::head# 0x0
> >>>>>    -31> 2019-04-26 19:23:05.191 7fb2667de700 20 
> >>>>> bluestore.onode(0x7c20aaf200).flush flush done
> >>>>>    -30> 2019-04-26 19:23:05.191 7fb2667de700 20 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) _do_omap_clear remove range start: 
> >>>>> 0x0000000000b0295a'-' end: 0x0000000000b0295a'~'
> >>>>>    -29> 2019-04-26 19:23:05.191 7fb2667de700 10 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) _remove 25.0_head 
> >>>>> #25:00000000::::head# = 0
> >>>>>    -28> 2019-04-26 19:23:05.191 7fb2667de700 15 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) _remove_collection 25.0_head
> >>>>>    -27> 2019-04-26 19:23:05.191 7fb2667de700 20 
> >>>>> bluestore.OnodeSpace(0x7c27d94c98 in 0x7c1f7e10a0) map_any
> >>>>>    -26> 2019-04-26 19:23:05.191 7fb2667de700 20 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) _collection_list range 
> >>>>> 0x7f7fffffffffffffe500000000 to 0x7f7fffffffffffffe5ffffffff and 
> >>>>> 0x7f800000000000001900000000 to 0x7f8000000000000019ffffffff
> >>>>>  start GHMIN
> >>>>>    -25> 2019-04-26 19:23:05.192 7fb2667de700 20 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) _collection_list pend 
> >>>>> 0x7f7fffffffffffffe5ffffffff
> >>>>>    -24> 2019-04-26 19:23:05.192 7fb2667de700 20 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) _collection_list key 
> >>>>> 0x7f7fffffffffffffff100020'P!osdmap.7114!='0x0000000000000000ffffffffffffffff'o'
> >>>>>  >= GHMAX
> >>>>>    -23> 2019-04-26 19:23:05.192 7fb2667de700 20 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) _collection_list oid 
> >>>>> #25:00000000::::head# end GHMAX
> >>>>>    -22> 2019-04-26 19:23:05.192 7fb2667de700 20 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) _collection_list oid 
> >>>>> #25:b08b92bd::::head# end GHMAX
> >>>>>    -21> 2019-04-26 19:23:05.192 7fb2667de700 20 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) _collection_list key 
> >>>>> 0x80800000000000000c1c00000021213dfffffffffffffffeffffffffffffffff'o' 
> >>>>> >= GHMAX
> >>>>>    -20> 2019-04-26 19:23:05.192 7fb2667de700 10 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) _remove_collection oid 
> >>>>> #25:00000000::::head#
> >>>>>    -19> 2019-04-26 19:23:05.192 7fb2667de700 10 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) _remove_collection oid 
> >>>>> #25:b08b92bd::::head#
> >>>>>    -18> 2019-04-26 19:23:05.192 7fb2667de700 10 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) _remove_collection 
> >>>>> #25:b08b92bd::::head# exists in db, not present in ram
> >>>>>    -17> 2019-04-26 19:23:05.192 7fb2667de700 10 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) _remove_collection 25.0_head is 
> >>>>> non-empty
> >>>>>    -16> 2019-04-26 19:23:05.192 7fb2667de700 10 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) _remove_collection 25.0_head = -39
> >>>>>    -15> 2019-04-26 19:23:05.192 7fb2667de700 -1 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) _txc_add_transaction error (39) 
> >>>>> Directory not empty not handled on operation 21 (op 1, counting from 0)
> >>>>>    -14> 2019-04-26 19:23:05.192 7fb2667de700  0 
> >>>>> bluestore(/var/lib/ceph/osd/ceph-2) _dump_transaction transaction dump:
> >>>>> {
> >>>>>     "ops": [
> >>>>>         {
> >>>>>             "op_num": 0,
> >>>>>             "op_name": "remove",
> >>>>>             "collection": "25.0_head",
> >>>>>             "oid": "#25:00000000::::head#"
> >>>>>         },
> >>>>>         {
> >>>>>             "op_num": 1,
> >>>>>             "op_name": "rmcoll",
> >>>>>             "collection": "25.0_head"
> >>>>>         }
> >>>>>
> >>>>>    -13> 2019-04-26 19:23:05.199 7fb2667de700 -1 
> >>>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.0/rpm/el7/BUILD/ceph-14.2.0/src/os/bluestore/BlueStore.cc:
> >>>>>  In function 'void 
> >>>>> BlueStore::_txc_add_transaction(BlueStore::TransContext*, 
> >>>>> ObjectStore::Transaction*)' thread 7fb2667de700 time 2019-04-26 
> >>>>> 19:23:05.193826
> >>>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.0/rpm/el7/BUILD/ceph-14.2.0/src/os/bluestore/BlueStore.cc:
> >>>>>  11069: abort()
> >>>>>
> >>>>>  ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) 
> >>>>> nautilus (stable)
> >>>>>  1: (ceph::__ceph_abort(char const*, int, char const*, std::string 
> >>>>> const&)+0xd8) [0x7c1454ee40]
> >>>>>  2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, 
> >>>>> ObjectStore::Transaction*)+0x2a85) [0x7c14b2d5f5]
> >>>>>  3: 
> >>>>> (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&,
> >>>>>  std::vector<ObjectStore::Transaction, 
> >>>>> std::allocator<ObjectStore::Transaction> >&, 
> >>>>> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x526) 
> >>>>> [0x7c14b2e366]
> >>>>>  4: 
> >>>>> (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&,
> >>>>>  ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, 
> >>>>> ThreadPool::TPHandle*)+0x7f) [0x7c1470a81f]
> >>>>>  5: (PG::_delete_some(ObjectStore::Transaction*)+0x83d) [0x7c1476d70d]
> >>>>>  6: (PG::RecoveryState::Deleting::react(PG::DeleteSome const&)+0x38) 
> >>>>> [0x7c1476e528]
> >>>>>  7: (boost::statechart::simple_state<PG::RecoveryState::Deleting, 
> >>>>> PG::RecoveryState::ToDelete, boost::mpl::list<mpl_::na, mpl_::na, 
> >>>>> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, 
> >>>>> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, 
> >>>>> mpl_::na, mpl_::na, mpl_::na, mpl_::na>, 
> >>>>> (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base
> >>>>>  const&, void const*)+0x16a) [0x7c147acc8a]
> >>>>>  8: 
> >>>>> (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, 
> >>>>> PG::RecoveryState::Initial, std::allocator<void>, 
> >>>>> boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base
> >>>>>  const&)+0x5a) [0x7c1478a91a]
> >>>>>  9: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, 
> >>>>> PG::RecoveryCtx*)+0x119) [0x7c14779c99]
> >>>>>  10: (OSD::dequeue_peering_evt(OSDShard*, PG*, 
> >>>>> std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4) 
> >>>>> [0x7c146b4494]
> >>>>>  11: (OSD::dequeue_delete(OSDShard*, PG*, unsigned int, 
> >>>>> ThreadPool::TPHandle&)+0x234) [0x7c146b48d4]
> >>>>>  12: (OSD::ShardedOpWQ::_process(unsigned int, 
> >>>>> ceph::heartbeat_handle_d*)+0x9f4) [0x7c146a8c14]
> >>>>>  13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433) 
> >>>>> [0x7c14ca0f43]
> >>>>>  14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7c14ca3fe0]
> >>>>>  15: (()+0x7dd5) [0x7fb284e3bdd5]
> >>>>>  16: (clone()+0x6d) [0x7fb283d01ead]
> >>>>>
> >>>>>    -12> 2019-04-26 19:23:05.199 7fb280b24700  1 -- 
> >>>>> [v2:192.168.1.152:6800/1887364,v1:192.168.1.152:6801/1887364] <== 
> >>>>> client.244681 v1:192.168.1.50:0/337348516 6 ==== 
> >>>>> osd_op(client.244681.0:2571786 12.5es0 12.43dfa65e (undecoded) 
> >>>>> ondisk+retry+write+known_if_redirected e7648) v8 ==== 229+0+4194304 
> >>>>> (unknown 2519985655 0 813231621) 0x7c2f3ee840 con 0x7c2f2fac00
> >>>>>    -11> 2019-04-26 19:23:05.199 7fb280b24700 15 osd.2 7648 enqueue_op 
> >>>>> 0x7c300f8420 prio 63 cost 4194304 latency 0.035786 epoch 7648 
> >>>>> osd_op(client.244681.0:2571786 12.5es0 12.43dfa65e (undecoded) 
> >>>>> ondisk+retry+write+known_if_redirected e7648) v8
> >>>>>    -10> 2019-04-26 19:23:05.199 7fb280b24700 20 osd.2 op_wq(4) _enqueue 
> >>>>> OpQueueItem(12.5es0 PGOpItem(op=osd_op(client.244681.0:2571786 12.5es0 
> >>>>> 12.43dfa65e (undecoded) ondisk+retry+write+known_if_redirected e7648) 
> >>>>> v8) prio 63 cost 4194304 e7648)
> >>>>>     -9> 2019-04-26 19:23:05.200 7fb2647da700 20 osd.2 op_wq(4) _process 
> >>>>> 12.5es0 to_process <> waiting <> waiting_peering {}
> >>>>>     -8> 2019-04-26 19:23:05.200 7fb2647da700 20 osd.2 op_wq(4) _process 
> >>>>> OpQueueItem(12.5es0 PGOpItem(op=osd_op(client.244681.0:2571786 12.5es0 
> >>>>> 12.43dfa65e (undecoded) ondisk+retry+write+known_if_redirected e7648) 
> >>>>> v8) prio 63 cost 4194304 e7648) queued
> >>>>>     -7> 2019-04-26 19:23:05.200 7fb2647da700 20 osd.2 op_wq(4) _process 
> >>>>> 12.5es0 to_process <OpQueueItem(12.5es0 
> >>>>> PGOpItem(op=osd_op(client.244681.0:2571786 12.5es0 12.43dfa65e 
> >>>>> (undecoded) ondisk+retry+write+known_if_redirected e7648) v8) prio 63 
> >>>>> cost 4194304 e7648)> waiting <> waiting_peering {}
> >>>>>     -6> 2019-04-26 19:23:05.200 7fb2647da700 20 osd.2 op_wq(4) _process 
> >>>>> OpQueueItem(12.5es0 PGOpItem(op=osd_op(client.244681.0:2571786 12.5es0 
> >>>>> 12.43dfa65e (undecoded) ondisk+retry+write+known_if_redirected e7648) 
> >>>>> v8) prio 63 cost 4194304 e7648) pg 0x7c271c6000
> >>>>>     -5> 2019-04-26 19:23:05.200 7fb2647da700 10 osd.2 7648 dequeue_op 
> >>>>> 0x7c300f8420 prio 63 cost 4194304 latency 0.036428 
> >>>>> osd_op(client.244681.0:2571786 12.5es0 12.43dfa65e (undecoded) 
> >>>>> ondisk+retry+write+known_if_redirected e7648) v8 pg pg[12.5es0( v 
> >>>>> 7592'250137 (5872'250020,7592'250137] local-lis/les=7488/7489 n=22756 
> >>>>> ec=5088/733 lis/c 7488/7488 les/c/f 7489/7489/5043 7648/7648/7648) 
> >>>>> [2,2147483647,0,7,4]p2(0) r=0 lpr=7648 pi=[7488,7648)/1 crt=7592'250135 
> >>>>> lcod 0'0 mlcod 0'0 peering mbc={}]
> >>>>>     -4> 2019-04-26 19:23:05.200 7fb2647da700 20 osd.2 7648 share_map 
> >>>>> client.244681 v1:192.168.1.50:0/337348516 7648
> >>>>>     -3> 2019-04-26 19:23:05.200 7fb2647da700 20 osd.2 7648 
> >>>>> should_share_map client.244681 v1:192.168.1.50:0/337348516 7648
> >>>>>     -2> 2019-04-26 19:23:05.200 7fb2647da700 10 osd.2 7648 dequeue_op 
> >>>>> 0x7c300f8420 finish
> >>>>>     -1> 2019-04-26 19:23:05.200 7fb2647da700 20 osd.2 op_wq(4) _process 
> >>>>> empty q, waiting
> >>>>>      0> 2019-04-26 19:23:05.206 7fb2667de700 -1 *** Caught signal 
> >>>>> (Aborted) **
> >>>>>  in thread 7fb2667de700 thread_name:tp_osd_tp
> >>>>>
> >>>>>
> >>>>> On Fri, Apr 26, 2019 at 3:57 PM Gregory Farnum <gfar...@redhat.com> 
> >>>>> wrote:
> >>>>>>
> >>>>>> You'll probably want to generate a log with "debug osd = 20" and
> >>>>>> "debug bluestore = 20", then share that or upload it with
> >>>>>> ceph-post-file, to get more useful info about which PGs are breaking
> >>>>>> (is it actually the ones that were supposed to be deleted?).
> >>>>>>
> >>>>>> If there's a particular set of PGs you need to rescue, you can also
> >>>>>> look at using the ceph-objectstore-tool to export them off the busted
> >>>>>> OSD stores and import them into OSDs that still work.
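> >>>>>>
> >>>>>> A rough sketch of both suggestions (treat it as such: the log path, the
> >>>>>> OSD ids, the PG id and the export file name below are placeholders, and
> >>>>>> the debug settings go into ceph.conf since the OSD crashes at startup):
> >>>>>>
> >>>>>> # /etc/ceph/ceph.conf on the affected host
> >>>>>> [osd]
> >>>>>>     debug osd = 20
> >>>>>>     debug bluestore = 20
> >>>>>>
> >>>>>> # restart the OSD, let it crash, then share the log
> >>>>>> ceph-post-file /var/log/ceph/ceph-osd.<id>.log
> >>>>>>
> >>>>>> # rescue a PG: export from the broken (stopped) OSD, import into a
> >>>>>> # working OSD that is also stopped while the tool runs
> >>>>>> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<broken-id> \
> >>>>>>     --op export --pgid <pgid> --file /tmp/<pgid>.export
> >>>>>> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<working-id> \
> >>>>>>     --op import --file /tmp/<pgid>.export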
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Apr 26, 2019 at 12:01 PM Elise Burke <elise.n...@gmail.com> 
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I upgraded to Nautilus a week or two ago and things had been mostly
> >>>>>>> fine. I was interested in trying the device health stats feature and
> >>>>>>> enabled it. Enabling it created a pool, device_health_metrics, which
> >>>>>>> contained zero bytes.
> >>>>>>>
> >>>>>>> Unfortunately this pool developed a PG that could not be repaired 
> >>>>>>> with `ceph pg repair`. That's okay, I thought, this pool is empty 
> >>>>>>> (zero bytes), so I'll just remove it and discard the PG entirely.
> >>>>>>>
> >>>>>>> So I did: `ceph osd pool rm device_health_metrics 
> >>>>>>> device_health_metrics --yes-i-really-really-mean-it`
> >>>>>>>
> >>>>>>> Within a few seconds three OSDs had gone missing (this pool was
> >>>>>>> size=3), and they now crash-loop at startup.
> >>>>>>>
> >>>>>>> Any assistance in getting these OSDs up (such as by discarding the 
> >>>>>>> errant PG) would be appreciated. I'm most concerned about the other 
> >>>>>>> pools in the system, as losing three OSDs at once has not been ideal.
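> >>>>>>>
> >>>>>>> In case it helps, what I have to go on so far is roughly this (<pgid>
> >>>>>>> stands for whatever `ceph health detail` reported as inconsistent; I
> >>>>>>> haven't gotten further than that):
> >>>>>>>
> >>>>>>> ceph health detail        # reported the inconsistent PG
> >>>>>>> ceph pg repair <pgid>     # tried first; it did not fix the PG
> >>>>>>> ceph pg map <pgid>        # shows which OSDs hold the PG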
> >>>>>>>
> >>>>>>> This is made more difficult because these OSDs use the BlueStore
> >>>>>>> backend and were set up with ceph-deploy on bare metal (using LVM
> >>>>>>> mode).
> >>>>>>>
> >>>>>>> Here's the traceback as noted in journalctl:
> >>>>>>>
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: -5381> 2019-04-26 
> >>>>>>> 11:01:08.902 7f8a00866d80 -1 Falling back to public interface
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: -4241> 2019-04-26 
> >>>>>>> 11:01:41.835 7f8a00866d80 -1 osd.2 7630 log_to_monitors {default=true}
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: -3> 2019-04-26 
> >>>>>>> 11:01:43.203 7f89dee53700 -1 bluestore(/var/lib/ceph/osd/ceph-2) 
> >>>>>>> _txc_add_transaction error (39) Directory not empty not handled on 
> >>>>>>> operation 21 (op 1, counting from 0)
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: -1> 2019-04-26 
> >>>>>>> 11:01:43.209 7f89dee53700 -1 
> >>>>>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 
> >>>>>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.0/rpm/el7/BUILD/ceph-14.2.0/src/os/bluest
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: ceph version 14.2.0 
> >>>>>>> (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 1: 
> >>>>>>> (ceph::__ceph_abort(char const*, int, char const*, std::string 
> >>>>>>> const&)+0xd8) [0xfc63afe40]
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 2: 
> >>>>>>> (BlueStore::_txc_add_transaction(BlueStore::TransContext*, 
> >>>>>>> ObjectStore::Transaction*)+0x2a85) [0xfc698e5f5]
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 3: 
> >>>>>>> (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&,
> >>>>>>>  std::vector<ObjectStore::Transaction, 
> >>>>>>> std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 4: 
> >>>>>>> (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&,
> >>>>>>>  ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, 
> >>>>>>> ThreadPool::TPHandle*)+0x7f) [0xfc656b81f
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 5: 
> >>>>>>> (PG::_delete_some(ObjectStore::Transaction*)+0x83d) [0xfc65ce70d]
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 6: 
> >>>>>>> (PG::RecoveryState::Deleting::react(PG::DeleteSome const&)+0x38) 
> >>>>>>> [0xfc65cf528]
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 7: 
> >>>>>>> (boost::statechart::simple_state<PG::RecoveryState::Deleting, 
> >>>>>>> PG::RecoveryState::ToDelete, boost::mpl::list<mpl_::na, mpl_::na, 
> >>>>>>> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 8: 
> >>>>>>> (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, 
> >>>>>>> PG::RecoveryState::Initial, std::allocator<void>, 
> >>>>>>> boost::statechart::null_exception_translator>::process_event(boost
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 9: 
> >>>>>>> (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, 
> >>>>>>> PG::RecoveryCtx*)+0x119) [0xfc65dac99]
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 10: 
> >>>>>>> (OSD::dequeue_peering_evt(OSDShard*, PG*, 
> >>>>>>> std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4) 
> >>>>>>> [0xfc6515494]
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 11: 
> >>>>>>> (OSD::dequeue_delete(OSDShard*, PG*, unsigned int, 
> >>>>>>> ThreadPool::TPHandle&)+0x234) [0xfc65158d4]
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 12: 
> >>>>>>> (OSD::ShardedOpWQ::_process(unsigned int, 
> >>>>>>> ceph::heartbeat_handle_d*)+0x9f4) [0xfc6509c14]
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 13: 
> >>>>>>> (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433) 
> >>>>>>> [0xfc6b01f43]
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 14: 
> >>>>>>> (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xfc6b04fe0]
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 15: (()+0x7dd5) 
> >>>>>>> [0x7f89fd4b0dd5]
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 16: (clone()+0x6d) 
> >>>>>>> [0x7f89fc376ead]
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 0> 2019-04-26 11:01:43.217 
> >>>>>>> 7f89dee53700 -1 *** Caught signal (Aborted) **
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: in thread 7f89dee53700 
> >>>>>>> thread_name:tp_osd_tp
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: ceph version 14.2.0 
> >>>>>>> (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 1: (()+0xf5d0) 
> >>>>>>> [0x7f89fd4b85d0]
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 2: (gsignal()+0x37) 
> >>>>>>> [0x7f89fc2af207]
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 3: (abort()+0x148) 
> >>>>>>> [0x7f89fc2b08f8]
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 4: 
> >>>>>>> (ceph::__ceph_abort(char const*, int, char const*, std::string 
> >>>>>>> const&)+0x19c) [0xfc63aff04]
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 5: 
> >>>>>>> (BlueStore::_txc_add_transaction(BlueStore::TransContext*, 
> >>>>>>> ObjectStore::Transaction*)+0x2a85) [0xfc698e5f5]
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 6: 
> >>>>>>> (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&,
> >>>>>>>  std::vector<ObjectStore::Transaction, 
> >>>>>>> std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 7: 
> >>>>>>> (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&,
> >>>>>>>  ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, 
> >>>>>>> ThreadPool::TPHandle*)+0x7f) [0xfc656b81f
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 8: 
> >>>>>>> (PG::_delete_some(ObjectStore::Transaction*)+0x83d) [0xfc65ce70d]
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 9: 
> >>>>>>> (PG::RecoveryState::Deleting::react(PG::DeleteSome const&)+0x38) 
> >>>>>>> [0xfc65cf528]
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 10: 
> >>>>>>> (boost::statechart::simple_state<PG::RecoveryState::Deleting, 
> >>>>>>> PG::RecoveryState::ToDelete, boost::mpl::list<mpl_::na, mpl_::na, 
> >>>>>>> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::n
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 11: 
> >>>>>>> (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, 
> >>>>>>> PG::RecoveryState::Initial, std::allocator<void>, 
> >>>>>>> boost::statechart::null_exception_translator>::process_event(boos
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 12: 
> >>>>>>> (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, 
> >>>>>>> PG::RecoveryCtx*)+0x119) [0xfc65dac99]
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 13: 
> >>>>>>> (OSD::dequeue_peering_evt(OSDShard*, PG*, 
> >>>>>>> std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1b4) 
> >>>>>>> [0xfc6515494]
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 14: 
> >>>>>>> (OSD::dequeue_delete(OSDShard*, PG*, unsigned int, 
> >>>>>>> ThreadPool::TPHandle&)+0x234) [0xfc65158d4]
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 15: 
> >>>>>>> (OSD::ShardedOpWQ::_process(unsigned int, 
> >>>>>>> ceph::heartbeat_handle_d*)+0x9f4) [0xfc6509c14]
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 16: 
> >>>>>>> (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433) 
> >>>>>>> [0xfc6b01f43]
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 17: 
> >>>>>>> (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xfc6b04fe0]
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 18: (()+0x7dd5) 
> >>>>>>> [0x7f89fd4b0dd5]
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: 19: (clone()+0x6d) 
> >>>>>>> [0x7f89fc376ead]
> >>>>>>> Apr 26 11:01:43 databox ceph-osd[1878533]: NOTE: a copy of the 
> >>>>>>> executable, or `objdump -rdS <executable>` is needed to interpret 
> >>>>>>> this.
> >>>>>>>
> >>>>>>> Thanks!
> >>>>>>>
> >>>>>>> -Elise
> >>
> >
> >
> >
> >
>
>
> --
> _________________________________________
> D i e t m a r  R i e d e r, Mag.Dr.
> Innsbruck Medical University
> Biocenter - Division for Bioinformatics
> Innrain 80, 6020 Innsbruck
> Phone: +43 512 9003 71402
> Fax: +43 512 9003 73100
> Email: dietmar.rie...@i-med.ac.at
> Web:   http://www.icbi.at
>
>