On Mon, Oct 23, 2017 at 4:51 PM, [email protected] <[email protected]> wrote:
> Hello,
>
> On 23/10/2017 at 02:05, Brad Hubbard wrote:
>
> 2017-10-22 17:32:56.031086 7f3acaff5700  1 osd.14 pg_epoch: 72024
> pg[37.1c( v 71593'41657 (60849'38594,71593'41657] local-les=72023 n=13
> ec=7037 les/c/f 72023/72023/66447 72022/72022/72022) [14,1,41] r=0
> lpr=72022 crt=71593'41657 lcod 0'0 mlcod 0'0 active+clean] hit_set_trim
> 37:38000000:.ceph-internal::hit_set_37.1c_archive_2017-08-31
> 01%3a03%3a24.697717Z_2017-08-31 01%3a52%3a34.767197Z:head not found
> 2017-10-22 17:32:56.033936 7f3acaff5700 -1 osd/ReplicatedPG.cc: In
> function 'void ReplicatedPG::hit_set_trim(ReplicatedPG::OpContextUPtr&,
> unsigned int)' thread 7f3acaff5700 time 2017-10-22 17:32:56.031105
> osd/ReplicatedPG.cc: 11782: FAILED assert(obc)
>
> It appears to be looking for (and failing to find) a hitset object with a
> timestamp from August? Does that sound right to you? Of course, it appears
> an object for that timestamp does not exist.
>
> How is it possible? How can I fix it? I am sure that if I run a lot of
> reads, other objects like this will crash other OSDs.
> (The cluster is OK now; I will probably destroy OSD 14 and recreate it.)
> How can I find this object?

You should be able to do a find on the OSDs' filestores and grep the output
for 'hit_set_37.1c_archive_2017-08-31'. I'd start with the OSDs responsible
for pg 37.1c and then move on to the others if it's feasible. Let us know
the results. A rough sketch of what such a search could look like is
included further down in this mail.

> For information: all Ceph servers are NTP time synchronized.
>
>> What are the settings for this cache tier?
>
> Just a cache tier in "writeback" mode on an erasure 2+1 pool.
>
> # ceph osd pool get cache-nvme-data all
> size: 3
> min_size: 2
> crash_replay_interval: 0
> pg_num: 512
> pgp_num: 512
> crush_ruleset: 10
> hashpspool: true
> nodelete: false
> nopgchange: false
> nosizechange: false
> write_fadvise_dontneed: false
> noscrub: false
> nodeep-scrub: false
> hit_set_type: bloom
> hit_set_period: 14400
> hit_set_count: 12
> hit_set_fpp: 0.05
> use_gmt_hitset: 1
> auid: 0
> target_max_objects: 1000000
> target_max_bytes: 100000000000
> cache_target_dirty_ratio: 0.4
> cache_target_dirty_high_ratio: 0.6
> cache_target_full_ratio: 0.8
> cache_min_flush_age: 600
> cache_min_evict_age: 1800
> min_read_recency_for_promote: 1
> min_write_recency_for_promote: 1
> fast_read: 0
> hit_set_grade_decay_rate: 0
> hit_set_search_last_n: 0
>
> # ceph osd pool get raid-2-1-data all
> size: 3
> min_size: 2
> crash_replay_interval: 0
> pg_num: 1024
> pgp_num: 1024
> crush_ruleset: 8
> hashpspool: true
> nodelete: false
> nopgchange: false
> nosizechange: false
> write_fadvise_dontneed: false
> noscrub: false
> nodeep-scrub: false
> use_gmt_hitset: 1
> auid: 0
> erasure_code_profile: raid-2-1
> min_write_recency_for_promote: 0
> fast_read: 0
>
> # ceph osd erasure-code-profile get raid-2-1
> jerasure-per-chunk-alignment=false
> k=2
> m=1
> plugin=jerasure
> ruleset-failure-domain=host
> ruleset-root=default
> technique=reed_sol_van
> w=8
>
>> Could you check your logs for any errors from the 'agent_load_hit_sets'
>> function?
>
> Attached log:
>
> # pdsh -R exec -w ceph-osd-01,ceph-osd-02,ceph-osd-03,ceph-osd-04 ssh -x %h 'zgrep -B10 -A10 agent_load_hit_sets /var/log/ceph/ceph-osd.*gz' | less > log_agent_load_hit_sets.log
>
> On 19 October, in the morning, I restarted OSD 14.
>
> Thanks for your help.
>
> Regards,
>
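To make that filestore search concrete, here is a minimal sketch, assuming the default cluster name and mount points (/var/lib/ceph/osd/ceph-<id>) and filestore OSDs; adjust the paths and OSD ids for your deployment. Per the pg line in the log above, the acting set of pg 37.1c is [14,1,41], so the hosts carrying those three OSDs are the place to start. Filestore can escape characters such as '_' in on-disk file names, so the patterns below are kept deliberately loose:

# find /var/lib/ceph/osd/ceph-14/current/37.1c_head/ -type f | grep -i 'archive.*2017-08-31'

# find /var/lib/ceph/osd/ceph-*/current -name '*37.1c*archive*2017-08-31*' 2>/dev/null

The first form looks only in the PG's own directory on one OSD; the second sweeps every OSD mounted on the host. For what it's worth, if I read the cache pool settings above correctly (hit_set_count 12 x hit_set_period 14400 s), only roughly the last 48 hours of hitsets should be retained, so an archive named for 2017-08-31 ought to have been trimmed long ago; whether or not the file is still on disk should help narrow down where things went wrong.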
> On Mon, Oct 23, 2017 at 2:41 AM, [email protected]
> <[email protected]> wrote:
>
>> Hello,
>>
>> Today I ran a lot of read IO with a simple rsync... and again, an OSD
>> crashed.
>>
>> But as before, I can't restart the OSD. It keeps crashing, so the OSD is
>> out and the cluster is recovering.
>>
>> I only had time to increase the OSD log level:
>>
>> # ceph tell osd.14 injectargs --debug-osd 5/5
>>
>> Attached log:
>>
>> # grep -B100 -100 objdump /var/log/ceph/ceph-osd.14.log
>>
>> If I run another read, another OSD will probably crash.
>>
>> Any idea?
>>
>> I will probably plan to move the data from the erasure pool to a 3x
>> replicated pool. It's becoming unstable without any change on my side.
>>
>> Regards,
>>
>> PS: Last Sunday I lost an RBD header while removing the cache tier... a
>> lot of thanks to http://fnordahl.com/2017/04/17/ceph-rbd-volume-header-recovery/
>> for helping me recreate it and resurrect the RBD disk :)
>>
>> On 19/10/2017 at 00:19, Brad Hubbard wrote:
>>
>> On Wed, Oct 18, 2017 at 11:16 PM, [email protected]
>> <[email protected]> wrote:
>>
>> Hello,
>>
>> For two weeks I have occasionally been losing OSDs. Here is the trace:
>>
>> 0> 2017-10-18 05:16:40.873511 7f7c1e497700 -1 osd/ReplicatedPG.cc: In
>> function 'void ReplicatedPG::hit_set_trim(ReplicatedPG::OpContextUPtr&,
>> unsigned int)' thread 7f7c1e497700 time 2017-10-18 05:16:40.869962
>> osd/ReplicatedPG.cc: 11782: FAILED assert(obc)
>>
>> Can you try to capture a log with debug_osd set to 10 or greater as
>> per http://tracker.ceph.com/issues/19185 ?
>>
>> This will allow us to see the output from the
>> PrimaryLogPG::get_object_context() function which may help identify
>> the problem.
>>
>> Please also check your machines all have the same time zone set and
>> their clocks are in sync.
>>
>> ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x85) [0x55eec15a09e5]
>> 2: (ReplicatedPG::hit_set_trim(std::unique_ptr<ReplicatedPG::OpContext,
>> std::default_delete<ReplicatedPG::OpContext> >&, unsigned int)+0x6dd)
>> [0x55eec107a52d]
>> 3: (ReplicatedPG::hit_set_persist()+0xd7c) [0x55eec107d1bc]
>> 4: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x1a92)
>> [0x55eec109bbe2]
>> 5: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&,
>> ThreadPool::TPHandle&)+0x747) [0x55eec10588a7]
>> 6: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>,
>> ThreadPool::TPHandle&)+0x41d) [0x55eec0f0bbad]
>> 7: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x6d)
>> [0x55eec0f0bdfd]
>> 8: (OSD::ShardedOpWQ::_process(unsigned int,
>> ceph::heartbeat_handle_d*)+0x77b) [0x55eec0f0f7db]
>> 9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x887)
>> [0x55eec1590987]
>> 10: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55eec15928f0]
>> 11: (()+0x7e25) [0x7f7c4fd52e25]
>> 12: (clone()+0x6d) [0x7f7c4e3dc34d]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
>> to interpret this.
>>
>> I am using Jewel 10.2.10.
>>
>> I am using an erasure coding pool (2+1) + an NVMe cache tier (writeback,
>> 3 replicas) with simple RBD disks.
>> (12 SATA OSD disks per node on 4 nodes + 1 NVMe on each node = 48 SATA
>> OSDs + 8 NVMe OSDs (I split each NVMe in 2).)
>> Last week it was only NVMe OSDs that crashed, so I unmapped all disks,
>> destroyed the cache and recreated it. Since then it had worked fine.
>> Today an OSD crashed again, but it was not an NVMe OSD this time, a
>> normal (SATA) OSD.
>>
>> Any idea? What about this 'ReplicatedPG::hit_set_trim'?
>>
>> Thanks for your help,
>>
>> Regards,
>>
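One note on the logging: the injectargs call quoted above only raises debug_osd to 5, while the tracker issue asks for 10 or greater. A minimal sketch of capturing that, using osd.14 from the example above (injectargs only affects the running daemon, and the value reverts when it restarts):

# ceph tell osd.14 injectargs '--debug_osd 20/20'

then reproduce the read workload that triggers the assert, and afterwards drop the level back down:

# ceph tell osd.14 injectargs '--debug_osd 0/5'

If the OSD is crashing on startup and can't be reached with ceph tell, an alternative is to set "debug osd = 20/20" in the [osd] section of ceph.conf on that host before restarting it, and remove the setting again once the crash has been captured in the log.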
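On the time zone / clock sync check mentioned above, a quick way to compare all four hosts in one go is to reuse the pdsh pattern from earlier in this thread (the hostnames are the ones from that command; timedatectl assumes systemd-based hosts and ntpq assumes classic ntpd rather than chrony, so adjust to whatever the nodes actually run):

# pdsh -R exec -w ceph-osd-01,ceph-osd-02,ceph-osd-03,ceph-osd-04 ssh -x %h \
    'date; timedatectl | grep -Ei "time zone|ntp|synchronized"; ntpq -pn | head -5'

The hitset archive object names embed timestamps (and the pool has use_gmt_hitset set, per the settings above), so clock skew between nodes could in principle leave hit_set_trim looking for an archive name that was never written, which is presumably why the time question keeps coming up.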
--
Cheers,
Brad
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
