On Mon, Oct 23, 2017 at 4:51 PM, [email protected] <[email protected]> wrote:
> Hello,
>
> On 23/10/2017 at 02:05, Brad Hubbard wrote:
>
> 2017-10-22 17:32:56.031086 7f3acaff5700  1 osd.14 pg_epoch: 72024
> pg[37.1c( v 71593'41657 (60849'38594,71593'41657] local-les=72023 n=13
> ec=7037 les/c/f 72023/72023/66447 72022/72022/72022) [14,1,41] r=0
> lpr=72022 crt=71593'41657 lcod 0'0 mlcod 0'0 active+clean] hit_set_trim
> 37:38000000:.ceph-internal::hit_set_37.1c_archive_2017-08-31
> 01%3a03%3a24.697717Z_2017-08-31 01%3a52%3a34.767197Z:head not found
> 2017-10-22 17:32:56.033936 7f3acaff5700 -1 osd/ReplicatedPG.cc: In
> function 'void ReplicatedPG::hit_set_trim(ReplicatedPG::OpContextUPtr&,
> unsigned int)' thread 7f3acaff5700 time 2017-10-22 17:32:56.031105
> osd/ReplicatedPG.cc: 11782: FAILED assert(obc)
>
> It appears to be looking for (and failing to find) a hitset object with a
> timestamp from August? Does that sound right to you? Of course, it appears
> an object for that timestamp does not exist.
>
> How is it possible? How can I fix it? I am sure that if I run a lot of
> reads, other objects like this will crash other OSDs.
> (The cluster is OK now; I will probably destroy OSD 14 and recreate it.)
> How can I find this object?

You should be able to do a find on the OSDs' filestores and grep the output
for 'hit_set_37.1c_archive_2017-08-31'. I'd start with the OSDs responsible
for pg 37.1c and then move on to the others if it's feasible. Let us know
the results. A rough sketch of what such a search could look like is
included further down in this mail.

> For information: all Ceph servers are NTP time synchronized.
>
>> What are the settings for this cache tier?
>
> Just a cache tier in "writeback" mode on an erasure 2+1 pool.
>
> # ceph osd pool get cache-nvme-data all
> size: 3
> min_size: 2
> crash_replay_interval: 0
> pg_num: 512
> pgp_num: 512
> crush_ruleset: 10
> hashpspool: true
> nodelete: false
> nopgchange: false
> nosizechange: false
> write_fadvise_dontneed: false
> noscrub: false
> nodeep-scrub: false
> hit_set_type: bloom
> hit_set_period: 14400
> hit_set_count: 12
> hit_set_fpp: 0.05
> use_gmt_hitset: 1
> auid: 0
> target_max_objects: 1000000
> target_max_bytes: 100000000000
> cache_target_dirty_ratio: 0.4
> cache_target_dirty_high_ratio: 0.6
> cache_target_full_ratio: 0.8
> cache_min_flush_age: 600
> cache_min_evict_age: 1800
> min_read_recency_for_promote: 1
> min_write_recency_for_promote: 1
> fast_read: 0
> hit_set_grade_decay_rate: 0
> hit_set_search_last_n: 0
>
> # ceph osd pool get raid-2-1-data all
> size: 3
> min_size: 2
> crash_replay_interval: 0
> pg_num: 1024
> pgp_num: 1024
> crush_ruleset: 8
> hashpspool: true
> nodelete: false
> nopgchange: false
> nosizechange: false
> write_fadvise_dontneed: false
> noscrub: false
> nodeep-scrub: false
> use_gmt_hitset: 1
> auid: 0
> erasure_code_profile: raid-2-1
> min_write_recency_for_promote: 0
> fast_read: 0
>
> # ceph osd erasure-code-profile get raid-2-1
> jerasure-per-chunk-alignment=false
> k=2
> m=1
> plugin=jerasure
> ruleset-failure-domain=host
> ruleset-root=default
> technique=reed_sol_van
> w=8
>
>> Could you check your logs for any errors from the 'agent_load_hit_sets'
>> function?
>
> Attached log:
>
> # pdsh -R exec -w ceph-osd-01,ceph-osd-02,ceph-osd-03,ceph-osd-04 ssh -x %h 'zgrep -B10 -A10 agent_load_hit_sets /var/log/ceph/ceph-osd.*gz' | less > log_agent_load_hit_sets.log
>
> On 19 October, in the morning, I restarted OSD 14.
>
> Thanks for your help.
>
> Regards,
>
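To make that filestore search concrete, here is a minimal sketch, assuming the default cluster name and mount points (/var/lib/ceph/osd/ceph-<id>) and filestore OSDs; adjust the paths and OSD ids for your deployment. Per the pg line in the log above, the acting set of pg 37.1c is [14,1,41], so the hosts carrying those three OSDs are the place to start. Filestore can escape characters such as '_' in on-disk file names, so the patterns below are kept deliberately loose:

# find /var/lib/ceph/osd/ceph-14/current/37.1c_head/ -type f | grep -i 'archive.*2017-08-31'

# find /var/lib/ceph/osd/ceph-*/current -name '*37.1c*archive*2017-08-31*' 2>/dev/null

The first form looks only in the PG's own directory on one OSD; the second sweeps every OSD mounted on the host. For what it's worth, if I read the cache pool settings above correctly (hit_set_count 12 x hit_set_period 14400 s), only roughly the last 48 hours of hitsets should be retained, so an archive named for 2017-08-31 ought to have been trimmed long ago; whether or not the file is still on disk should help narrow down where things went wrong.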
> On Mon, Oct 23, 2017 at 2:41 AM, [email protected]
> <[email protected]> wrote:
>
>> Hello,
>>
>> Today I ran a lot of read IO with a simple rsync... and again, an OSD
>> crashed.
>>
>> But as before, I can't restart the OSD. It keeps crashing, so the OSD is
>> out and the cluster is recovering.
>>
>> I only had time to increase the OSD log level:
>>
>> # ceph tell osd.14 injectargs --debug-osd 5/5
>>
>> Attached log:
>>
>> # grep -B100 -100 objdump /var/log/ceph/ceph-osd.14.log
>>
>> If I run another read, another OSD will probably crash.
>>
>> Any idea?
>>
>> I will probably plan to move the data from the erasure pool to a 3x
>> replicated pool. It's becoming unstable without any change on my side.
>>
>> Regards,
>>
>> PS: Last Sunday I lost an RBD header while removing the cache tier... a
>> lot of thanks to http://fnordahl.com/2017/04/17/ceph-rbd-volume-header-recovery/
>> for helping me recreate it and resurrect the RBD disk :)
>>
>> On 19/10/2017 at 00:19, Brad Hubbard wrote:
>>
>> On Wed, Oct 18, 2017 at 11:16 PM, [email protected]
>> <[email protected]> wrote:
>>
>> Hello,
>>
>> For two weeks I have occasionally been losing OSDs. Here is the trace:
>>
>> 0> 2017-10-18 05:16:40.873511 7f7c1e497700 -1 osd/ReplicatedPG.cc: In
>> function 'void ReplicatedPG::hit_set_trim(ReplicatedPG::OpContextUPtr&,
>> unsigned int)' thread 7f7c1e497700 time 2017-10-18 05:16:40.869962
>> osd/ReplicatedPG.cc: 11782: FAILED assert(obc)
>>
>> Can you try to capture a log with debug_osd set to 10 or greater as
>> per http://tracker.ceph.com/issues/19185 ?
>>
>> This will allow us to see the output from the
>> PrimaryLogPG::get_object_context() function which may help identify
>> the problem.
>>
>> Please also check your machines all have the same time zone set and
>> their clocks are in sync.
>>
>> ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x85) [0x55eec15a09e5]
>> 2: (ReplicatedPG::hit_set_trim(std::unique_ptr<ReplicatedPG::OpContext,
>> std::default_delete<ReplicatedPG::OpContext> >&, unsigned int)+0x6dd)
>> [0x55eec107a52d]
>> 3: (ReplicatedPG::hit_set_persist()+0xd7c) [0x55eec107d1bc]
>> 4: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x1a92)
>> [0x55eec109bbe2]
>> 5: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&,
>> ThreadPool::TPHandle&)+0x747) [0x55eec10588a7]
>> 6: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>,
>> ThreadPool::TPHandle&)+0x41d) [0x55eec0f0bbad]
>> 7: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x6d)
>> [0x55eec0f0bdfd]
>> 8: (OSD::ShardedOpWQ::_process(unsigned int,
>> ceph::heartbeat_handle_d*)+0x77b) [0x55eec0f0f7db]
>> 9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x887)
>> [0x55eec1590987]
>> 10: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55eec15928f0]
>> 11: (()+0x7e25) [0x7f7c4fd52e25]
>> 12: (clone()+0x6d) [0x7f7c4e3dc34d]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
>> to interpret this.
>>
>> I am using Jewel 10.2.10.
>>
>> I am using an erasure coding pool (2+1) + an NVMe cache tier (writeback,
>> 3 replicas) with simple RBD disks.
>> (12 SATA OSD disks per node on 4 nodes + 1 NVMe on each node = 48 SATA
>> OSDs + 8 NVMe OSDs (I split each NVMe in 2).)
>> Last week it was only NVMe OSDs that crashed, so I unmapped all disks,
>> destroyed the cache and recreated it. Since then it had worked fine.
>> Today an OSD crashed again, but it was not an NVMe OSD this time, a
>> normal (SATA) OSD.
>>
>> Any idea? What about this 'ReplicatedPG::hit_set_trim'?
>>
>> Thanks for your help,
>>
>> Regards,
>>
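One note on the logging: the injectargs call quoted above only raises debug_osd to 5, while the tracker issue asks for 10 or greater. A minimal sketch of capturing that, using osd.14 from the example above (injectargs only affects the running daemon, and the value reverts when it restarts):

# ceph tell osd.14 injectargs '--debug_osd 20/20'

then reproduce the read workload that triggers the assert, and afterwards drop the level back down:

# ceph tell osd.14 injectargs '--debug_osd 0/5'

If the OSD is crashing on startup and can't be reached with ceph tell, an alternative is to set "debug osd = 20/20" in the [osd] section of ceph.conf on that host before restarting it, and remove the setting again once the crash has been captured in the log.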
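On the time zone / clock sync check mentioned above, a quick way to compare all four hosts in one go is to reuse the pdsh pattern from earlier in this thread (the hostnames are the ones from that command; timedatectl assumes systemd-based hosts and ntpq assumes classic ntpd rather than chrony, so adjust to whatever the nodes actually run):

# pdsh -R exec -w ceph-osd-01,ceph-osd-02,ceph-osd-03,ceph-osd-04 ssh -x %h \
    'date; timedatectl | grep -Ei "time zone|ntp|synchronized"; ntpq -pn | head -5'

The hitset archive object names embed timestamps (and the pool has use_gmt_hitset set, per the settings above), so clock skew between nodes could in principle leave hit_set_trim looking for an archive name that was never written, which is presumably why the time question keeps coming up.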
--
Cheers,
Brad
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
