Hi,
I'm still having trouble with the issue above.
Has anyone seen the same issue, or found a way to resolve it?
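
For reference, the cleanup sequence from my earlier mail as a rough shell sketch (the pg id 2.67 and osd ids 2 and 6 are from my test cluster; adjust for yours — and note these commands are destructive, for a throwaway test cluster only):

```shell
#!/bin/sh
# Sketch only: remove the two dead OSDs from CRUSH, the OSD map, and auth,
# then mark them lost. Assumes osd.2 and osd.6 as in my test.
for id in 2 6; do
    ceph osd crush rm "osd.${id}"
    ceph osd rm "osd.${id}"
    ceph auth del "osd.${id}"
    ceph osd lost "${id}" --yes-i-really-mean-it
done

# Ask the cluster to recreate the lost PG.
# (In my test this stayed in "creating" until the OSDs were re-deployed.)
ceph pg force_create_pg 2.67
```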

Thanks.



2017-08-21 22:51 GMT+09:00 Hyun Ha <[email protected]>:

> Thanks for response.
>
> I can understand why size 2 with min_size 1 is not acceptable in
> production, but I just want to simulate data loss and find out whether
> the Ceph cluster's health can be made clean again in that situation
> (data recovery aside, because the data is gone).
> So I tried to delete the PGs with a command like "ceph pg 2.67
> mark_unfound_lost delete".
> The result was "Error ENOENT: i don't have pgid 2.67".
>
> So I'm confused, because I couldn't find a way to delete a PG in this
> situation and bring the cluster back to HEALTH_OK.
>
> The only sequence I found that gets back to HEALTH_OK is this
> (the PGs were in "stale+active+clean" after the primary and secondary
> OSDs, osd.2 and osd.6, were lost at the same time):
> 1. "ceph pg 2.67 mark_unfound_lost delete" doesn't work
> 2. ceph osd crush rm osd.2; ceph osd crush rm osd.6
> 3. ceph osd rm osd.2; ceph osd rm osd.6
> 4. ceph auth del osd.2; ceph auth del osd.6
> 5. ceph osd lost 2 --yes-i-really-mean-it; ceph osd lost 6
> --yes-i-really-mean-it
> 6. ceph pg force_create_pg 2.67 (but at this point the status of pg
> 2.67 is "creating" forever)
> 7. re-deploy osd.2 and osd.6
> 8. stop the corresponding OSDs (primary, then secondary, for pg 2.67),
> wait for recovery to complete, and start them again
>  - by now the primary/secondary OSDs for the PG are no longer osd.2
> and osd.6 because the PG map was re-created, so I looked up the current
> primary/secondary OSDs for pg 2.67
> 9. creation of pg 2.67 completes (creating -> peering -> remapped ->
> active+clean) and the cluster becomes HEALTH_OK
>
> The cluster did eventually become HEALTH_OK, but at that point there
> was a problem: rbd could no longer open the rbd image, as below.
> # rbd ls -p volumes
> hhvol01
> # rbd info volumes/hhvol01
> rbd: error opening image hhvol01: (2) No such file or directory
>
> Something is wrong with the rbd image, but I cannot understand why
> this happens.
>
> In short, my questions are:
> 1. Is there any CLI command to delete stuck PGs?
> 2. What are the correct steps to reach HEALTH_OK in this situation?
>
> Thanks.
>
> 2017-08-21 20:37 GMT+09:00 David Turner <[email protected]>:
>
>> With the exception of trying to re-add the drive so you can read the
>> data off of it, your only other option is to accept that you lost the
>> data, mark the PG as lost, and delete it. Not surprisingly, you can't
>> recover the data without any copies of it. Size of 2 is not an
>> acceptable production setting if data integrity is a priority.
>> Min_size of 1 is even worse. There is plenty of talk about why in the
>> ML archives.
>>
>> So if you're hoping to not lose data, then your only option is to try and
>> read the data off of the removed osds. If your goal is health_ok regardless
>> of data integrity, then your option is to delete the PGs.
>>
>> On Mon, Aug 21, 2017, 1:07 AM Hyun Ha <[email protected]> wrote:
>>
>>> Hi, thank you for the response.
>>>
>>> Details of my pool are below:
>>> pool 2 'volumes' replicated size 2 min_size 1 crush_ruleset 0
>>> object_hash rjenkins pg_num 128 pgp_num 128 last_change 627 flags
>>> hashpspool stripe_width 0
>>>         removed_snaps [1~3]
>>>
>>> My test case is a disaster scenario. I think the situation where all
>>> copies of the data are deleted can occur in production (in my test, I
>>> deleted all copies of the data myself to simulate a disaster).
>>>
>>> When all copies of the data are deleted, the Ceph cluster never gets
>>> back to clean. How can I recover in this situation?
>>>
>>> Thank you.
>>>
>>>
>>> 2017-08-18 21:28 GMT+09:00 David Turner <[email protected]>:
>>>
>>>> What were the settings for your pool? What was the size? It looks
>>>> like the size was 2 and that the PGs only existed on osds 2 and 6.
>>>> If that's the case, it's like having a 4-disk RAID 1+0, removing 2
>>>> disks of the same mirror, and complaining that the other mirror
>>>> didn't pick up the data. Don't delete all copies of your data. If
>>>> your replica size is 2, you cannot lose 2 disks at the same time.
>>>>
>>>> On Fri, Aug 18, 2017, 1:28 AM Hyun Ha <[email protected]> wrote:
>>>>
>>>>> Hi, Cephers!
>>>>>
>>>>> I'm currently testing a double-failure scenario for a Ceph
>>>>> cluster, but I've found that the PGs stay in the stale state
>>>>> forever.
>>>>>
>>>>> Steps to reproduce:
>>>>> 0. ceph version: jewel 10.2.3
>>>>> (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
>>>>> 1. create pool: exp-volumes (size = 2, min_size = 1)
>>>>> 2. rbd create: testvol01
>>>>> 3. rbd map and mkfs.xfs
>>>>> 4. mount and create a file
>>>>> 5. list the rados objects
>>>>> 6. check the osd map of each object
>>>>>  # ceph osd map exp-volumes rbd_data.4a41f238e1f29.000000000000017a
>>>>>    osdmap e199 pool 'exp-volumes' (2) object
>>>>> 'rbd_data.4a41f238e1f29.000000000000017a' -> pg 2.3f04d6e2 (2.62) ->
>>>>> up ([2,6], p2) acting ([2,6], p2)
>>>>> 7. stop the primary (osd.2) and secondary (osd.6) of the above
>>>>> object at the same time
>>>>> 8. check ceph status
>>>>> health HEALTH_ERR
>>>>>             16 pgs are stuck inactive for more than 300 seconds
>>>>>             16 pgs stale
>>>>>             16 pgs stuck stale
>>>>>      monmap e11: 3 mons at {10.105.176.85=10.105.176.85:6789/0,
>>>>> 10.110.248.154=10.110.248.154:6789/0,
>>>>> 10.110.249.153=10.110.249.153:6789/0}
>>>>>             election epoch 84, quorum 0,1,2
>>>>> 10.105.176.85,10.110.248.154,10.110.249.153
>>>>>      osdmap e248: 6 osds: 4 up, 4 in; 16 remapped pgs
>>>>>             flags sortbitwise,require_jewel_osds
>>>>>       pgmap v112095: 128 pgs, 1 pools, 14659 kB data, 17 objects
>>>>>             165 MB used, 159 GB / 160 GB avail
>>>>>                  112 active+clean
>>>>>                   16 stale+active+clean
>>>>>
>>>>> # ceph health detail
>>>>> HEALTH_ERR 16 pgs are stuck inactive for more than 300 seconds; 16 pgs
>>>>> stale; 16 pgs stuck stale
>>>>> pg 2.67 is stuck stale for 689.171742, current state
>>>>> stale+active+clean, last acting [2,6]
>>>>> pg 2.5a is stuck stale for 689.171748, current state
>>>>> stale+active+clean, last acting [6,2]
>>>>> pg 2.52 is stuck stale for 689.171753, current state
>>>>> stale+active+clean, last acting [2,6]
>>>>> pg 2.4d is stuck stale for 689.171757, current state
>>>>> stale+active+clean, last acting [2,6]
>>>>> pg 2.56 is stuck stale for 689.171755, current state
>>>>> stale+active+clean, last acting [6,2]
>>>>> pg 2.d is stuck stale for 689.171811, current state
>>>>> stale+active+clean, last acting [6,2]
>>>>> pg 2.79 is stuck stale for 689.171808, current state
>>>>> stale+active+clean, last acting [2,6]
>>>>> pg 2.1f is stuck stale for 689.171782, current state
>>>>> stale+active+clean, last acting [6,2]
>>>>> pg 2.76 is stuck stale for 689.171809, current state
>>>>> stale+active+clean, last acting [6,2]
>>>>> pg 2.17 is stuck stale for 689.171794, current state
>>>>> stale+active+clean, last acting [6,2]
>>>>> pg 2.63 is stuck stale for 689.171794, current state
>>>>> stale+active+clean, last acting [2,6]
>>>>> pg 2.77 is stuck stale for 689.171816, current state
>>>>> stale+active+clean, last acting [2,6]
>>>>> pg 2.1b is stuck stale for 689.171793, current state
>>>>> stale+active+clean, last acting [6,2]
>>>>> pg 2.62 is stuck stale for 689.171765, current state
>>>>> stale+active+clean, last acting [2,6]
>>>>> pg 2.30 is stuck stale for 689.171799, current state
>>>>> stale+active+clean, last acting [2,6]
>>>>> pg 2.19 is stuck stale for 689.171798, current state
>>>>> stale+active+clean, last acting [6,2]
>>>>>
>>>>>  # ceph pg dump_stuck stale
>>>>> ok
>>>>> pg_stat state   up      up_primary      acting  acting_primary
>>>>> 2.67    stale+active+clean      [2,6]   2       [2,6]   2
>>>>> 2.5a    stale+active+clean      [6,2]   6       [6,2]   6
>>>>> 2.52    stale+active+clean      [2,6]   2       [2,6]   2
>>>>> 2.4d    stale+active+clean      [2,6]   2       [2,6]   2
>>>>> 2.56    stale+active+clean      [6,2]   6       [6,2]   6
>>>>> 2.d     stale+active+clean      [6,2]   6       [6,2]   6
>>>>> 2.79    stale+active+clean      [2,6]   2       [2,6]   2
>>>>> 2.1f    stale+active+clean      [6,2]   6       [6,2]   6
>>>>> 2.76    stale+active+clean      [6,2]   6       [6,2]   6
>>>>> 2.17    stale+active+clean      [6,2]   6       [6,2]   6
>>>>> 2.63    stale+active+clean      [2,6]   2       [2,6]   2
>>>>> 2.77    stale+active+clean      [2,6]   2       [2,6]   2
>>>>> 2.1b    stale+active+clean      [6,2]   6       [6,2]   6
>>>>> 2.62    stale+active+clean      [2,6]   2       [2,6]   2
>>>>> 2.30    stale+active+clean      [2,6]   2       [2,6]   2
>>>>> 2.19    stale+active+clean      [6,2]   6       [6,2]   6
>>>>>
>>>>> # ceph pg 2.62 query
>>>>> Error ENOENT: i don't have pgid 2.62
>>>>>
>>>>>  # rados ls -p exp-volumes
>>>>> rbd_data.4a41f238e1f29.000000000000003f
>>>>> ^C --> hang
>>>>>
>>>>> I understand that this is a natural result, because the PGs above
>>>>> have no primary or secondary osd. But since this situation can
>>>>> occur, I want to recover the Ceph cluster and the rbd images.
>>>>>
>>>>> First, I want to know how to make the cluster's state clean.
>>>>> I read the documentation and tried to solve this, but nothing
>>>>> helped, including the commands below:
>>>>>  - ceph pg force_create_pg 2.6
>>>>>  - ceph osd lost 2 --yes-i-really-mean-it
>>>>>  - ceph osd lost 6 --yes-i-really-mean-it
>>>>>  - ceph osd crush rm osd.2
>>>>>  - ceph osd crush rm osd.6
>>>>>  - ceph osd rm osd.2
>>>>>  - ceph osd rm osd.6
>>>>>
>>>>> Is there any command to force-delete the PGs or otherwise make the
>>>>> cluster clean?
>>>>> Thank you in advance.
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> [email protected]
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>
>>>>
>>>
>