I'm not sure it's clear what you're asking for. You understand that this scenario involves lost data that you cannot get back, correct? Some of the data for an RBD might have been in the PGs you no longer have any copy of. Any RBD image that has objects that are now completely gone will be like a hard drive with dead sectors scattered across it: you can't guarantee any of the data on it is good, but you might be able to pull some of it off after the cluster is back to "health_ok" with data loss.
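
If it helps with that assessment, here is one rough way to see which images are affected (pool, image, and object names are the ones from your test further down this thread; rbd and rados commands may hang or error while the PGs holding their data are down, so run this once the cluster is answering again):

    # list the images in the pool
    rbd ls exp-volumes
    # find the object prefix that belongs to one image
    rbd info exp-volumes/testvol01 | grep block_name_prefix
    #   e.g. block_name_prefix: rbd_data.4a41f238e1f29
    # map one of that image's objects to its PG and acting set
    ceph osd map exp-volumes rbd_data.4a41f238e1f29.000000000000017a
    # if that PG is one of the lost/stale ones, the image has holes in it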
Until you recover the cluster to a "healthy" state, stop worrying that you have downtime... you have critical hardware failure and data loss. If this were production, you should be assessing how many RBDs have lost data, how you're going to try to recover the data you can/need to, and figuring out where to go from there... downtime is likely the least of your worries at this point. Being up is useless when the data can't be trusted or used.

On Mon, Sep 4, 2017 at 9:23 PM Hyun Ha <[email protected]> wrote:

> Hi,
> I'm still having trouble with the above issue.
> Has anybody here hit the same issue or resolved it?
>
> Thanks.
>
> 2017-08-21 22:51 GMT+09:00 Hyun Ha <[email protected]>:
>
>> Thanks for the response.
>>
>> I understand why size 2 and min_size 1 are not acceptable in production,
>> but I just want to create a data-loss situation and learn whether the
>> ceph cluster's health can be made clean again in that situation (data
>> recovery aside, because the data is gone).
>> So I tried to delete the PGs with a command like "ceph pg 2.67
>> mark_unfound_lost delete".
>> The result was "Error ENOENT: i don't have pgid 2.67".
>>
>> So I was confused, because I couldn't find a way to delete the pg in
>> this situation and get the ceph cluster back to health_ok.
>>
>> The only method I found to reach health_ok is this
>> (when the pgs are in "stale+active+clean" and the primary/secondary
>> osds, osd.2 and osd.6, are gone at the same time):
>> 1. "ceph pg 2.67 mark_unfound_lost delete" doesn't work
>> 2. ceph osd crush rm osd.2, osd.6
>> 3. ceph osd rm osd.2, osd.6
>> 4. ceph osd auth del osd.2, osd.6
>> 5. ceph osd lost 2, 6 --yes-i-really-mean-it
>> 6. ceph pg force_create_pg 2.67 (but at this point pg 2.67 stays in
>>    "creating" forever)
>> 7. re-deploy osd.2, osd.6
>> 8. stop the corresponding osds for pg 2.67 (primary then secondary,
>>    sequentially), wait for recovery to complete, and start them again
>>    - at this point the primary/secondary osds for the pg are no longer
>>      osd.2 and osd.6 because the pg map was re-created, so I looked up
>>      the current primary/secondary osds for pg 2.67
>> 9. creation of pg 2.67 finishes (creating -> peering -> remapped ->
>>    active+clean) and ceph becomes health_ok
>>
>> The cluster does eventually become health_ok, but then there is a
>> problem: rbd cannot find the rbd images, like below.
>> # rbd ls -p volumes
>> hhvol01
>> # rbd info volumes/hhvol01
>> rbd: error opening image hhvol01: (2) No such file or directory
>>
>> Something is wrong with the rbd image, but I cannot understand why this
>> is happening.
>>
>> In short, my questions are:
>> 1. Is there any CLI to delete stuck PGs?
>> 2. What are the correct steps to reach health_ok in this situation?
>>
>> Thanks.
>>
>> 2017-08-21 20:37 GMT+09:00 David Turner <[email protected]>:
>>
>>> Other than trying to re-add the drives so you can read the data off of
>>> them, your only other option is to accept that you lost data, mark the
>>> pg as lost, and delete it. Not surprisingly, you can't recover the data
>>> without any copies of it. Size of 2 is not an acceptable production
>>> setting if data integrity is a priority; min_size of 1 is even worse.
>>> There is plenty of discussion of why in the ML archives.
>>>
>>> So if you're hoping not to lose data, your only option is to try to
>>> read the data off of the removed osds. If your goal is health_ok
>>> regardless of data integrity, then your option is to delete the PGs.
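
For reference, the "give up on the data and delete the PGs" path boils down to something like the following. This is only a rough outline assembled from the commands already mentioned in this thread (the OSD ids and pgid are the ones from your cluster), and exact behaviour differs between releases:

    # tell the cluster the failed OSDs are never coming back
    ceph osd lost 2 --yes-i-really-mean-it
    ceph osd lost 6 --yes-i-really-mean-it
    # give up on any unfound objects in the dead PG
    # (returns ENOENT if no surviving OSD hosts the PG at all)
    ceph pg 2.67 mark_unfound_lost delete
    # recreate the PG empty; it stays in "creating" until the OSDs that
    # CRUSH maps it onto are up, so redeploy or remove osd.2 and osd.6
    # from the CRUSH map before expecting it to peer
    ceph pg force_create_pg 2.67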
>>>
>>> On Mon, Aug 21, 2017, 1:07 AM Hyun Ha <[email protected]> wrote:
>>>
>>>> Hi, thank you for the response.
>>>>
>>>> Details of my pool are below:
>>>> pool 2 'volumes' replicated size 2 min_size 1 crush_ruleset 0
>>>> object_hash rjenkins pg_num 128 pgp_num 128 last_change 627 flags
>>>> hashpspool stripe_width 0
>>>> removed_snaps [1~3]
>>>>
>>>> My test case is a disaster scenario. I think the situation where every
>>>> copy of some data is gone can occur in production (in my test I deleted
>>>> all copies of the data myself to simulate that disaster state).
>>>>
>>>> When every copy of the data is deleted, the ceph cluster never gets
>>>> back to clean. How can I recover in this situation?
>>>>
>>>> Thank you.
>>>>
>>>> 2017-08-18 21:28 GMT+09:00 David Turner <[email protected]>:
>>>>
>>>>> What were the settings for your pool? What was the size? It looks
>>>>> like the size was 2 and that the PGs only existed on osds 2 and 6. If
>>>>> that's the case, it's like having a 4-disk raid 1+0, removing 2 disks
>>>>> of the same mirror, and complaining that the other mirror didn't pick
>>>>> up the data... Don't delete all copies of your data. If your replica
>>>>> size is 2, you cannot lose 2 disks at the same time.
>>>>>
>>>>> On Fri, Aug 18, 2017, 1:28 AM Hyun Ha <[email protected]> wrote:
>>>>>
>>>>>> Hi, Cephers!
>>>>>>
>>>>>> I'm currently testing a double-failure situation for a ceph cluster,
>>>>>> but I've found that pgs stay in the stale state forever.
>>>>>>
>>>>>> reproduce steps)
>>>>>> 0. ceph version : jewel 10.2.3
>>>>>>    (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
>>>>>> 1. Pool create : exp-volumes (size = 2, min_size = 1)
>>>>>> 2. rbd create : testvol01
>>>>>> 3. rbd map and mkfs.xfs
>>>>>> 4. mount and create a file
>>>>>> 5. list the rados objects
>>>>>> 6. check the osd map of each object
>>>>>>    # ceph osd map exp-volumes rbd_data.4a41f238e1f29.000000000000017a
>>>>>>    osdmap e199 pool 'exp-volumes' (2) object
>>>>>>    'rbd_data.4a41f238e1f29.000000000000017a' -> pg 2.3f04d6e2 (2.62)
>>>>>>    -> up ([2,6], p2) acting ([2,6], p2)
>>>>>> 7. stop primary osd.2 and secondary osd.6 of the above object at the
>>>>>>    same time
>>>>>> 8. check ceph status
>>>>>>      health HEALTH_ERR
>>>>>>             16 pgs are stuck inactive for more than 300 seconds
>>>>>>             16 pgs stale
>>>>>>             16 pgs stuck stale
>>>>>>      monmap e11: 3 mons at {10.105.176.85=10.105.176.85:6789/0,10.110.248.154=10.110.248.154:6789/0,10.110.249.153=10.110.249.153:6789/0}
>>>>>>             election epoch 84, quorum 0,1,2 10.105.176.85,10.110.248.154,10.110.249.153
>>>>>>      osdmap e248: 6 osds: 4 up, 4 in; 16 remapped pgs
>>>>>>             flags sortbitwise,require_jewel_osds
>>>>>>       pgmap v112095: 128 pgs, 1 pools, 14659 kB data, 17 objects
>>>>>>             165 MB used, 159 GB / 160 GB avail
>>>>>>                  112 active+clean
>>>>>>                   16 stale+active+clean
>>>>>>
>>>>>> # ceph health detail
>>>>>> HEALTH_ERR 16 pgs are stuck inactive for more than 300 seconds; 16 pgs stale; 16 pgs stuck stale
>>>>>> pg 2.67 is stuck stale for 689.171742, current state stale+active+clean, last acting [2,6]
>>>>>> pg 2.5a is stuck stale for 689.171748, current state stale+active+clean, last acting [6,2]
>>>>>> pg 2.52 is stuck stale for 689.171753, current state stale+active+clean, last acting [2,6]
>>>>>> pg 2.4d is stuck stale for 689.171757, current state stale+active+clean, last acting [2,6]
>>>>>> pg 2.56 is stuck stale for 689.171755, current state stale+active+clean, last acting [6,2]
>>>>>> pg 2.d is stuck stale for 689.171811, current state stale+active+clean, last acting [6,2]
>>>>>> pg 2.79 is stuck stale for 689.171808, current state stale+active+clean, last acting [2,6]
>>>>>> pg 2.1f is stuck stale for 689.171782, current state stale+active+clean, last acting [6,2]
>>>>>> pg 2.76 is stuck stale for 689.171809, current state stale+active+clean, last acting [6,2]
>>>>>> pg 2.17 is stuck stale for 689.171794, current state stale+active+clean, last acting [6,2]
>>>>>> pg 2.63 is stuck stale for 689.171794, current state stale+active+clean, last acting [2,6]
>>>>>> pg 2.77 is stuck stale for 689.171816, current state stale+active+clean, last acting [2,6]
>>>>>> pg 2.1b is stuck stale for 689.171793, current state stale+active+clean, last acting [6,2]
>>>>>> pg 2.62 is stuck stale for 689.171765, current state stale+active+clean, last acting [2,6]
>>>>>> pg 2.30 is stuck stale for 689.171799, current state stale+active+clean, last acting [2,6]
>>>>>> pg 2.19 is stuck stale for 689.171798, current state stale+active+clean, last acting [6,2]
>>>>>>
>>>>>> # ceph pg dump_stuck stale
>>>>>> ok
>>>>>> pg_stat  state               up     up_primary  acting  acting_primary
>>>>>> 2.67     stale+active+clean  [2,6]  2           [2,6]   2
>>>>>> 2.5a     stale+active+clean  [6,2]  6           [6,2]   6
>>>>>> 2.52     stale+active+clean  [2,6]  2           [2,6]   2
>>>>>> 2.4d     stale+active+clean  [2,6]  2           [2,6]   2
>>>>>> 2.56     stale+active+clean  [6,2]  6           [6,2]   6
>>>>>> 2.d      stale+active+clean  [6,2]  6           [6,2]   6
>>>>>> 2.79     stale+active+clean  [2,6]  2           [2,6]   2
>>>>>> 2.1f     stale+active+clean  [6,2]  6           [6,2]   6
>>>>>> 2.76     stale+active+clean  [6,2]  6           [6,2]   6
>>>>>> 2.17     stale+active+clean  [6,2]  6           [6,2]   6
>>>>>> 2.63     stale+active+clean  [2,6]  2           [2,6]   2
>>>>>> 2.77     stale+active+clean  [2,6]  2           [2,6]   2
>>>>>> 2.1b     stale+active+clean  [6,2]  6           [6,2]   6
>>>>>> 2.62     stale+active+clean  [2,6]  2           [2,6]   2
>>>>>> 2.30     stale+active+clean  [2,6]  2           [2,6]   2
>>>>>> 2.19     stale+active+clean  [6,2]  6           [6,2]   6
>>>>>>
>>>>>> # ceph pg 2.62 query
>>>>>> Error ENOENT: i don't have pgid 2.62
>>>>>>
>>>>>> # rados ls -p exp-volumes
>>>>>> rbd_data.4a41f238e1f29.000000000000003f
>>>>>> ^C --> hangs
>>>>>>
>>>>>> I understand that this is a natural result, because the above pgs no
>>>>>> longer have a primary or secondary osd. But this situation can
>>>>>> happen, so I want to recover the ceph cluster and the rbd images.
>>>>>>
>>>>>> First, I want to know how to make the ceph cluster's state clean.
>>>>>> I read the documentation and tried to solve this, but nothing
>>>>>> helped, including the commands below:
>>>>>> - ceph pg force_create_pg 2.6
>>>>>> - ceph osd lost 2 --yes-i-really-mean-it
>>>>>> - ceph osd lost 6 --yes-i-really-mean-it
>>>>>> - ceph osd crush rm osd.2
>>>>>> - ceph osd crush rm osd.6
>>>>>> - ceph osd rm osd.2
>>>>>> - ceph osd rm osd.6
>>>>>>
>>>>>> Is there any command to force-delete pgs or otherwise make the ceph
>>>>>> cluster clean?
>>>>>> Thank you in advance.
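
One last note on the size 2 / min_size 1 point above: once the cluster is stable again, raising the pool to the usual 3 / 2 is what keeps a double OSD failure like this from wiping out every copy. Roughly (pool name taken from this thread; expect a round of backfill when you change it):

    ceph osd pool get exp-volumes size
    ceph osd pool set exp-volumes size 3
    ceph osd pool set exp-volumes min_size 2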
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
