Re: [ceph-users] Have an inconsistent PG, repair not working

David Turner Fri, 06 Apr 2018 06:28:14 -0700

I'm running into this exact same situation.  I'm running 12.2.2 and I have
an EC PG with a scrub error.  It has the same output for [1] rados
list-inconsistent-obj as mentioned before.  This is the [2] full health
detail.  This is the [3] excerpt from the log from the deep-scrub that
marked the PG inconsistent.  The scrub happened when the PG was starting up
after using ceph-objectstore-tool to split its filestore subfolders.  This
is using a script that I've used for months without any side effects.

I have tried quite a few things to get this PG to deep-scrub or repair, but
to no avail; it will not do anything.  I set osd_max_scrubs to 0 on every
OSD in the cluster, waited for all scrubbing and deep scrubbing to finish,
then raised osd_max_scrubs to 1 on the 11 OSDs for this PG before issuing a
deep-scrub.  It then sat there for over an hour without deep-scrubbing.  My
current test is to set osd_max_scrubs to 1 on all OSDs, raise it to 4 on the
OSDs for this PG, and then issue the repair... but similarly nothing
happens.  Each time I issue the deep-scrub or repair, the output correctly
says 'instructing pg 145.2e3 on osd.234 to repair', but nothing shows up in
the OSD's log and the PG state stays 'active+clean+inconsistent'.
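
For reference, the scrub-slot sequence described above can be sketched
roughly as follows.  This is only an illustration, not the exact script
used: the helper names (run, set_max_scrubs, free_scrub_slots_and_scrub)
are made up for this example, the PG id and OSD ids are the ones quoted in
this message, and injecting the setting via 'ceph tell ... injectargs' is
an assumption about how osd_max_scrubs was changed.  DRY_RUN defaults to 1,
so the commands are only printed.

```shell
#!/bin/sh
# Sketch only: PG id and acting set are taken from the health detail output.
PGID="145.2e3"
ACTING="234 132 33 331 278 217 55 358 79 3 24"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands instead of running them

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# $1 = OSD target ("osd.*" or "osd.234"), $2 = new osd_max_scrubs value
set_max_scrubs() {
    run ceph tell "$1" injectargs "--osd_max_scrubs $2"
}

# $1 = "deep-scrub" or "repair"
free_scrub_slots_and_scrub() {
    # Stop new scrubs from being scheduled anywhere in the cluster.
    set_max_scrubs 'osd.*' 0
    # At this point one would wait until no PG reports a scrubbing state,
    # e.g. by watching 'ceph pg dump pgs_brief'.
    # Re-open a single scrub slot only on the OSDs in this PG's acting set.
    for id in $ACTING; do
        set_max_scrubs "osd.$id" 1
    done
    run ceph pg "$1" "$PGID"
}
```

Calling 'free_scrub_slots_and_scrub deep-scrub' with DRY_RUN=1 prints the
13 commands in order; with DRY_RUN=0 it would run them against the cluster.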

My next step, unless anyone has a better idea, is to find the exact copy of
the PG with the missing object, use ceph-objectstore-tool to back up that
copy of the PG and remove it.  Starting the OSD back up should then backfill
a full copy of the PG and leave it healthy again.
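
A rough sketch of that plan is below.  It is untested against a real
cluster and rests on several assumptions: the helper names are hypothetical,
the OSD data and journal paths are the default filestore locations, the OSDs
are managed by systemd, and the N-th OSD in the acting set is taken to hold
EC shard N (so the PG id on disk is 145.2e3sN).  To my knowledge '--op
remove' needs '--force' on Luminous, but check 'ceph-objectstore-tool
--help' on your build.  DRY_RUN defaults to 1, so nothing is executed.

```shell
#!/bin/sh
# Sketch only: PG id and acting set are taken from the health detail output.
PGID="145.2e3"
ACTING="234 132 33 331 278 217 55 358 79 3 24"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands instead of running them

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# Print the shard-qualified PG id for an OSD, based on its (0-based)
# position in the acting set.  $1 = OSD id
shard_pgid() {
    i=0
    for id in $ACTING; do
        if [ "$id" = "$1" ]; then
            printf '%s\n' "${PGID}s${i}"
            return 0
        fi
        i=$((i + 1))
    done
    return 1   # OSD is not in the acting set
}

# Export one shard to a backup file, then remove it from the stopped OSD.
# $1 = OSD id holding the suspect shard, $2 = backup directory
backup_and_remove_shard() {
    spg="$(shard_pgid "$1")" || return 1
    data="/var/lib/ceph/osd/ceph-$1"   # assumed default data path
    run ceph osd set noout || return 1
    run systemctl stop "ceph-osd@$1" || return 1
    # Only remove the shard if the export succeeded.
    run ceph-objectstore-tool --data-path "$data" \
        --journal-path "$data/journal" \
        --pgid "$spg" --op export --file "$2/$spg.export" || return 1
    run ceph-objectstore-tool --data-path "$data" \
        --journal-path "$data/journal" \
        --pgid "$spg" --op remove --force || return 1
    run systemctl start "ceph-osd@$1"
    run ceph osd unset noout
}
```

For example, 'backup_and_remove_shard 132 /root/pg-backup' would print the
six commands for shard 145.2e3s1 on osd.132.  Which OSD actually holds the
copy with the missing object still has to be determined first.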

[1] $ rados list-inconsistent-obj 145.2e3
No scrub information available for pg 145.2e3
error 2: (2) No such file or directory

[2] $ ceph health detail
HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
    pg 145.2e3 is active+clean+inconsistent, acting [234,132,33,331,278,217,55,358,79,3,24]

[3] 2018-04-04 15:24:53.603380 7f54d1820700  0 log_channel(cluster) log [DBG] : 145.2e3 deep-scrub starts
2018-04-04 17:32:37.916853 7f54d1820700 -1 log_channel(cluster) log [ERR] : 145.2e3s0 deep-scrub 1 missing, 0 inconsistent objects
2018-04-04 17:32:37.916865 7f54d1820700 -1 log_channel(cluster) log [ERR] : 145.2e3 deep-scrub 1 errors

On Mon, Apr 2, 2018 at 4:51 PM Michael Sudnick <[email protected]>
wrote:

> Hi Kjetil,
>
> I've tried to get the pg scrubbing/deep scrubbing and nothing seems to be
> happening. I've tried it a few times over the last few days. My cluster is
> recovering from a failed disk (which was probably the reason for the
> inconsistency). Do I need to wait for the cluster to heal before
> repair/deep scrub works?
>
> -Michael
>
> On 2 April 2018 at 14:13, Kjetil Joergensen <[email protected]> wrote:
>
>> Hi,
>>
>> scrub or deep-scrub the pg, that should in theory get you back to
>> list-inconsistent-obj spitting out what's wrong, then mail that info to the
>> list.
>>
>> -KJ
>>
>> On Sun, Apr 1, 2018 at 9:17 AM, Michael Sudnick <
>> [email protected]> wrote:
>>
>>> Hello,
>>>
>>> I have a small cluster with an inconsistent pg. I've tried ceph pg
>>> repair multiple times with no luck. rados list-inconsistent-obj 49.11c
>>> returns:
>>>
>>> # rados list-inconsistent-obj 49.11c
>>> No scrub information available for pg 49.11c
>>> error 2: (2) No such file or directory
>>>
>>> I'm a bit at a loss here as to what to do to recover. That pg is part
>>> of a cephfs_data pool with compression set to force/snappy.
>>>
>>> Does anyone have any suggestions?
>>>
>>> -Michael
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> [email protected]
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>
>>
>> --
>> Kjetil Joergensen <[email protected]>
>> SRE, Medallia Inc
>>
>
