I have tried ceph pg repair several times. It reports "instructing pg
2.798s0 on osd.41 to repair", but then nothing seems to happen as far as I
can tell. Is there any way of knowing whether it's actually doing more?
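
The only things I can think of to watch are the inconsistency details and
the primary OSD's log while the repair runs, roughly like this (the log
path is a guess based on a default install):

# rados list-inconsistent-obj 2.798 --format=json-pretty
# ceph pg 2.798 query
# tail -f /var/log/ceph/ceph-osd.41.log

Is that the right direction, or is there a better way to tell?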

On Sat, May 18, 2019 at 3:33 PM Brett Chancellor <[email protected]>
wrote:

> I would try ceph pg repair. If you see the pg go into deep scrubbing and
> then back to inconsistent, you probably have a bad drive. Find which of the
> drives in the pg is bad (pg query, or go to the host and look through
> dmesg). Take that osd offline and mark it out. Once backfill is complete,
> it should clear up.
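>
> Roughly something like this (osd.41 from your acting set is just an
> example, substitute whichever OSD turns out to be the bad one, and the
> systemctl line assumes a systemd-managed cluster):
>
> # ceph pg repair 2.798
> # ceph pg 2.798 query            # look at the shards/peers for errors
> # systemctl stop ceph-osd@41     # take the osd offline
> # ceph osd out 41                # mark it out so backfill starts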
>
> On Sat, May 18, 2019, 6:05 PM Jorge Garcia <[email protected]> wrote:
>
>> We are testing a ceph cluster, mostly using cephfs. We are using an
>> erasure-coded pool and have been loading it up with data. Recently we got
>> a HEALTH_ERR response when querying the ceph status. We stopped all
>> activity on the filesystem and waited to see if the error would go away.
>> It didn't. Then we tried a couple of suggestions from the internet (ceph pg
>> repair, ceph pg scrub, ceph pg deep-scrub) to no avail. I'm not sure how to
>> find out more about what the problem is, or how to repair the filesystem
>> and bring it back to normal health. Any suggestions?
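>>
>> For reference, this is roughly what we ran against the affected pg (the
>> pg id is taken from the health detail below):
>>
>> # ceph pg scrub 2.798
>> # ceph pg deep-scrub 2.798
>> # ceph pg repair 2.798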
>>
>> Current status:
>>
>> # ceph -s
>>
>>   cluster:
>>     id:     28ef32f1-4350-491b-9003-b19b9c3a2076
>>     health: HEALTH_ERR
>>             5 scrub errors
>>             Possible data damage: 1 pg inconsistent
>>
>>   services:
>>     mon: 3 daemons, quorum gi-cba-01,gi-cba-02,gi-cba-03
>>     mgr: gi-cba-01(active), standbys: gi-cba-02, gi-cba-03
>>     mds: backups-1/1/1 up  {0=gi-cbmd=up:active}
>>     osd: 87 osds: 87 up, 87 in
>>
>>   data:
>>     pools:   2 pools, 4096 pgs
>>     objects: 90.98 M objects, 134 TiB
>>     usage:   210 TiB used, 845 TiB / 1.0 PiB avail
>>     pgs:     4088 active+clean
>>              5    active+clean+scrubbing+deep
>>              2    active+clean+scrubbing
>>              1    active+clean+inconsistent
>>
>> # ceph health detail
>>
>> HEALTH_ERR 5 scrub errors; Possible data damage: 1 pg inconsistent
>> OSD_SCRUB_ERRORS 5 scrub errors
>> PG_DAMAGED Possible data damage: 1 pg inconsistent
>>     pg 2.798 is active+clean+inconsistent, acting [41,50,17,2,86,70,61]
>