There is a tracker [1] open for this issue. When "pg repair" or "pg deep-scrub" is accepted but the scrub never actually runs, there are two steps that should get the PG going. First, increase osd_max_scrubs on the OSDs involved in the PG. If that doesn't fix it, then try increasing osd_deep_scrub_interval on all OSDs in your cluster. Both settings can be injected at runtime, and in my experience that should allow your PG to repair/deep-scrub.
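As a sketch, the two steps could look like this on the command line. The OSD ids are taken from the acting set in the log below, and the interval and scrub-slot values are only illustrative; pick values appropriate for your cluster:

```shell
# Step 1: allow more concurrent scrubs on the OSDs in the PG's acting set
# (example ids 62, 53, 163, 113; example value 2).
for osd in 62 53 163 113; do
    ceph tell osd.$osd injectargs '--osd_max_scrubs 2'
done

# Step 2 (if step 1 doesn't help): raise osd_deep_scrub_interval on all OSDs,
# e.g. from 30 days to 6 weeks (value is in seconds), so scheduled deep-scrubs
# stop crowding out manually requested ones.
ceph tell 'osd.*' injectargs '--osd_deep_scrub_interval 3628800'

# Then re-issue the repair for the inconsistent PG.
ceph pg repair 9.3cd
```

Note that injected values do not survive an OSD restart; if they help, persist them in ceph.conf (or your config management) as well.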
The idea is that your cluster isn't able to keep up with the deep-scrub schedule, and the deep-scrubs the cluster forces to run because of the interval take priority over the ones you submit manually. That was definitely the case when I had this problem a few weeks ago, and these steps resolved it. When I hit it a few months ago I just let it run its course, and the repair finally happened about 3 weeks after I issued it. My osd_deep_scrub_interval was set to 30 days, but apparently it was taking closer to 7 weeks to get through all of the PGs.

[1] https://tracker.ceph.com/issues/23576#change-119460

On Tue, Aug 28, 2018, 5:16 AM Maks Kowalik <[email protected]> wrote:

> Scrubs discovered the following inconsistency:
>
> 2018-08-23 17:21:07.933458 osd.62 osd.62 10.122.0.140:6805/77767 6 :
> cluster [ERR] 9.3cd shard 113: soid
> 9:b3cd8d89:::.dir.default.153398310.112:head omap_digest 0xea4ba012 !=
> omap_digest 0xc5acebfd from shard 62, omap_digest 0xea4ba012 != omap_digest
> 0xc5acebfd from auth oi
> 9:b3cd8d89:::.dir.default.153398310.112:head(138609'2009129
> osd.250.0:64658209 dirty|omap|data_digest|omap_digest s 0 uv 1995230 dd
> ffffffff od c5acebfd alloc_hint [0 0 0])
>
> The omap_digest mismatch appears on a non-primary OSD in a pool with 4
> replicas. In this situation I decided to issue "pg repair", as I expected
> Ceph would repair the broken object. The command was successful, but the repair
> on 9.3cd didn't start.
>
> Then I tried the procedure described here (setting a temporary key on
> the object to force recalculation of omap_digest):
> https://www.mail-archive.com/[email protected]/msg47219.html
> But deep-scrub on 9.3cd didn't start.
> The OSD marked 9.3cd for scrubbing, but that's all that happened:
>
> 2018-08-27 14:36:22.703848 7faa7e860700 20 osd.62 713813 OSD::ms_dispatch:
> scrub([9.3cd] deep) v2
> 2018-08-27 14:36:22.703869 7faa7e860700 20 osd.62 713813 _dispatch
> 0x55725b76d180 scrub([9.3cd] deep) v2
> 2018-08-27 14:36:22.703871 7faa7e860700 10 osd.62 713813 handle_scrub
> scrub([9.3cd] deep) v2
> 2018-08-27 14:36:22.703878 7faa7e860700 10 osd.62 713813 marking pg[9.3cd(
> v 713813'2359292 (713107'2357731,713813'2359292]
> local-lis/les=711049/711050 n=41419 ec=178/178 lis/c 711049/711049 les/c/f
> 711050/711149/222921 711049/711049/710352) [62,53,163,113] r=0 lpr=711049
> crt=713813'2359292 lcod 713813'2359291 mlcod 713813'2359291
> active+clean+inconsistent MUST_DEEP_SCRUB MUST_SCRUB] for scrub
>
> Does anyone know how to recover from inconsistency in such a case?
>
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
