There is a tracker [1] open for this issue. When "pg repair" or "pg deep-scrub" is accepted but the scrub never actually runs, there are two steps that should get the PG going. First, increase osd_max_scrubs on the OSDs involved in the PG. If that doesn't fix it, then try increasing osd_deep_scrub_interval on all OSDs in your cluster. Both settings can be injected at runtime, and in my experience that should allow your PG to repair/deep-scrub.
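As a sketch, the two steps could look like this on the command line. The OSD ids are taken from the acting set in the log below, and the interval and scrub-slot values are only illustrative; pick values appropriate for your cluster:

```shell
# Step 1: allow more concurrent scrubs on the OSDs in the PG's acting set
# (example ids 62, 53, 163, 113; example value 2).
for osd in 62 53 163 113; do
    ceph tell osd.$osd injectargs '--osd_max_scrubs 2'
done

# Step 2 (if step 1 doesn't help): raise osd_deep_scrub_interval on all OSDs,
# e.g. from 30 days to 6 weeks (value is in seconds), so scheduled deep-scrubs
# stop crowding out manually requested ones.
ceph tell 'osd.*' injectargs '--osd_deep_scrub_interval 3628800'

# Then re-issue the repair for the inconsistent PG.
ceph pg repair 9.3cd
```

Note that injected values do not survive an OSD restart; if they help, persist them in ceph.conf (or your config management) as well.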
The idea is that your cluster isn't able to keep up with the deep-scrub schedule, and the deep-scrubs the cluster forces to run because of the interval take priority over the ones you submit manually. That was definitely the case when I had this problem a few weeks ago, and these steps resolved it. When I hit it a few months ago I just let it run its course, and the repair finally happened about 3 weeks after I issued it. My osd_deep_scrub_interval was set to 30 days, but apparently it was taking closer to 7 weeks to get through all of the PGs.

[1] https://tracker.ceph.com/issues/23576#change-119460

On Tue, Aug 28, 2018, 5:16 AM Maks Kowalik <[email protected]> wrote:

> Scrubs discovered the following inconsistency:
>
> 2018-08-23 17:21:07.933458 osd.62 osd.62 10.122.0.140:6805/77767 6 :
> cluster [ERR] 9.3cd shard 113: soid
> 9:b3cd8d89:::.dir.default.153398310.112:head omap_digest 0xea4ba012 !=
> omap_digest 0xc5acebfd from shard 62, omap_digest 0xea4ba012 != omap_digest
> 0xc5acebfd from auth oi
> 9:b3cd8d89:::.dir.default.153398310.112:head(138609'2009129
> osd.250.0:64658209 dirty|omap|data_digest|omap_digest s 0 uv 1995230 dd
> ffffffff od c5acebfd alloc_hint [0 0 0])
>
> The omap_digest mismatch appears on a non-primary OSD in a pool with 4
> replicas. In this situation I decided to issue "pg repair", as I expected
> Ceph would repair the broken object. The command was successful, but the repair
> on 9.3cd didn't start.
>
> Then I tried the procedure described here (setting a temporary key on
> the object to force recalculation of omap_digest):
> https://www.mail-archive.com/[email protected]/msg47219.html
> But deep-scrub on 9.3cd didn't start.
> The OSD marked 9.3cd for scrubbing, but that's all that happened:
>
> 2018-08-27 14:36:22.703848 7faa7e860700 20 osd.62 713813 OSD::ms_dispatch:
> scrub([9.3cd] deep) v2
> 2018-08-27 14:36:22.703869 7faa7e860700 20 osd.62 713813 _dispatch
> 0x55725b76d180 scrub([9.3cd] deep) v2
> 2018-08-27 14:36:22.703871 7faa7e860700 10 osd.62 713813 handle_scrub
> scrub([9.3cd] deep) v2
> 2018-08-27 14:36:22.703878 7faa7e860700 10 osd.62 713813 marking pg[9.3cd(
> v 713813'2359292 (713107'2357731,713813'2359292]
> local-lis/les=711049/711050 n=41419 ec=178/178 lis/c 711049/711049 les/c/f
> 711050/711149/222921 711049/711049/710352) [62,53,163,113] r=0 lpr=711049
> crt=713813'2359292 lcod 713813'2359291 mlcod 713813'2359291
> active+clean+inconsistent MUST_DEEP_SCRUB MUST_SCRUB] for scrub
>
> Does anyone know how to recover from inconsistency in such a case?
>
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
