Yeah, improving that workflow is in the backlog (or maybe it's already done in master? I forget). But it's complicated, so for now that's just how it goes. :(
On Thu, Oct 11, 2018 at 10:27 AM Brett Chancellor <[email protected]> wrote:

> This seems like a bug. If I'm kicking off a repair manually, it should
> take place immediately and ignore flags such as max scrubs or the
> minimum scrub window.
>
> -Brett
>
> On Thu, Oct 11, 2018 at 1:11 PM David Turner <[email protected]> wrote:
>
>> Part of a repair is queuing a deep scrub. As soon as the repair part
>> is over, the deep scrub continues until it is done.
>>
>> On Thu, Oct 11, 2018, 12:26 PM Brett Chancellor <[email protected]> wrote:
>>
>>> Does the "repair" function use the same rules as a deep scrub? I
>>> couldn't get one to kick off until I temporarily increased max_scrubs
>>> and lowered scrub_min_interval on all 3 OSDs for that placement
>>> group. This ended up fixing the issue, so I'll leave this here in
>>> case somebody else runs into it.
>>>
>>> sudo ceph tell 'osd.208' injectargs '--osd_max_scrubs 3'
>>> sudo ceph tell 'osd.120' injectargs '--osd_max_scrubs 3'
>>> sudo ceph tell 'osd.235' injectargs '--osd_max_scrubs 3'
>>> sudo ceph tell 'osd.208' injectargs '--osd_scrub_min_interval 1.0'
>>> sudo ceph tell 'osd.120' injectargs '--osd_scrub_min_interval 1.0'
>>> sudo ceph tell 'osd.235' injectargs '--osd_scrub_min_interval 1.0'
>>> sudo ceph pg repair 75.302
>>>
>>> -Brett
>>>
>>> On Thu, Oct 11, 2018 at 8:42 AM Maks Kowalik <[email protected]> wrote:
>>>
>>>> IMHO moving was not the best idea (a copying attempt would have told
>>>> whether a read error was the case here).
>>>> Scrubs might not want to start if there are many other scrubs ongoing.
>>>>
>>>> On Thu, 11 Oct 2018 at 14:27, Brett Chancellor <[email protected]> wrote:
>>>>
>>>>> I moved the file. But the cluster won't actually start any
>>>>> scrub/repair I manually initiate.
>>>>>
>>>>> On Thu, Oct 11, 2018, 7:51 AM Maks Kowalik <[email protected]> wrote:
>>>>>
>>>>>> Based on the log output, it looks like you have a damaged file on
>>>>>> OSD 235 where the shard is stored.
>>>>>> To confirm that's the case, you should find the file (using
>>>>>> 81d5654895863d as a part of its name) and try to copy it to
>>>>>> another directory.
>>>>>> If you get an I/O error while copying, the next steps would be to
>>>>>> delete the file, run a scrub on 75.302, and take a deep look at
>>>>>> OSD 235 for any other errors.
>>>>>>
>>>>>> Kind regards,
>>>>>> Maks
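Brett's three-OSD workaround above can be condensed into a small loop. The PG id and acting set (208, 120, 235) are the ones from the thread; `ceph pg map <pgid>` is the usual way to look up the acting set. As written, this is a dry run that only prints the commands; set `CEPH="sudo ceph"` to actually apply them.

```shell
#!/bin/sh
# Temporarily loosen scrub limits on every OSD in a PG's acting set so a
# manually requested repair can be scheduled, then kick off the repair.
PG="75.302"
ACTING="208 120 235"        # from: ceph pg map 75.302
CEPH="${CEPH:-echo ceph}"   # dry run by default; use CEPH="sudo ceph" to apply

for osd in $ACTING; do
    $CEPH tell "osd.$osd" injectargs '--osd_max_scrubs 3'
    $CEPH tell "osd.$osd" injectargs '--osd_scrub_min_interval 1.0'
done
$CEPH pg repair "$PG"
```

Note that `injectargs` changes are runtime-only, so the injected values disappear on the next OSD restart; if the daemons keep running, remember to set them back to their previous values once the repair has gone through.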
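A read-check along the lines Maks describes might look like the sketch below. The data directory path is an assumption (the standard filestore layout for OSD 235); only the name fragment 81d5654895863d comes from the thread. This applies to filestore OSDs only; on BlueStore there is no per-object file to find.

```shell
#!/bin/sh
# Look for the object file whose name contains the fragment from the scrub
# log, and test-read it with cp; an I/O error here points at a bad sector.
OSD_DIR="${OSD_DIR:-/var/lib/ceph/osd/ceph-235}"  # assumed filestore data dir
FRAGMENT="81d5654895863d"

find "$OSD_DIR/current" -type f -name "*${FRAGMENT}*" 2>/dev/null |
while read -r f; do
    if cp "$f" /tmp/; then
        echo "read OK: $f"
    else
        echo "READ ERROR: $f"   # likely the damaged shard
    fi
done
```

Using `cp` rather than `mv` is the point of Maks's suggestion: the copy forces a full read of the file without disturbing the original, so a failure cleanly confirms the media error.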
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
