Yeah, improving that workflow is in the backlog (or maybe it's already
done in master? I forget). But it's complicated, so for now that's just
how it goes. :(
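
For anyone hitting this later, the workaround Brett describes below can
be condensed into one loop. This is just a sketch that prints the
commands rather than running them; the OSD ids, the pg id, and the
temporary values (osd_max_scrubs=3, osd_scrub_min_interval=1.0) are all
taken from his message, and you'd want to revert the settings to their
defaults once the repair completes:

```shell
# Print the per-OSD injectargs calls from the thread, then the repair.
# (Pipe through "sh" or prefix with sudo to actually run them.)
for osd in osd.208 osd.120 osd.235; do
  echo "ceph tell $osd injectargs '--osd_max_scrubs 3'"
  echo "ceph tell $osd injectargs '--osd_scrub_min_interval 1.0'"
done
echo "ceph pg repair 75.302"
```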

On Thu, Oct 11, 2018 at 10:27 AM Brett Chancellor <
[email protected]> wrote:

> This seems like a bug. If I'm kicking off a repair manually, it should
> take place immediately and ignore flags such as max scrubs or the
> minimum scrub window.
>
> -Brett
>
> On Thu, Oct 11, 2018 at 1:11 PM David Turner <[email protected]>
> wrote:
>
>> Part of a repair is queuing a deep scrub; as soon as the repair part
>> is over, the deep scrub continues until it is done.
>>
>> On Thu, Oct 11, 2018, 12:26 PM Brett Chancellor <
>> [email protected]> wrote:
>>
>>> Does the "repair" function use the same rules as a deep scrub? I
>>> couldn't get one to kick off until I temporarily increased max_scrubs
>>> and lowered scrub_min_interval on all 3 OSDs for that placement group.
>>> This ended up fixing the issue, so I'll leave this here in case somebody
>>> else runs into it.
>>>
>>> sudo ceph tell 'osd.208' injectargs '--osd_max_scrubs 3'
>>> sudo ceph tell 'osd.120' injectargs '--osd_max_scrubs 3'
>>> sudo ceph tell 'osd.235' injectargs '--osd_max_scrubs 3'
>>> sudo ceph tell 'osd.208' injectargs '--osd_scrub_min_interval 1.0'
>>> sudo ceph tell 'osd.120' injectargs '--osd_scrub_min_interval 1.0'
>>> sudo ceph tell 'osd.235' injectargs '--osd_scrub_min_interval 1.0'
>>> sudo ceph pg repair 75.302
>>>
>>> -Brett
>>>
>>>
>>> On Thu, Oct 11, 2018 at 8:42 AM Maks Kowalik <[email protected]>
>>> wrote:
>>>
>>>> IMHO moving was not the best idea (a copy attempt would have told you
>>>> whether a read error was the problem here).
>>>> Scrubs may refuse to start if many other scrubs are ongoing.
>>>>
>>>> On Thu, Oct 11, 2018 at 2:27 PM Brett Chancellor <[email protected]>
>>>> wrote:
>>>>
>>>>> I moved the file. But the cluster won't actually start any
>>>>> scrub/repair I manually initiate.
>>>>>
>>>>> On Thu, Oct 11, 2018, 7:51 AM Maks Kowalik <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Based on the log output, it looks like you have a damaged file on
>>>>>> OSD 235, where the shard is stored.
>>>>>> To confirm that's the case, you should find the file (using
>>>>>> 81d5654895863d as a part of its name) and try to copy it to another
>>>>>> directory.
>>>>>> If you get an I/O error while copying, the next steps would be to
>>>>>> delete the file, run the scrub on 75.302, and take a close look at
>>>>>> OSD 235 for any other errors.
>>>>>>
>>>>>> Kind regards,
>>>>>> Maks
>>>>>>
>>>>> _______________________________________________
>>> ceph-users mailing list
>>> [email protected]
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
