I don't want to "rescue" any OSDs. I want to clean up the incomplete PGs so
that Ceph proceeds with PG re-creation and makes those groups active again.
In my case, on which OSDs should I start setting the
"osd_find_best_info_ignore_history_les" option?
This is the relevant part of the query output from one of the groups to be
cleared:
"probing_osds": [ "54(1)", "81(2)", "103(0)", "103(1)", "118(9)", "126(3)",
"129(4)", "141(1)", "142(2)", "147(7)", "150(1)", "153(8)",
"159(0)","165(6)", "168(5)",
"171(0)","174(3)","177(9)","180(5)","262(2)","291(5)","313(1)","314(8)","315(7)","316(0)","318(6)"],
"down_osds_we_would_probe": [4,88,91,94,112,133]

Maks

Tue, 28 Aug 2018 at 15:20 Paul Emmerich <[email protected]> wrote:

> I don't think it's documented.
>
> It won't affect PGs that are active+clean.
> It takes effect during peering; the easiest way is to set it in ceph.conf
> and restart *all* of the OSDs that you want to rescue.
> It's important not to forget to unset it afterwards.
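> Roughly, assuming systemd-managed OSDs (the OSD id below is only an
> example, not taken from your cluster):
>
>     # In ceph.conf on the affected OSD hosts:
>     [osd]
>     osd_find_best_info_ignore_history_les = true
>
>     # Restart every OSD that should pick up the option, e.g.:
>     systemctl restart ceph-osd@54
>
>     # Once the PGs have peered, remove the line from ceph.conf and
>     # restart those OSDs again so the default (false) is in effect.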
>
>
> Paul
>
> 2018-08-28 13:21 GMT+02:00 Maks Kowalik <[email protected]>:
> > Thank you for answering.
> > Where is this option documented?
> > Do I set it in the config file, or using "tell osd.number", or via the
> > admin-daemon?
> > Do I set it on the primary OSD of the up set, on all OSDs of the up set,
> > or maybe on all historical peers holding the shards of a particular
> > group?
> > Is this option dangerous to the other groups on those OSDs (currently an
> > OSD holds about 160 PGs)?
> >
> > Maks
> >
> > Tue, 28 Aug 2018 at 12:12 Paul Emmerich <[email protected]> wrote:
> >>
> >> No need to delete it; that situation should be mostly salvageable by
> >> setting osd_find_best_info_ignore_history_les temporarily on the
> >> affected OSDs.
> >> That should cause you to "just" lose some writes, resulting in
> >> inconsistent data.
> >>
> >>
> >> Paul
> >>
> >> 2018-08-28 11:08 GMT+02:00 Maks Kowalik <[email protected]>:
> >> > What is the correct procedure for re-creating an incomplete placement
> >> > group that belongs to an erasure-coded pool?
> >> > I'm facing a situation where too many shards of 3 PGs were lost during
> >> > OSD crashes. Taking the data loss was decided, but I can't force Ceph
> >> > to recreate those PGs. The query output shows:
> >> > "peering_blocked_by_detail": [
> >> >                 {"detail": "peering_blocked_by_history_les_bound"}
> >> > What was tried (a sketch of the tool invocations is below):
> >> > 1. manual deletion of all shards appearing in the "peers" section of
> >> > the PG query output
> >> > 2. marking all shards as complete using ceph-objectstore-tool
> >> > 3. deleting the peering history from the OSDs keeping the shards
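> >> > For steps 1 and 2 the invocations looked roughly like this (the OSD
> >> > data path and the PG/shard id 7.1as0 are placeholders; the OSD was
> >> > stopped first):
> >> >
> >> >     # 7.1as0 stands for shard 0 of a hypothetical EC PG 7.1a.
> >> >     ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-54 \
> >> >         --pgid 7.1as0 --op remove --force
> >> >     ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-54 \
> >> >         --pgid 7.1as0 --op mark-complete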
> >> >
> >>
> >>
> >>
> >> --
> >> Paul Emmerich
> >>
> >> Looking for help with your Ceph cluster? Contact us at https://croit.io
> >>
> >> croit GmbH
> >> Freseniusstr. 31h
> >> 81247 München
> >> www.croit.io
> >> Tel: +49 89 1896585 90
>
>
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
