Mine repaired themselves after a regular deep scrub. Weird that I couldn't trigger one manually.
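(For the list archive: by "trigger one manually" I mean the standard
commands, here shown with the pg id from later in this thread:

$ ceph pg scrub 145.2e3
$ ceph pg deep-scrub 145.2e3

Whether one actually ran can be confirmed against the timestamps:

$ ceph pg 145.2e3 query | grep scrub_stamp)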
On 30 April 2018 at 14:23, David Turner <[email protected]> wrote:

> My 3 inconsistent PGs finally decided to run automatic scrubs, and now
> 2 of the 3 will allow me to run deep-scrubs and repairs on them. The
> deep-scrub did not show any new information about the objects other
> than that they were missing in one of the copies. Running a repair
> fixed the inconsistency.
>
> On Tue, Apr 24, 2018 at 4:53 PM David Turner <[email protected]> wrote:
>
>> Neither the issue I created nor Michael's [1] ticket that it was
>> rolled into is getting any traction. How are y'all faring with your
>> clusters? I've had 3 PGs inconsistent with 5 scrub errors for a few
>> weeks now. I assumed that the third PG was just like the first 2 in
>> that it couldn't be scrubbed, but I just checked the last scrub
>> timestamp of the 3 PGs, and the third one is able to run scrubs. I'm
>> going to increase the logging on it after I finish a round of
>> maintenance we're performing on some OSDs. Hopefully I'll find
>> something more about these objects.
>>
>> [1] http://tracker.ceph.com/issues/23576
>>
>> On Fri, Apr 6, 2018 at 12:30 PM David Turner <[email protected]> wrote:
>>
>>> I'm using filestore. I think the root cause is something getting
>>> stuck in the code, so I went ahead and created a [1] bug tracker
>>> ticket for this. Hopefully it gets some traction, as I'm not
>>> particularly looking forward to messing with deleting PGs with the
>>> ceph-objectstore-tool in production.
>>>
>>> [1] http://tracker.ceph.com/issues/23577
>>>
>>> On Fri, Apr 6, 2018 at 11:40 AM Michael Sudnick
>>> <[email protected]> wrote:
>>>
>>>> I've tried a few more things to get a deep-scrub going on my PG. I
>>>> tried instructing the involved OSDs to scrub all their PGs, and it
>>>> looks like that didn't do it.
>>>>
>>>> Do you have any documentation on the object-store-tool? What I've
>>>> found online talks about filestore and not bluestore.
>>>>
>>>> On 6 April 2018 at 09:27, David Turner <[email protected]> wrote:
>>>>
>>>>> I'm running into this exact same situation. I'm running 12.2.2, and
>>>>> I have an EC PG with a scrub error. It has the same output for
>>>>> [1] rados list-inconsistent-obj as mentioned before. This is the
>>>>> [2] full health detail. This is the [3] excerpt from the log from
>>>>> the deep-scrub that marked the PG inconsistent. The scrub happened
>>>>> when the PG was starting up after using ceph-objectstore-tool to
>>>>> split its filestore subfolders. This is using a script that I've
>>>>> used for months without any side effects.
>>>>>
>>>>> I have tried quite a few things to get this PG to deep-scrub or
>>>>> repair, but to no avail; it will not do anything. I set
>>>>> osd_max_scrubs to 0 on every OSD in the cluster, waited for all
>>>>> scrubbing and deep scrubbing to finish, then raised it to 1 on the
>>>>> 11 OSDs for this PG before issuing a deep-scrub. It will sit there
>>>>> for over an hour without deep-scrubbing. My current test is to set
>>>>> all OSDs to 1, raise all of the OSDs for this PG to 4, and then
>>>>> issue the repair... but similarly nothing happens. Each time I
>>>>> issue the deep-scrub or repair, the output correctly says
>>>>> 'instructing pg 145.2e3 on osd.234 to repair', but nothing shows up
>>>>> in the log for the OSD, and the PG state stays
>>>>> 'active+clean+inconsistent'.
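(Side note from me: the osd_max_scrubs dance above can be done at
runtime with injectargs. A minimal sketch, reusing the osd and pg ids
from David's output below; adjust for your own cluster:

$ ceph tell osd.\* injectargs '--osd_max_scrubs 0'    # quiesce scrubbing cluster-wide
$ ceph tell osd.234 injectargs '--osd_max_scrubs 1'   # re-enable on the acting OSDs
$ ceph pg deep-scrub 145.2e3

Raising the log level on the primary first, e.g.
$ ceph tell osd.234 injectargs '--debug_osd 10'
should at least show whether the OSD ever receives the scrub request.)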
>>>>> My next step, unless anyone has a better idea, is to find the
>>>>> exact copy of the PG with the missing object, use
>>>>> ceph-objectstore-tool to back up that copy of the PG, and then
>>>>> remove it. Starting the OSD back up should then backfill a full
>>>>> copy of the PG and be healthy again.
>>>>>
>>>>> [1] $ rados list-inconsistent-obj 145.2e3
>>>>> No scrub information available for pg 145.2e3
>>>>> error 2: (2) No such file or directory
>>>>>
>>>>> [2] $ ceph health detail
>>>>> HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
>>>>> OSD_SCRUB_ERRORS 1 scrub errors
>>>>> PG_DAMAGED Possible data damage: 1 pg inconsistent
>>>>>     pg 145.2e3 is active+clean+inconsistent, acting
>>>>> [234,132,33,331,278,217,55,358,79,3,24]
>>>>>
>>>>> [3] 2018-04-04 15:24:53.603380 7f54d1820700  0 log_channel(cluster)
>>>>> log [DBG] : 145.2e3 deep-scrub starts
>>>>> 2018-04-04 17:32:37.916853 7f54d1820700 -1 log_channel(cluster)
>>>>> log [ERR] : 145.2e3s0 deep-scrub 1 missing, 0 inconsistent objects
>>>>> 2018-04-04 17:32:37.916865 7f54d1820700 -1 log_channel(cluster)
>>>>> log [ERR] : 145.2e3 deep-scrub 1 errors
>>>>>
>>>>> On Mon, Apr 2, 2018 at 4:51 PM Michael Sudnick
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Hi Kjetil,
>>>>>>
>>>>>> I've tried to get the pg scrubbing/deep scrubbing, and nothing
>>>>>> seems to be happening. I've tried it a few times over the last
>>>>>> few days. My cluster is recovering from a failed disk (which was
>>>>>> probably the reason for the inconsistency). Do I need to wait for
>>>>>> the cluster to heal before repair/deep scrub works?
>>>>>>
>>>>>> -Michael
>>>>>>
>>>>>> On 2 April 2018 at 14:13, Kjetil Joergensen <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Scrub or deep-scrub the pg; that should in theory get
>>>>>>> list-inconsistent-obj back to spitting out what's wrong. Then
>>>>>>> mail that info to the list.
>>>>>>>
>>>>>>> -KJ
>>>>>>>
>>>>>>> On Sun, Apr 1, 2018 at 9:17 AM, Michael Sudnick
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I have a small cluster with an inconsistent pg. I've tried
>>>>>>>> ceph pg repair multiple times with no luck. rados
>>>>>>>> list-inconsistent-obj 49.11c returns:
>>>>>>>>
>>>>>>>> # rados list-inconsistent-obj 49.11c
>>>>>>>> No scrub information available for pg 49.11c
>>>>>>>> error 2: (2) No such file or directory
>>>>>>>>
>>>>>>>> I'm a bit at a loss here as to what to do to recover. That pg
>>>>>>>> is part of a cephfs_data pool with compression set to
>>>>>>>> force/snappy.
>>>>>>>>
>>>>>>>> Does anyone have any suggestions?
>>>>>>>>
>>>>>>>> -Michael
>>>>>>>
>>>>>>> --
>>>>>>> Kjetil Joergensen <[email protected]>
>>>>>>> SRE, Medallia Inc
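P.S. For Michael's earlier question about documentation: the
export/remove step David describes above works for both filestore and
bluestore. A rough sketch only, with ids and paths taken from the
quoted thread (for an EC pool the pgid includes the shard, e.g.
145.2e3s0); stop the OSD first and keep the export until the cluster
is healthy again:

$ ceph osd set noout
$ systemctl stop ceph-osd@234
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-234 \
      --op export --pgid 145.2e3s0 --file /root/145.2e3s0.export
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-234 \
      --op remove --pgid 145.2e3s0
$ systemctl start ceph-osd@234
$ ceph osd unset noout

Some builds require --force on the remove op, and filestore OSDs may
additionally need --journal-path. On restart the OSD should backfill
the removed PG from the remaining copies.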
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
