I have been following this thread for a while, and thought I should add a "major ceph disaster" alert to the monitoring ;) http://www.f1-outsourcing.eu/files/ceph-disaster.mp4
-----Original Message-----
From: Kevin Flöh [mailto:[email protected]]
Sent: Thursday, 23 May 2019 10:51
To: Robert LeBlanc
Cc: [email protected]
Subject: Re: [ceph-users] Major ceph disaster

Hi,

we have set the PGs to recover and now they are stuck in
active+recovery_wait+degraded, and instructing them to deep-scrub does
not change anything. Hence, the rados report is empty. Is there a way to
stop the recovery wait, start the deep-scrub, and get the output? I guess
the recovery_wait might be caused by missing objects. Do we need to
delete them first to get the recovery going?

Kevin

On 22.05.19 6:03 PM, Robert LeBlanc wrote:
> On Wed, May 22, 2019 at 4:31 AM Kevin Flöh <[email protected]> wrote:
>> Hi,
>>
>> thank you, it worked. The PGs are no longer incomplete. But we have
>> another problem: 7 PGs are inconsistent, and a ceph pg repair is not
>> doing anything. I just get "instructing pg 1.5dd on osd.24 to repair"
>> and nothing happens. Does somebody know how we can get the PGs to
>> repair?
>>
>> Regards,
>> Kevin
>
> Kevin,
>
> I just fixed an inconsistent PG yesterday. You will need to figure out
> why they are inconsistent. Do these steps and then we can figure out
> how to proceed.
>
> 1. Do a deep-scrub on each PG that is inconsistent. (This may fix some
>    of them.)
> 2. Print out the inconsistent report for each inconsistent PG:
>    `rados list-inconsistent-obj <PG_NUM> --format=json-pretty`
> 3. Look at the error messages and see whether all the shards have the
>    same data.
>
> Robert LeBlanc

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
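Robert's step 3 (check whether all shards of an inconsistent object hold the same data) can be sketched with a small script. This is a minimal, hypothetical example: the embedded JSON is made-up sample data only shaped like what `rados list-inconsistent-obj --format=json-pretty` emits, and the field names (`inconsistents`, `shards`, `data_digest`) are assumptions based on that output; verify against the report from your own cluster.

```python
import json

# Made-up sample data, shaped like the output of
# `rados list-inconsistent-obj <PG_NUM> --format=json-pretty`.
# Replace with the real report from your cluster.
report_json = """
{
  "epoch": 180,
  "inconsistents": [
    {
      "object": {"name": "obj1", "nspace": "", "snap": "head"},
      "errors": ["data_digest_mismatch"],
      "shards": [
        {"osd": 24, "errors": [], "data_digest": "0xaaaaaaaa"},
        {"osd": 31, "errors": ["data_digest_mismatch_info"],
         "data_digest": "0xbbbbbbbb"}
      ]
    }
  ]
}
"""

def shards_disagree(inconsistent_obj):
    """Return True if the shards of one object report differing data digests."""
    digests = {s.get("data_digest") for s in inconsistent_obj["shards"]}
    return len(digests) > 1

report = json.loads(report_json)
for obj in report["inconsistents"]:
    name = obj["object"]["name"]
    if shards_disagree(obj):
        print(f"{name}: shards disagree on data_digest, inspect before repair")
    else:
        print(f"{name}: shards agree, error is likely metadata-only")
```

If the shards disagree, it is worth working out which copy is good before letting `ceph pg repair` pick one; if they agree, the inconsistency is probably in the object info or omap rather than the data itself.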
