Hi,

For those facing (lots of) active+clean+inconsistent PGs after the luminous 
12.2.6 metadata corruption and the 12.2.7 upgrade, I'd like to explain how I 
finally got rid of them.

Disclaimer: my cluster doesn't contain highly valuable data, and I can more or 
less recreate what it actually contains: VMs. The following is risky...

One reason I needed to fix these issues is that I faced IO errors with pool 
overlays/tiering which were apparently related to the inconsistencies, and the 
only way I could get my VMs running again was to completely disable the SSD 
overlay, which is far from ideal.
If you don't feel the need to fix this "harmless" issue, please stop reading.
For the others, please understand the risks of the following... or wait for an 
official "pg repair" solution.

So:

1st step:
Since I was getting an ever-growing list of damaged PGs, I decided to 
deep-scrub... all PGs.
Yes. If you have 1+ PB of data... stop reading (or not?).

How to do that:
# for j in <pools to scrub>; do
>   for i in $(ceph pg ls-by-pool $j | cut -d " " -f 1 | tail -n +2); do
>     ceph pg deep-scrub $i
>   done
> done

I think I already had a full list of damaged PGs until I upgraded to mimic and 
restarted the MONs/OSDs: I believe the daemon restarts caused Ceph to forget 
about the known inconsistencies.
If you believe the number of damaged PGs is more or less stable for you, then 
skip step 1...
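The scrub-everything loop above can be sketched as a pair of shell functions; 
the pgid extraction is plain text processing (the same cut/tail as the 
one-liner), so it can be checked without a cluster. The function names are 
mine, not Ceph commands:

```shell
# Sketch of step 1 as reusable functions. pg_ids is pure text
# processing; deep_scrub_pool needs a live cluster and is therefore
# only defined here, not run.
pg_ids() {
  # keep the first column (the pgid), then drop the header line
  cut -d " " -f 1 | tail -n +2
}

deep_scrub_pool() {
  local pool="$1"
  ceph pg ls-by-pool "$pool" | pg_ids | while read -r pg; do
    ceph pg deep-scrub "$pg"
  done
}

# usage: for p in <pools to scrub>; do deep_scrub_pool "$p"; done
```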

2nd step is sort of easy: apply the method described here:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/021054.html
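As I understand the linked method, the fix is to read each broken object out 
and write the same bytes back, which regenerates its metadata. A minimal 
sketch, assuming the object name has already been found (e.g. with `rados 
list-inconsistent-obj <pgid>`); rewrite_object is my name, not a Ceph command:

```shell
# Sketch of the get/put overwrite from the linked post. Needs a live
# cluster, so the function is only defined here, not executed.
rewrite_object() {
  local pool="$1" obj="$2" tmp rc
  tmp=$(mktemp) || return 1
  # read the object out, then write the same bytes back;
  # the rewrite regenerates the object's on-disk metadata
  rados -p "$pool" get "$obj" "$tmp" \
    && rados -p "$pool" put "$obj" "$tmp"
  rc=$?
  rm -f "$tmp"
  return $rc
}

# usage: rewrite_object rbd <object name from list-inconsistent-obj>
```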

I tried to add some rados locking before overwriting the objects (4M rbd 
objects in my case), but I was still able to overwrite a locked object even 
with "rados -p rbd lock get --lock-type exclusive"... maybe I didn't try hard 
enough.
It would have been great to be able to make sure the object was not 
overwritten between the get and the put :/ - that would make this procedure 
much safer...
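There is no atomic compare-and-swap here, but the window can at least be 
narrowed: read the object twice and only put it back if the two reads match. 
A best-effort sketch (the function names are mine, not a Ceph feature); 
checksum is pure and checkable offline:

```shell
# Best-effort guard against the get/put race: re-read the object and
# only overwrite it if the bytes did not change between the two reads.
# This narrows the window but does NOT close it.
checksum() { md5sum "$1" | cut -d " " -f 1; }

guarded_rewrite() {
  local pool="$1" obj="$2" a b
  a=$(mktemp); b=$(mktemp)
  rados -p "$pool" get "$obj" "$a"
  rados -p "$pool" get "$obj" "$b"
  if [ "$(checksum "$a")" = "$(checksum "$b")" ]; then
    rados -p "$pool" put "$obj" "$a"
  else
    echo "object $obj changed between reads; skipping" >&2
  fi
  rm -f "$a" "$b"
}
```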

In my case, I had 2000+ damaged PGs, so I wrote a small script that processes 
those PGs and tries to apply the procedure:
https://gist.github.com/fschaer/cb851eae4f46287eaf30715e18f14524

My Ceph cluster has been healthy since Friday evening, and I haven't seen any 
data corruption or hung VMs...

Cheers
Frederic
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
