Hi,

On 03.06.2014 21:46, Jason Harley wrote:
> Howdy —
>
> I’ve had a failure on a small Dumpling (0.67.4) cluster running on Ubuntu
> 13.10 machines. I had three OSD nodes (running 6 OSDs each) and lost two of
> them in a beautiful failure. One of these nodes even went so far as to
> scramble the XFS filesystems of my OSD disks (I’m curious if it has some bad
> DIMMs).
>
> Anyway, the thing is: I’m okay with losing the data. This was a test setup,
> and I want to take this opportunity to learn from the recovery process. I’m
> now stuck in ‘HEALTH_ERR’ and want to get back to ‘HEALTH_OK’ without just
> reinitializing the cluster.
>
> My OSD map seems correct, and I’ve done scrubs (deep and normal) at the PG
> and OSD levels. ‘ceph -s’ shows that I still have 47 unfound objects after I
> told ceph to ‘mark_unfound_lost’. The remaining 47 PGs tell me that they
> "haven't probed all sources, not marking lost". Two days have passed at this
> point, and I’d just like to get my cluster back to working and deal with the
> object loss (which seems confined to a single pool).
>
> How do I move forward from here, if at all? Do I ‘force_create_pg’ the PGs
> containing my unfound objects?
>
>> # ceph health detail | grep "unfound" | grep "^pg"
>> pg 4.ffe is active+recovering, acting [7,26], 3 unfound
>> ...
>> pg 4.43 is active+recovering, acting [9,23], 1 unfound
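As an aside, the per-PG commands for giving up on unfound objects can be generated mechanically from the ‘ceph health detail’ output quoted above. A minimal sketch, assuming the output format shown in the thread (the embedded sample text and the dry-run ‘echo’ are illustrative; ‘revert’ rolls the unfound objects back to their last complete version):

```shell
# Hypothetical sample of 'ceph health detail' lines, as quoted above.
health_sample='pg 4.ffe is active+recovering, acting [7,26], 3 unfound
pg 4.43 is active+recovering, acting [9,23], 1 unfound'

# Extract the PG ids that still report unfound objects (2nd field).
unfound_pgs=$(printf '%s\n' "$health_sample" | awk '/unfound/ {print $2}')

# For each such PG, tell ceph to give up on its unfound objects.
# 'echo' keeps this a dry run; drop it to actually issue the commands.
for pg in $unfound_pgs; do
    echo ceph pg "$pg" mark_unfound_lost revert
done
```

Note that, as Jason found, ‘mark_unfound_lost’ refuses to act while the primary thinks some possible sources have not been probed yet, so this only helps once the probing question below is resolved.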
What is the output of:

    ceph pg query 4.ffe

-- 
Kind regards,

Florian Wiessner

Smart Weblications GmbH
http://www.smart-weblications.de

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
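The part of the ‘ceph pg query’ output that matters for the "haven't probed all sources" message is the ‘might_have_unfound’ list inside ‘recovery_state’: entries whose status is "not queried" or "osd is down" are the sources the primary is still waiting on. A rough sketch of inspecting it, using a hypothetical, heavily abridged JSON fragment (assumption: the field names follow the Ceph unfound-objects documentation; real output is much larger and will differ):

```shell
# Illustrative fragment of 'ceph pg query' output (NOT real output).
query_sample='{ "recovery_state": [
  { "name": "Started\/Primary\/Active",
    "might_have_unfound": [
      { "osd": 7,  "status": "already probed" },
      { "osd": 12, "status": "osd is down" },
      { "osd": 23, "status": "not queried" } ] } ] }'

# Tally the probe status of each candidate source OSD.
printf '%s\n' "$query_sample" | grep -o '"status": "[^"]*"' | sort | uniq -c
```

If the blocking entries refer to the two dead nodes, the usual way out is to mark those OSDs ‘lost’ so the primary stops waiting for them, after which ‘mark_unfound_lost’ can proceed.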
