Hi,

Recently I lost 5 out of 12 journal OSDs (2x SSD failure at the same time), with size=2, min_size=1. I know, it should rather be 3/2; I plan to switch to that ASAP.
Ceph started to throw many failures, so I removed those two SSDs and recreated the journals from scratch. In my case, all data on the main OSDs was still there; Ceph did the best it could to disable writes to the affected OSDs and keep the data consistent. After recreating all 5 journals on another HDD, recovery+backfill started to work.

After a couple of hours it discovered 7 "unfound" objects (6 in data OSDs and 1 hitset in the cache tier). I found out which files were affected and hoped not to lose important data. I then tried to revert these 6 unfound objects to their previous versions, but that was unsuccessful, so I just deleted them.

The most important problem we found was that single hitset object, which we couldn't simply delete; instead we took another hitset object and copied it over the missing one. The cache tier then recognized this hitset and invalidated it, which allowed backfill+recovery to finish, and the entire cluster finally went back to HEALTH_OK.

Finally I ran fsck wherever these 6 unfound objects could have had an effect, and fortunately the lost blocks were not important and contained empty data, so fsck recovery was successful in all cases. That was a very stressful time :)

--
Wojtek

On Tue, 13 Dec 2016 at 00:01, Christian Balzer <[email protected]> wrote:

> On Mon, 12 Dec 2016 22:41:41 +0100 Kevin Olbrich wrote:
>
> > Hi,
> >
> > just in case: What happens when all replica journal SSDs are broken at once?
> >
> That would be bad, as in BAD.
>
> In theory you just "lost" all the associated OSDs and their data.
>
> In practice everything but the in-flight data at the time is still on
> the actual OSDs (HDDs), but it's inconsistent and inaccessible as far as
> Ceph is concerned.
>
> So with some trickery and an experienced data-recovery Ceph consultant you
> _may_ get things running with limited data loss/corruption, but that's
> speculation and may be wishful thinking on my part.
>
> Another data point to deploy only well known/monitored/trusted SSDs and
> have a 3x replication.
>
> > The PGs most likely will be stuck inactive but as I read, the journals just
> > need to be replaced (http://ceph.com/planet/ceph-recover-osds-after-ssd-journal-failure/).
> >
> > Does this also work in this case?
> >
> Not really, no.
>
> The above works by having still a valid state and operational OSDs from
> which the "broken" one can recover.
>
> Christian
> --
> Christian Balzer        Network/Systems Engineer
> [email protected]       Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
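For reference, the unfound-object handling described above can be sketched with the standard Ceph commands below. This is a dry-run sketch: the PG id 2.5 and the rbd device are made-up examples, and the `run` helper only echoes each command so nothing here touches a cluster — on a real cluster you would run the `ceph` commands directly.

```shell
#!/bin/sh
# Dry-run sketch of dealing with unfound objects after journal loss.
# "run" only prints the command; drop the echo to execute for real.
run() { echo "+ $*"; }

# 1. Locate PGs reporting unfound objects.
run ceph health detail
run ceph pg 2.5 list_unfound              # 2.5 is a hypothetical PG id

# 2. First try to revert each unfound object to a previous version...
run ceph pg 2.5 mark_unfound_lost revert

# 3. ...and if revert is not possible (as in my case), delete them,
#    accepting data loss for those objects.
run ceph pg 2.5 mark_unfound_lost delete

# 4. Afterwards, fsck any filesystem the lost objects could have backed.
run fsck -f /dev/rbd0                     # hypothetical device
```

The revert-then-delete order matters: `revert` is the non-destructive option and only falls through to `delete` when no older copy of the object exists anywhere.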
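The journal-replacement procedure from the blog post linked in the thread boils down to roughly the following. Again a dry-run sketch: the OSD id 5 and the new journal partition /dev/sdf1 are assumptions, and per Christian's caveat this only recovers cleanly when the OSD's on-disk data is still consistent.

```shell
#!/bin/sh
# Dry-run sketch of recreating a lost journal for one OSD (id 5 assumed).
run() { echo "+ $*"; }

# Keep the cluster from rebalancing while the OSD is down.
run ceph osd set noout

# Stop the OSD whose journal SSD died.
run systemctl stop ceph-osd@5

# Point the OSD at a new journal partition and initialize it.
# (The journal path is normally a symlink into the OSD data dir.)
run ln -sf /dev/sdf1 /var/lib/ceph/osd/ceph-5/journal   # assumed new device
run ceph-osd -i 5 --mkjournal

# Bring the OSD back and let recovery/backfill run.
run systemctl start ceph-osd@5
run ceph osd unset noout
```

Repeat per affected OSD; with a failed (rather than flushed) journal, expect exactly the inconsistency and unfound objects described in this thread.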
