Hi,

Recently I lost the journals of 5 out of 12 OSDs (two SSDs failed at the
same time). size=2, min_size=1. I know, it should rather be 3/2; I plan to
switch to that asap.

Ceph started to throw many failures, so I removed these two SSDs and
recreated the OSD journals from scratch. In my case all the data on the
main OSDs was still there, and Ceph did the best it could: it disabled
writes to the affected OSDs to keep the data consistent.
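
For reference, the re-creation itself was basically the standard "failed
journal" procedure; a rough sketch (the OSD id, journal partition and
systemd unit below are placeholders, not my exact setup):

  ceph osd set noout            # keep CRUSH from rebalancing while OSDs are down
  systemctl stop ceph-osd@5     # stop an OSD whose journal is gone
  ln -sf /dev/sdX1 /var/lib/ceph/osd/ceph-5/journal   # point it at the new journal partition
  ceph-osd -i 5 --mkjournal     # create a fresh, empty journal (in-flight writes are lost)
  systemctl start ceph-osd@5
  ceph osd unset noout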

After re-creating all 5 journals on another HDD, recovery+backfill started
to work. After a couple of hours it reported 7 "unfound" objects (6 on the
data OSDs and 1 hitset in the cache tier). I found out which files were
affected and hoped not to lose any important data. I then tried to revert
these 6 unfound objects to their previous versions, but that was
unsuccessful, so I just deleted them.

The most important problem we found was the single hitset file, which we
couldn't simply delete; instead we took another hitset file and copied it
over the missing one. The cache tier then recognized this hitset and
invalidated it, which let backfill+recovery finish, and the entire cluster
finally went back to HEALTH_OK. Lastly I ran fsck everywhere these 6
unfound objects could have had an effect, and fortunately the lost blocks
were not important and contained empty data, so fsck recovery was
successful in all cases. That was a very stressful time :)
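
In case it helps anyone else, the unfound-object handling was done with the
standard commands; a rough sketch (the PG id 2.5 below is just an example,
not one of my real PGs):

  ceph health detail                      # shows which PGs report unfound objects
  ceph pg 2.5 list_unfound                # list the unfound objects in a given PG
  ceph pg 2.5 mark_unfound_lost revert    # roll back to a previous version, if one exists
  ceph pg 2.5 mark_unfound_lost delete    # otherwise forget the objects entirely

"revert" only works when an older copy of the object still exists
somewhere; in my case it didn't, which is why I ended up deleting the 6
data objects.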

-- 
Wojtek

On Tue, 13 Dec 2016 at 00:01 Christian Balzer <[email protected]> wrote:

> On Mon, 12 Dec 2016 22:41:41 +0100 Kevin Olbrich wrote:
>
> > Hi,
> >
> > just in case: What happens when all replica journal SSDs are broken at
> > once?
> >
> That would be bad, as in BAD.
>
> In theory you just "lost" all the associated OSDs and their data.
>
> In practice everything but the in-flight data at the time is still on
> the actual OSDs (HDDs), but it's inconsistent and inaccessible as far as
> Ceph is concerned.
>
> So with some trickery and an experienced data-recovery Ceph consultant you
> _may_ get things running with limited data loss/corruption, but that's
> speculation and may be wishful thinking on my part.
>
> Another data point in favor of deploying only well known/monitored/trusted
> SSDs and having 3x replication.
>
> > The PGs most likely will be stuck inactive but as I read, the journals
> > just need to be replaced
> > (http://ceph.com/planet/ceph-recover-osds-after-ssd-journal-failure/).
> >
> > Does this also work in this case?
> >
> Not really, no.
>
> The above works by still having a valid state and operational OSDs from
> which the "broken" one can recover.
>
> Christian
> --
> Christian Balzer        Network/Systems Engineer
> [email protected]           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
