Ok, thanks for your explanation!
I read those warnings about size 2 + min_size 1 (our OSDs sit on ZFS raidz2
pools, the ZFS equivalent of RAID6).
Time to raise replication!
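
For reference, a quick sketch of what I plan to run (the pool name "rbd" is
just an example here, and min_size 2 is my assumption for a sane pairing
with size 3):

    # list pools, then raise the replica count and the minimum number
    # of replicas required to serve I/O on each pool
    ceph osd pool ls
    ceph osd pool set rbd size 3
    ceph osd pool set rbd min_size 2

Raising size triggers backfill, so I will go pool by pool.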

Kevin

2016-12-13 0:00 GMT+01:00 Christian Balzer <[email protected]>:

> On Mon, 12 Dec 2016 22:41:41 +0100 Kevin Olbrich wrote:
>
> > Hi,
> >
> > just in case: What happens when all replica journal SSDs are broken at
> > once?
> >
> That would be bad, as in BAD.
>
> In theory you just "lost" all the associated OSDs and their data.
>
> In practice everything except the data that was in flight at the time is
> still on the actual OSDs (HDDs), but it is inconsistent and inaccessible as
> far as Ceph is concerned.
>
> So with some trickery and an experienced data-recovery Ceph consultant you
> _may_ get things running with limited data loss/corruption, but that's
> speculation and may be wishful thinking on my part.
>
> Another data point in favour of deploying only well known/monitored/trusted
> SSDs and using 3x replication.
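>
> For monitoring, something along these lines per journal SSD is a reasonable
> sketch (wear attribute names vary by vendor, so treat them as examples):
>
>     # overall SMART health plus any wear/media related attributes
>     smartctl -a /dev/sdX
>     smartctl -A /dev/sdX | grep -i -e wear -e media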
>
> > The PGs will most likely be stuck inactive, but as I read it, the
> > journals just need to be replaced
> > (http://ceph.com/planet/ceph-recover-osds-after-ssd-journal-failure/).
> >
> > Does this also work in this case?
> >
> Not really, no.
>
> The above works only because there is still a valid cluster state and
> operational OSDs from which the "broken" one can recover.
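>
> For reference, that procedure presumably boils down to something like this
> per affected OSD (the id 12 and the systemd unit name are placeholders,
> details depend on the release and FileStore layout, and it assumes the
> OSD's journal symlink already points at the new SSD partition):
>
>     # stop the OSD without letting the cluster rebalance, recreate the
>     # journal on the new device, then bring the OSD back
>     ceph osd set noout
>     systemctl stop ceph-osd@12
>     ceph-osd -i 12 --mkjournal
>     systemctl start ceph-osd@12
>     ceph osd unset noout
>
> The OSD then resyncs whatever it lost from the surviving replicas, which is
> exactly what is missing when every journal dies at the same time.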
>
> Christian
> --
> Christian Balzer        Network/Systems Engineer
> [email protected]           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
>