OK, thanks for your explanation! I have read the warnings about size 2 + min_size 1 (we are using ZFS in its RAID6-like layout, raidz2, for the OSDs). Time to raise replication!
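
For reference, a minimal sketch of the change we plan, assuming a pool named "rbd" (substitute your own pool names):

  # raise replication to 3 copies and require 2 acting replicas for I/O
  ceph osd pool set rbd size 3
  ceph osd pool set rbd min_size 2

Be aware that this triggers backfill, so expect recovery traffic while the third copies are created.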
Kevin

2016-12-13 0:00 GMT+01:00 Christian Balzer <[email protected]>:

> On Mon, 12 Dec 2016 22:41:41 +0100 Kevin Olbrich wrote:
>
> > Hi,
> >
> > just in case: What happens when all replica journal SSDs are broken at
> > once?
>
> That would be bad, as in BAD.
>
> In theory you just "lost" all the associated OSDs and their data.
>
> In practice everything but the in-flight data at the time is still on
> the actual OSDs (HDDs), but it's inconsistent and inaccessible as far as
> Ceph is concerned.
>
> So with some trickery and an experienced data-recovery Ceph consultant
> you _may_ get things running with limited data loss/corruption, but
> that's speculation and may be wishful thinking on my part.
>
> Another data point in favour of deploying only well-known/monitored/
> trusted SSDs and having 3x replication.
>
> > The PGs will most likely be stuck inactive, but as I read, the
> > journals just need to be replaced
> > (http://ceph.com/planet/ceph-recover-osds-after-ssd-journal-failure/).
> >
> > Does this also work in this case?
>
> Not really, no.
>
> The procedure above works because there is still a valid state and
> operational OSDs from which the "broken" one can recover.
>
> Christian
> --
> Christian Balzer        Network/Systems Engineer
> [email protected]        Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
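
For anyone finding this thread later: as Christian says, the procedure from the linked article only applies while the OSD state is still valid, i.e. the old journal can still be flushed. A rough sketch of that working case, assuming a FileStore OSD with id 0 and a hypothetical replacement journal partition /dev/sdX1:

  # stop the OSD and flush pending writes from the (still readable) journal
  systemctl stop ceph-osd@0
  ceph-osd -i 0 --flush-journal

  # point the OSD at the new journal device and initialize it
  ln -sf /dev/sdX1 /var/lib/ceph/osd/ceph-0/journal
  ceph-osd -i 0 --mkjournal

  # bring the OSD back online
  systemctl start ceph-osd@0

If the journal SSD is already dead, there is nothing left to flush and the in-flight writes are lost, which is exactly the scenario Christian describes.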
