Does Ceph accept this OSD if the other (newer) replica is down?
In this case I would assume that my cluster is instantly broken when rack
_after_ rack fails (power outage) and I then start them in random order.
We have at least one MON on a stand-alone UPS to resolve such an issue - I
just assumed this is safe regardless of a full outage.
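As an aside, the failure sequence Wido describes in the quoted thread can be sketched with a toy simulation. This is plain Python, not Ceph code; the replica/version model below is purely illustrative:

```python
# Toy model of one object stored on three replicas, each tracked by a
# data version. A write is only accepted when at least min_size
# replicas are up, and it lands only on the replicas that are up.

def run(min_size):
    versions = {1: 1, 2: 1, 3: 1}   # replica id -> data version
    up = {1: True, 2: True, 3: True}

    def write():
        alive = [r for r in up if up[r]]
        if len(alive) < min_size:
            return False            # I/O blocked: too few replicas up
        for r in alive:
            versions[r] += 1        # write lands only on live replicas
        return True

    up[1] = False                   # disk #1 fails
    write()                         # write still goes to #2 and #3
    up[2] = False                   # host holding #2 goes down (disk intact)
    accepted = write()              # with min_size=1 this still succeeds
    up[3] = False                   # now #3 really breaks down
    up[2] = True                    # #2 is brought back online
    return accepted, versions[2], versions[3]

# min_size=1: the last write was acknowledged but only #3 had it, and
# #3 is gone; #2 comes back with older data, so the write is lost.
print(run(1))   # (True, 2, 3)

# min_size=2: that write was blocked, so #2 and #3 stayed identical;
# bringing #2 back loses nothing.
print(run(2))   # (False, 2, 2)
```

With min_size = 2 the write is refused while only one replica is up, so the surviving copies never diverge - which is what `ceph osd pool set <pool> min_size 2` enforces at the pool level.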


Best regards,
Kevin Olbrich.

2016-12-07 21:10 GMT+01:00 Wido den Hollander <[email protected]>:

>
> > On 7 December 2016 at 21:04 "Will.Boege" <Will.Boege@target.
> >:
> >
> >
> > Hi Wido,
> >
> > Just curious how blocking IO to the final replica provides protection
> > from data loss?  I’ve never really understood why this is a Ceph best
> > practice.  In my head all 3 replicas would be on devices that have
> > roughly the same odds of physically failing or getting logically
> > corrupted in any given minute.  Not sure how blocking IO prevents this.
> >
>
> Say, disk #1 fails and you have #2 and #3 left. Now #2 fails, leaving
> only #3.
>
> By blocking I/O you know that #2 and #3 still have the same data. Although
> #2 failed, it could be that only the host went down while the disk itself
> is just fine. Maybe the SATA cable broke, you never know.
>
> If disk #3 now fails you can still continue operation by bringing #2
> back. It has the same data on disk as #3 had before it failed, since you
> didn't allow any I/O on #3 after #2 went down earlier.
>
> If you had accepted writes on #3 while #1 and #2 were gone, you would
> have invalid/old data on #2 by the time it comes back.
>
> Writes were made on #3, but that disk really broke down. You managed to
> get #2 back, but it doesn't have the changes that #3 had.
>
> The result is corrupted data.
>
> Does this make sense?
>
> Wido
>
> > On 12/7/16, 9:11 AM, "ceph-users on behalf of LOIC DEVULDER"
> > <[email protected] on behalf of [email protected]>
> > wrote:
> >
> >     > -----Original Message-----
> >     > From: Wido den Hollander [mailto:[email protected]]
> >     > Sent: Wednesday, 7 December 2016 16:01
> >     > To: [email protected]; LOIC DEVULDER - U329683
> >     > <[email protected]>
> >     > Subject: RE: [ceph-users] 2x replication: A BIG warning
> >     >
> >     >
> >     > > On 7 December 2016 at 15:54 LOIC DEVULDER
> >     > <[email protected]> wrote:
> >     > >
> >     > >
> >     > > Hi Wido,
> >     > >
> >     > > > As a Ceph consultant I get numerous calls throughout the year
> >     > > > to help people with getting their broken Ceph clusters back
> >     > > > online.
> >     > > >
> >     > > > The causes of downtime vary vastly, but one of the biggest
> >     > > > causes is that people use replication 2x: size = 2, min_size = 1.
> >     > >
> >     > > We are building a Ceph cluster for our OpenStack and for data
> >     > > integrity reasons we have chosen to set size=3. But we want to
> >     > > continue to access data if 2 of our 3 OSD servers are dead, so
> >     > > we decided to set min_size=1.
> >     > >
> >     > > Is it a (very) bad idea?
> >     > >
> >     >
> >     > I would say so. Yes, downtime is annoying on your cloud, but data
> >     > loss is even worse, much worse.
> >     >
> >     > I would always run with min_size = 2 and manually switch to
> >     > min_size = 1 if the situation really requires it at that moment.
> >     >
> >     > Losing two disks at the same time is something which doesn't happen
> >     > that often, but if it happens you don't want to modify any data on
> >     > the only copy which you still have left.
> >     >
> >     > Setting min_size to 1 should be a manual action imho when size = 3
> >     > and you lose two copies. In that case YOU decide at that moment if
> >     > it is the right course of action.
> >     >
> >     > Wido
> >
> >     Thanks for your quick response!
> >
> >     That makes sense, I will try to convince my colleagues :-)
> >
> >     Loic
> >     _______________________________________________
> >     ceph-users mailing list
> >     [email protected]
> >     http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> >
>
