Hello everybody,

Can anybody comment on the largest number of
production VMs running on top of Ceph?

Thanks,
Constantinos

On 04/01/2014 09:47 PM, Jeremy Hanmer wrote:
Our (DreamHost's) largest cluster is roughly the same size as yours,
~3PB on just shy of 1100 OSDs currently.  The architecture's quite
similar too, except we have "separate" 10G front-end and back-end
networks with a partial spine-leaf architecture using 40G
interconnects.  I say "separate" because the networks only exist in
the logical space; they aren't separated among different bits of
network gear.  Another thing of note is that Ceph will, by default,
prevent entire racks of OSDs from being automatically marked out
(via the mon_osd_down_out_subtree_limit config option), with no crazy
tweaks to your CRUSH map needed.
That actually saved us recently when we suffered a couple of switch
crashes one weekend.
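
For reference, the relevant knob in ceph.conf would look something like
this -- just a minimal sketch; "rack" is the Ceph default value and is
shown only as an illustration, the limit can be any CRUSH bucket type:

  [mon]
  # don't automatically mark out OSDs when an entire subtree of this
  # size (or larger) goes down at once, e.g. a whole rack behind a
  # crashed switch
  mon osd down out subtree limit = rack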


On Tue, Apr 1, 2014 at 7:18 AM, Dan Van Der Ster
<daniel.vanders...@cern.ch> wrote:
Hi,

On 1 Apr 2014 at 15:59:07, Andrey Korolyov (and...@xdel.ru) wrote:

On 04/01/2014 03:51 PM, Robert Sander wrote:
On 01.04.2014 13:38, Karol Kozubal wrote:

I am curious to know: what is the largest known Ceph production
deployment?
I would assume it is the CERN installation.

Have a look at the slides from Frankfurt Ceph Day:

http://www.slideshare.net/Inktank_Ceph/scaling-ceph-at-cern

Regards

Just curious how the CERN guys built the network topology to prevent
possible cluster splits, because a split down the middle would cause huge
downtime: even a relatively short split would be enough for the remaining
MON majority to mark half of those 1k OSDs as down.


The mons are distributed around the data centre, across N switches.
The OSDs are across a few switches -- actually, we could use CRUSH rules to
replicate across switches but didn't do so because of an (unconfirmed) fear
that the uplinks would become a bottleneck.
So a switch or routing outage scenario is clearly a point of failure where
some PGs could become stale, but we've been lucky enough not to suffer from
that yet.
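
(A rule to replicate across switches would look something like the
following -- just a sketch, assuming each switch maps to a CRUSH bucket
of type "rack"; the rule name and ruleset number are made up for
illustration:)

  rule replicate_across_racks {
          ruleset 1
          type replicated
          min_size 1
          max_size 10
          step take default
          # place each replica under a different rack-level bucket,
          # i.e. behind a different switch
          step chooseleaf firstn 0 type rack
          step emit
  }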

BTW, this 3PB cluster was built to test the scalability of Ceph's
implementation, not because we have 3PB of data to store in Ceph today
(most of the results of those tests are discussed in that presentation).
And we are currently partitioning this cluster down into a smaller
production instance for Cinder and other instances for ongoing tests.

BTW#2, I don't think the CERN cluster is the largest. Isn't DreamHost's
bigger?

Cheers, Dan

-- Dan van der Ster || Data & Storage Services || CERN IT Department --
