Hello Pedro...

These are extremely generic questions, and therefore hard to answer. Nick did a 
good job of outlining the risks.

In our case, we have been running a Ceph/CephFS system in production for over a 
year, and before that we spent another year getting to understand Ceph.

Ceph is incredibly good at dealing with hardware failures, so it is a powerful 
tool if you are using commodity hardware. If your disks fail, or even if a 
fraction of your hosts fail, it is able to cope and recover properly (up to a 
point) as long as you have proper CRUSH rules in place (the default ones do a 
good job of that) and free space available. To be on the safe side:
- decouple mons from OSD servers
- check the RAM requirement for your OSD servers (it depends on the number of 
OSDs in each server)
- have at least 3 mons in a production system
- use 3x replication
There is a good page on hardware recommendations in the Ceph documentation.
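
To make the free-space and replication points above concrete, here is a minimal 
sketch of the kind of check we run, using the python-rados bindings (it assumes 
a readable /etc/ceph/ceph.conf and a client keyring with mon access; adapt the 
details to your own setup):

    import json
    import rados

    # Connect using the local ceph.conf and default client credentials.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # Overall capacity: keep enough free space so recovery has room to work.
    stats = cluster.get_cluster_stats()
    used_pct = 100.0 * stats['kb_used'] / stats['kb']
    print("cluster is %.1f%% full" % used_pct)

    # Replication level of each pool (we aim for size = 3 in production).
    ret, out, errs = cluster.mon_command(
        json.dumps({"prefix": "osd dump", "format": "json"}), b'')
    for pool in json.loads(out)['pools']:
        print("%s: size = %d" % (pool['pool_name'], pool['size']))

    cluster.shutdown()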

However, the devil is in the details. Ceph is a complex system still under 
constant development. Wrong configurations can lead to performance problems. If 
your network is not reliable, that can lead to flapping OSDs, which in turn can 
lead to problems with your PGs. When your OSDs start to become full (a single 
full OSD freezes all I/O to the cluster), many problems may start to appear. 
Finally, there are bugs. Their number is not huge, and there is a really good 
effort from the developers and from the community to address them in a fast and 
reliable way. However, it is sometimes difficult to diagnose what could be 
wrong because of the many layers involved. It is not infrequent that we have to 
go and look at the source code to figure out (when possible) what may be 
happening. So I would say there is a learning curve that I and others are still 
going through.
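
On the full-OSD point specifically, it pays to watch per-OSD utilization rather 
than just the cluster total, since data is never perfectly balanced. A small 
sketch in the same spirit as the snippet above (again python-rados; the 85% 
threshold is only an example, matching the default nearfull ratio):

    import json
    import rados

    NEARFULL_PCT = 85.0  # example threshold; tune to your own nearfull/full ratios

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # 'osd df' reports per-OSD utilization as a percentage.
    ret, out, errs = cluster.mon_command(
        json.dumps({"prefix": "osd df", "format": "json"}), b'')
    for osd in json.loads(out)['nodes']:
        if osd['utilization'] > NEARFULL_PCT:
            print("WARNING: %s is %.1f%% full" % (osd['name'], osd['utilization']))

    cluster.shutdown()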

Cheers
Gonçalo





________________________________________
From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Pedro Benites 
[pbeni...@litholaser.com]
Sent: 17 November 2016 04:50
To: ceph-users@lists.ceph.com
Subject: [ceph-users] how possible is that ceph cluster crash

Hi,

I have a Ceph cluster with 50 TB and 15 OSDs. It has been working fine for
one year, and I would like to grow it and migrate all my old storage,
about 100 TB, to Ceph, but I have a doubt. How likely is it that the
cluster fails and everything goes very badly? How reliable is Ceph? What is
the risk of losing my data? Is it necessary to back up my data?

Regards.
Pedro.
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com