Hi everyone,

I am currently looking at Ceph to build a cluster for backing up VMs. I am
weighing the solution against alternatives like traditional SANs, and so far
Ceph is economically more interesting and technically more challenging
(which doesn't bother me :) ).

The OSD hosts would be based on Dell R730xd hardware; I plan to put 3 SSDs
(for journals) and 9 OSDs (4 TB each) per host.
I need approximately 100 TB and, in order to save some space while still
getting the level of resiliency you can expect for backups, I am leaning
towards EC (4+2) and 7 hosts.
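For reference, my back-of-the-envelope capacity math for that layout looks
like this (a sketch; the 75 % fill target is just an assumed safety margin
to leave headroom for rebalancing after a failure):

```shell
# Capacity estimate: 7 hosts x 9 OSDs x 4 TB under EC 4+2.
raw=$((7 * 9 * 4))             # 252 TB raw
usable=$((raw * 4 / 6))        # 168 TB usable: EC efficiency is k/(k+m) = 4/6
practical=$((usable * 3 / 4))  # ~126 TB at an assumed 75 % fill target
echo "raw=${raw}TB usable=${usable}TB practical=${practical}TB"
```

which still leaves comfortable margin above the ~100 TB I need.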

I would like some input on the questions that still remain:

 - I can put more OSDs directly inside the server (up to 4 additional
disks), but that would require powering down the host to replace an "inner"
OSD in case of failure. I was thinking I could add 3 internal disks to get
12 OSDs per node instead of 9 for higher density, at the cost of more
complex maintenance and higher risk for the cluster, as there would be 4
OSD journals per SSD instead of 3. How manageable is bringing down a
complete node to replace a disk? noout will surely come into play. How will
the cluster behave when the host comes back online and syncs its data?
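For context, the maintenance sequence I have in mind is roughly this (a
sketch using the standard Ceph CLI; my understanding is that if the host
returns before the PG logs roll over, recovery replays only the writes made
while it was down rather than doing a full backfill):

```shell
# Planned node maintenance without triggering backfill:
ceph osd set noout     # down OSDs stay "in", so no data migration starts

# ...power down the host, replace the internal disk, power back up...

ceph osd unset noout   # OSDs have rejoined; incremental recovery follows
```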

 - I also wonder about SSD failure, even though I intend to use Intel S3700s,
or at least S3610s, precisely to avoid such issues :) In case of an SSD
failure, the cluster would start backfilling / rebalancing the data of 3 to
4 OSDs. With proper monitoring and spare disks on hand, one could replace
the SSD within hours and avoid the impact of backfilling lots of data, but
this would require fine-tuning how OSDs are marked out. I know this bends
the natural features and behaviours of Ceph a bit, but has anyone tested
this approach, with custom monitoring scripts or otherwise? Do you think it
is worth considering, or is the only safe option to buy SSDs that can
sustain the load? Also, same question as above: how does Ceph handle OSDs
that come back up after being down for a while?
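On the mark-out timing specifically, I was imagining something along these
lines (a sketch; mon_osd_down_out_interval defaults to 300 s, and raising
it trades automatic re-protection for operator reaction time, so the 2-hour
value here is just an assumption):

```shell
# Give an operator ~2 hours to swap a failed journal SSD before the
# cluster marks its OSDs out and starts backfilling (runtime injection):
ceph tell mon.* injectargs '--mon-osd-down-out-interval 7200'

# The equivalent persistent setting in ceph.conf would be:
#   [mon]
#   mon osd down out interval = 7200
```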

 - My goal is to reach a bandwidth of several hundred MB/s of mostly
sequential writes. Do you think a cluster of this type and size will be
able to handle it? The only benchmarks I could find on the mailing list are
Loic's on EC plugins and Roy's on a full-SSD EC backend.
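If it helps frame an answer, I was planning to measure with plain rados
bench along these lines (the pool name "ecbackup" is just a placeholder for
a pool created on the EC 4+2 profile):

```shell
# 60-second sequential-write test, 4 MB objects, 16 concurrent ops:
rados bench -p ecbackup 60 write -b 4M -t 16 --no-cleanup

# Read the data back, then remove the benchmark objects:
rados bench -p ecbackup 60 seq -t 16
rados -p ecbackup cleanup
```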

Lots of thanks in advance,

Adrien
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
