On Tuesday, April 28, 2015, Dominik Hannen <[email protected]> wrote:

> Hi ceph-users,
>
> I am currently planning a cluster and would like some input specifically
> about the storage-nodes.
>
> The non-OSD systems will be running on more powerful hardware.
>
> Interconnect as currently planned:
> 4 x 1Gbit LACP Bonds over a pair of MLAG-capable switches (planned: EX3300)
>
>
One problem with LACP is that it only gives you 1Gbps between any two IPs
or MACs (depending on your switch's hashing config). That will most likely
limit the throughput of any single client to 1Gbps, which is only 125MBps
of storage throughput. It is not really equivalent to a 4Gbps interface,
or to 2x 2Gbps interfaces if you plan to have separate client and cluster
networks.
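
To put rough numbers on that, here is a quick Python sketch (values
assumed from your planned 4x 1Gbit bond, nothing measured):

# Back-of-the-envelope for the LACP per-flow limit (assumed values
# from the planned 4 x 1Gbit bond; not measured).

LINK_GBPS = 1    # speed of one bond member
BOND_LINKS = 4   # members in the LACP bond

# LACP hashes each conversation (by MAC or IP, per switch config)
# onto exactly one member link, so a single client never exceeds
# one link's speed.
per_flow_mbps = LINK_GBPS * 1000 / 8        # ~125 MB/s per client

# Aggregate across many clients can still approach the full bond.
aggregate_gbps = LINK_GBPS * BOND_LINKS     # 4 Gbps, many flows only

print(f"single client: ~{per_flow_mbps:.0f} MB/s")
print(f"many clients:  up to {aggregate_gbps} Gbps combined")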

> So far I would go with Supermicro's 5018A-MHN4 offering; rack-space is not
> really a concern, so only 4 OSDs per U is fine.
> (The cluster is planned to start with 8 osd-nodes.)
>
> osd-node:
> Avoton C2758 - 8 x 2.40GHz
> 16 GB RAM ECC
> 16 GB SSD - OS - SATA-DOM
> 250GB SSD - Journal (MX200 250GB with extreme over-provisioning, staggered
> deployment, monitored for TBW-Value)
> 4 x 3 TB OSD - Seagate Surveillance HDD (ST3000VX000) 7200rpm 24/7
> 4 x 1 Gbit
>
> per-osd breakdown:
> 3 TB HDD
> 2 x 2.40GHz (Avoton-Cores)
> 4 GB RAM
> 8 GB SSD-Journal (~125 MB/s r/w)
> 1 Gbit
>
> The main question is, will the Avoton CPU suffice? (I reckon the common
> 1GHz/OSD suggestion is in regard to much more powerful CPUs.)
>

I don't have any experience with this CPU, but 8x 2.4GHz cores for 4 OSDs
seems like plenty of CPU.
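
Just as a sanity check against that 1GHz/OSD rule of thumb (nominal
clocks only; an Avoton core does less work per clock than a big Xeon
core, so treat this as illustrative):

# Sanity check against the ~1GHz-per-OSD rule of thumb (nominal
# clocks only; Avoton cores do less per clock than Xeon cores).

cores, core_ghz, osds = 8, 2.4, 4

ghz_per_osd = cores * core_ghz / osds   # 4.8 GHz nominal per OSD
print(f"~{ghz_per_osd:.1f} GHz of Avoton per OSD vs ~1 GHz suggested")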

I have 32GB of RAM for 7 OSDs, which has been enough for me.

> Are there any cost-effective suggestions to improve this configuration?


I have implemented a small cluster with no SSD journals, and the
performance is pretty good.

42 OSDs, 3x replication, 40Gb NICs: rados bench shows me 2000 IOPS at 4k
writes and 500MBps at 4M writes.
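
To get a feel for what that implies per disk, some illustrative
arithmetic (same figures as above, replication overhead only):

# What the bench figures imply per disk (illustrative arithmetic
# only; cluster figures are the ones quoted above).

client_mbps = 500   # MB/s of 4M-object writes seen by rados bench
replication = 3
osds = 42

# Every client byte gets written `replication` times cluster-wide.
backend_mbps = client_mbps * replication     # 1500 MB/s total
per_osd_mbps = backend_mbps / osds           # ~36 MB/s per disk

print(f"backend writes: {backend_mbps} MB/s total, "
      f"~{per_osd_mbps:.0f} MB/s per OSD")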

I would trade your SSD journals for 10Gb NICs and switches.  I started out
with the same 4x 1Gb LACP config, and things like rebalancing/recovery were
terribly slow, on top of the per-client throughput limit I mentioned above.

When you get more funding next quarter/year, you can choose to add the SSD
journals or more OSD nodes. Moving to 10Gb networking after you get the
cluster up and running will be much harder.
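
For a rough sense of why recovery hurts on 1Gb, some idealized
line-rate math (real recovery is throttled and fans out across many
OSDs, so take this as a sketch):

# Rough feel for recovery times on 1Gb vs 10Gb (idealized line-rate
# numbers; real recovery is throttled and fans out across OSDs).

FAILED_OSD_TB = 3   # data to re-replicate after one disk dies

# Usable Gbps toward a single recovery peer; under LACP each flow
# still rides a single 1Gb member link.
scenarios = {"4x1Gb LACP (per flow)": 1, "10Gb": 10}

for name, gbps in scenarios.items():
    mbps = gbps * 1000 / 8                    # MB/s
    hours = FAILED_OSD_TB * 1e6 / mbps / 3600
    print(f"{name}: ~{hours:.1f} h to move 3 TB at line rate")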


> Will erasure coding be a feasible possibility?
>
> Does it hurt to run OSD-nodes CPU-capped, if you have enough of them?
>
> ___
> Dominik Hannen
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
