Hello,

We are preparing to buy hardware for our new Ceph clusters.
Because of current economic conditions we are considering buying nodes consisting of JBODs + servers.


We have 4 potential variants for OSD nodes.

Variant A1
~100 HDD JBOD + x86_64 server with block.db NVMe


Variant A2
~100 HDD JBOD + 2x x86_64 servers with block.db NVMe
JBOD split in half, each node gets 50 HDDs


Variant B1
~60 HDD JBOD + x86_64 server with block.db NVMe


Variant B2
~60 HDD JBOD + 2x x86_64 servers with block.db NVMe
JBOD split in half, each node gets 30 HDDs


Splitting a JBOD logically between 2 servers isn't an issue for us because we replicate data at the rack level, not the host level.


Common specifications for all variants

5-6GB of RAM per 1 HDD
2% of HDD capacity in NVMe devices for block.db (or none)
2x 50Gb or 2x 100Gb Ethernet per server (active-backup bonded interfaces)
(CPU per OSD to be determined)
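
To make the common specifications concrete, here is a rough sketch of the per-node totals for each variant. The 20 TB drive size is purely an assumption for illustration (the HDD capacity is not fixed above), and `node_sizing` is a hypothetical helper name, not part of any Ceph tooling.

```python
# Rough per-node totals derived from the common specifications.
# ASSUMPTION: 20 TB HDDs, chosen only for illustration.
ASSUMED_HDD_TB = 20

# HDDs seen by each server in the four variants (A2/B2 split the JBOD in half).
HDDS_PER_NODE = {"A1": 100, "A2": 50, "B1": 60, "B2": 30}


def node_sizing(hdd_count, hdd_tb=ASSUMED_HDD_TB,
                ram_gb_per_hdd=(5, 6), db_percent=2):
    """Return the RAM range (GB) and block.db NVMe capacity (TB) for one node."""
    ram_low = hdd_count * ram_gb_per_hdd[0]
    ram_high = hdd_count * ram_gb_per_hdd[1]
    # block.db is sized as a percentage of the raw HDD capacity in the node.
    block_db_tb = hdd_count * hdd_tb * db_percent / 100
    return {"ram_gb": (ram_low, ram_high), "block_db_tb": block_db_tb}


for variant, hdds in HDDS_PER_NODE.items():
    sizing = node_sizing(hdds)
    low, high = sizing["ram_gb"]
    print(f"{variant}: {hdds} HDDs -> {low}-{high} GB RAM, "
          f"{sizing['block_db_tb']:.1f} TB NVMe for block.db")
```

With a different drive size, only `ASSUMED_HDD_TB` needs to change; the RAM figure depends only on the HDD count.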


Variant A1 is very unlikely to happen, but we are curious what network interface speeds you would suggest for so many HDDs in one node.

Variant A2 is most likely the one we will choose for large deployments.

Variants B1/B2 are intended for smaller deployments.

Do any of you run Ceph on similar setups? Did you find any pitfalls with them?

What are your minimum recommendations for network speed per HDD, CPU per HDD, etc.?

In our experience, most of our servers, even in large clusters, never max out their network interfaces or CPUs. We almost never rebuild or rebalance whole servers. 27 HDD nodes of our biggest CephFS cluster with EC usually have only 2-3 Gbps of network traffic.
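
As a back-of-the-envelope comparison for the network question, the sketch below relates link speed to HDD count. It rests on two assumptions that are not measurements: roughly 200 MB/s sustained throughput per HDD, and only one bond member carrying traffic at a time because the bond is active-backup. The function names are hypothetical.

```python
# Back-of-the-envelope network numbers per OSD node.
# ASSUMPTIONS (not measured): 200 MB/s sustained per HDD, and only one
# bond member carrying traffic because the bond is active-backup.
ASSUMED_HDD_MB_PER_S = 200

# HDDs seen by each server in the four variants.
HDDS_PER_NODE = {"A1": 100, "A2": 50, "B1": 60, "B2": 30}


def hdd_aggregate_gbps(hdd_count, mb_per_s_per_hdd=ASSUMED_HDD_MB_PER_S):
    """Combined sequential throughput of all HDDs in a node, in Gbit/s."""
    return hdd_count * mb_per_s_per_hdd * 8 / 1000


def nic_gbps_per_hdd(link_gbps, hdd_count):
    """Share of the single active link available to each HDD, in Gbit/s."""
    return link_gbps / hdd_count


for variant, hdds in HDDS_PER_NODE.items():
    for link in (50, 100):
        print(f"{variant}: {hdds} HDDs, {link} Gb link -> "
              f"{nic_gbps_per_hdd(link, hdds):.2f} Gbps per HDD, "
              f"disks could stream ~{hdd_aggregate_gbps(hdds):.0f} Gbps")
```

For reference, if "27 HDD nodes" is read as 27 HDDs per node, the 2-3 Gbps we usually observe works out to roughly 0.1 Gbps per HDD, well below the per-HDD share of a single 50 Gb link in any of the variants.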

Best regards
Adam Prycki
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
