Hi,

A maybe even more controversial take: have you considered refurbished hardware? 
Why not go "wide" in your cluster design, with more but less powerful nodes? 
Ceph is software-defined, not hardware-defined. As long as you make sane 
hardware choices, Ceph will run on them just fine.

You'll likely be able to match the performance of a 6-node "new" cluster with 
more (but less powerful) nodes. It'll consume more electricity, but one node 
dying in a 6-node cluster has a higher impact than one node dying in a 10-node 
cluster.
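To put rough numbers on the wide-vs-tall trade-off, here's a back-of-the-envelope sketch (my own simplification, not from the thread: it assumes equal-capacity nodes, evenly spread data, and recovery work shared by the survivors):

```python
# Rough sketch: impact of losing one node in a cluster of n
# equal-capacity nodes. Assumes data is spread evenly and
# recovery load is shared by the surviving nodes.

def node_loss_impact(n_nodes):
    lost_fraction = 1 / n_nodes                    # share of data to re-replicate
    recovery_per_survivor = lost_fraction / (n_nodes - 1)
    return lost_fraction, recovery_per_survivor

for n in (6, 10):
    lost, per_node = node_loss_impact(n)
    print(f"{n} nodes: {lost:.1%} of data degraded, "
          f"~{per_node:.1%} extra load per surviving node")
```

The wider cluster both degrades less data per failure and spreads the backfill over more survivors.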

We stepped into Ceph on a mix of recently decommissioned hardware plus 
refurbished hardware to complete the cluster. I designed it with "refurbished" 
in mind; for example, the cluster switches are 4x redundant. They were €45 a 
piece, I needed 8, so why not? 🙂

One important aspect of buying refurbished, IMHO, is a reliable seller, so you 
know you're not on your own in case of hardware failure. So far we've been 
lucky with ours: no shenanigans, and if something's broken (which rarely 
happens, at least not more often than with newly bought hardware), we get a 
replacement sent to us, no questions asked.

Wannes
________________________________
From: GLE, Vivien <[email protected]>
Sent: Tuesday, November 18, 2025 13:47
To: [email protected] <[email protected]>
Subject: [ceph-users] New Cluster Ceph

Hi,

We plan to buy hardware for a new Ceph cluster and would like some feedback on 
our choices.

6x DELL 6715 nodes + 6x PowerVault MD2412 enclosures

For each node =>

1x AMD EPYC 9475F CPU, 3.65 GHz, 48C/96T, 256 MB cache (400 W)
RAM: 16 GB x 16 = 256 GB
4x 100 GbE NVIDIA Mellanox
HBA465e (external, 22.5 GB/s)

2x 6.4 TB mixed-use NVMe (DB/WAL)
6x 6.4 TB mixed-use NVMe
5x 15.36 TB read-intensive NVMe

For each enclosure =>

12x 20 TB HDD

The HDDs will be a replica 3 pool with DB/WAL on the NVMe.
The mixed and read NVMe will be two different pools (we will test replica 3 and 
EC to see which performance/storage-efficiency trade-off satisfies us the most).

This cluster will mostly be used to store block storage (Proxmox/Kubernetes 
VMs) and S3. The CRUSH map will be laid out as follows: 3 rooms with 2 nodes 
per room.

So the failure domain will be room for replica 3, and host if we use EC 4+2.
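For the replica 3 vs EC 4+2 comparison, here's a quick usable-capacity sketch using the NVMe drive counts from the parts list (textbook efficiency ratios only; Ceph metadata overhead and the usual fill-rate headroom are ignored):

```python
# Usable-capacity sketch for the two NVMe pools in the proposal.
# Raw sizes come from the parts list; the replica/EC ratios are
# the textbook values, ignoring Ceph's own overhead.

def usable_tb(raw_tb, scheme):
    ratios = {
        "replica3": 1 / 3,   # 3 copies -> ~33% efficient
        "ec4+2":    4 / 6,   # 4 data + 2 parity -> ~67% efficient
    }
    return raw_tb * ratios[scheme]

raw_mixed = 6 * 6 * 6.4      # 6 nodes x 6 mixed-use NVMe x 6.4 TB
raw_read  = 6 * 5 * 15.36    # 6 nodes x 5 read-intensive NVMe x 15.36 TB

for name, raw in (("mixed", raw_mixed), ("read", raw_read)):
    for scheme in ("replica3", "ec4+2"):
        print(f"{name:5} {scheme:8}: {usable_tb(raw, scheme):7.1f} TB usable"
              f" of {raw:.1f} TB raw")
```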

We saw that the thread requirements for NVMe OSDs are very expensive; is this 
CPU good enough to carry them?
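On the CPU question, a rough per-node budget may help. The cores-per-OSD figures below are common community rules of thumb, not official Ceph sizing numbers, so treat them as assumptions:

```python
# Rough per-node CPU budget for the proposed layout.
# Cores-per-OSD figures are community rules of thumb,
# NOT official Ceph sizing guidance.

CORES = 48  # physical cores on the EPYC 9475F

osds = {
    # name: (OSD count per node, assumed cores per OSD)
    "nvme_mixed": (6, 5),
    "nvme_read":  (5, 5),
    "hdd":        (12, 1),   # enclosure HDDs, DB/WAL on the 2 extra NVMe
}

needed = sum(count * cores for count, cores in osds.values())
print(f"~{needed} cores wanted vs {CORES} physical ({2 * CORES} threads)")
```

At these assumed figures the node is oversubscribed on physical cores but still fits within the SMT threads, so it looks tight but workable; fewer OSDs per node, or a lower per-OSD figure, changes the picture.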

MONs and MGRs will be spread across the cluster, and RGW will run on virtual 
machines.

Thanks for your answer !

Vivien

_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]