Completely agreeing with what Anthony wrote.  We see very good results with
at least 4 physical OSD nodes, managed and deployed by cephadm: you will
have 3 MONs and MGRs "hyperconverged" in the cephadm sense, and run 3x
replication for the OSDs with an extra OSD host for n+1 redundancy.
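
A minimal cephadm sketch of that layout, with hostnames, IPs and labels as
placeholders only (adjust to your environment):

  # bootstrap on the first node
  cephadm bootstrap --mon-ip 10.0.0.11

  # label the bootstrap node and add the other three OSD hosts
  ceph orch host label add ceph01 mon
  ceph orch host label add ceph01 mgr
  ceph orch host add ceph02 10.0.0.12 mon mgr osd
  ceph orch host add ceph03 10.0.0.13 mon mgr osd
  ceph orch host add ceph04 10.0.0.14 osd

  # 3 MONs and MGRs on the labeled hosts, OSDs on every eligible device
  ceph orch apply mon --placement="label:mon"
  ceph orch apply mgr --placement="label:mgr"
  ceph orch apply osd --all-available-devices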

Proxmox just needs a network and keyring to talk to this cluster.  You can
run deployment and automation functions from a VM in Proxmox that runs on
local storage.
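
On the Proxmox side that amounts to copying the keyring and defining the RBD
storage, roughly like this (storage ID, pool, user and monitor addresses are
examples only):

  # keyring for the client Proxmox will use, named after the storage ID:
  #   /etc/pve/priv/ceph/ceph-rbd.keyring

  pvesm add rbd ceph-rbd \
      --monhost "10.0.0.11 10.0.0.12 10.0.0.13" \
      --pool proxmox-vms \
      --username admin \
      --content images,rootdir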

--
Alex Gorbachev
https://alextelescope.blogspot.com



On Wed, Jul 9, 2025 at 10:28 AM Anthony D'Atri <a...@dreamsnake.net> wrote:

>
> >
> > I am new to this thread and would like to get some suggestions to build
> > a new external Ceph cluster
>
> Why external?  Many Proxmox deployments are converged.  Is this an
> existing Proxmox cluster that currently does not use shared storage?
>
>
> > which will be the backend for Proxmox VMs
> >
> > I am planning to start with 5 nodes (3 MON & 2 OSD)
>
> This is not the best plan.
>
> If your data is not disposable you will want to maintain the default 3
> copies, which you cannot safely do on 2 OSD nodes.
>
> When deploying a very small cluster solve first for the number of nodes.
> You need at least 3 OSD nodes, 4 has advantages.
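>
> As a concrete illustration (pool name is just an example): replication is
> set per pool, and the default CRUSH rule places each replica on a
> different host, which is why the host count matters:
>
>   ceph osd pool set rbd size 3      # replicas per object
>   ceph osd pool set rbd min_size 2  # still serve I/O with one host down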
>
> So in your case, go converged: OSDs on all 5 nodes, and add the
> mon/mgr/etc ceph orch labels to all 5 so that when a node is down a
> replacement may be spun up.
>
> This would also let you deploy 5 mon instances instead of 3, which is
> advantageous in that you can ride out 2 failures without disruption.
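>
> A minimal sketch of that with cephadm labels (hostnames are placeholders):
>
>   ceph orch host label add node1 mon   # repeat for node2..node5
>   ceph orch apply mon --placement="label:mon"
>
> With all 5 hosts labeled you get 5 mon daemons, and a replacement host
> only needs the label added for a new mon to be scheduled there.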
>
> > and I am expecting to start with ~60+ TB usable space.
>
> That would mean (3 * 60) / .85 = 211.765 ~ 212 TB of raw capacity; let’s
> see how that matches your numbers below.
>
> > estimated Storage Specs Calculator:
> >
> > RAM: 8GB/OSD Daemon, 16GB OS, 4GB for Mon & MGR, 16GB for MDS
>
> I would allot more than 4GB for mon/mgr.
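>
> (If you do budget 8GB per OSD, the relevant knob is osd_memory_target,
> which is a target rather than a hard cap, set in bytes, e.g.:
>
>   ceph config set osd osd_memory_target 8589934592
>
> cephadm deployments can also autotune this via osd_memory_target_autotune.)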
>
> > CPU: 2 cores/OSD, 2 cores for OS, 2 cores per service
>
> Cores or hyperthreads?  Either way these numbers are low.
>
> > *Dell R7625 5 Node to start with *
>
> Dramatic overkill for a mon/mgr/MDS node.
>
> > - RAM: 128G (Plan to increase later as needed)
>
> I suggest 32GB DIMMs to maximize potential for future expansion.
>
> > - CPU: 2x AMD EPYC 9224 2.50GHz, 24C/48T, 64M Cache (200W) DDR5-4800
>
> 96 threads total per server.
>
> > - Chassis Configuration 24x2.5 NVME
>
> You’ll be tempted to fill those slots; each OSD past, say, 12 will
> decrease performance due to having to share the vcores/threads.
> With the above CPU choice I would go with the R7615 to save rack space, or
> bump up the CPU. The 9224 is the default choice on Dell’s configurator but
> there are lots of others available. The 9454 for example would give you
> enough cores to more comfortably service an eventual 24 OSDs.
>
> Alternately consider the R7615 with, say, the 9654P. The P CPUs can’t be
> used in a dual-socket motherboard, so they’re usually a bit cheaper for the
> same specs.
>
> With EPYC CPUs you can get better performance by disabling IOMMU on the
> kernel command line via GRUB defaults.
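>
> On a Debian-family host that is roughly the following; the exact parameter
> (amd_iommu=off vs. iommu=off) can vary by platform and kernel, so treat
> this as a sketch:
>
>   # /etc/default/grub
>   GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=off"
>
>   update-grub   # then reboot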
>
>
> > - 2x 1.92TB Data Center NVMe Read Intensive AG Drive U2 Gen4 with
> > carrier (OS Disk, I need extra space)
>
> Okay so that will limit you to 22 OSDs with the 24-bay chassis.  You could
> provision BOSS-N1 for M.2 boot though.
>
> > - 5x 7.68TB Data Center NVMe Read Intensive AG Drive U2 Gen4 with Carrier
> > 24Gbps 512e 2.5in Hot-Plug 1DWPD , AG Drive
>
> I think you have a copy/paste error there.  The second line above sounds
> like a SAS SSD.
>
> So from what you wrote, that would mean a total of 10x 7.68TB OSD drives
> (5 per node across your 2 OSD nodes).  With 3x replication and the default
> headroom ratios these will give you about 22 TB of usable space, which is
> just 20 TiB.
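>
> Spelled out: 10 x 7.68 TB = 76.8 TB raw; divided by 3 replicas = 25.6 TB;
> times the ~0.85 headroom factor = ~21.8 TB, which is roughly 20 TiB.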
>
> > - 2x Nvidia ConnectX-6 Lx Dual Port 10/25GbE SFP28, No Crypto, PCIe Low
> > Profile
>
> I suggest bonding the two ports and not configuring a separate (optional)
> replication network.  Some people will use one port for public and the
> other for replication, but for multiple reasons that wouldn’t be ideal.
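>
> Roughly like this on the Proxmox/Debian side, assuming LACP on the switch
> (interface names are placeholders):
>
>   # /etc/network/interfaces
>   auto bond0
>   iface bond0 inet manual
>       bond-slaves enp65s0f0 enp65s0f1
>       bond-mode 802.3ad
>       bond-xmit-hash-policy layer3+4
>       bond-miimon 100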
>
> >
> > - 1G for IPMI
> >
> > Please help me finalize these specs.
> >
> > Thanks
>
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
