Hello,

On Mon, 23 Apr 2018 17:43:03 +0200 Florian Florensa wrote:

> Hello everyone,
> 
> I am in the process of designing a Ceph cluster, that will contain
> only SSD OSDs, and I was wondering how should I size my cpu.
Several threads about this around here, but first things first.
Any specifics about the storage needs, i.e. do you need the SSDs primarily
for bandwidth or for IOPS?
Lots of smallish writes, or mostly large reads/writes?

> The cluster will only be used for block storage.
> The OSDs will be Samsung PM863 (2Tb or 4Tb, this will be determined

I assume PM863a, since the non-"a" model seems to be gone.
That's a 1.3 DWPD drive; with a collocated journal (filestore), or lots of
small writes hitting a collocated WAL/DB (Bluestore), effective endurance
will be roughly half of that, since those writes hit the flash twice.
So run the numbers and make sure these drives are actually a good fit
endurance-wise.
Of course, depending on your needs, journals or WAL/DB on higher-endurance
NVMes might be a much better fit anyway.
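
A quick back-of-the-envelope check along these lines (all numbers below are
made-up assumptions, plug in your own client write rate, replication size,
OSD count and so on):

# endurance_check.py - rough DWPD/TBW sanity check, assumptions only
drive_tb         = 3.84   # PM863a capacity in TB
dwpd_rating      = 1.3    # vendor endurance rating
warranty_years   = 5
num_osds         = 96     # e.g. 4 hosts x 24 SSDs
replication      = 3      # size=3 pool
write_amp        = 2.0    # rough factor for collocated journal/WAL/DB
client_write_mbs = 500    # sustained client writes to the whole cluster, MB/s

tbw_rating       = drive_tb * dwpd_rating * 365 * warranty_years
cluster_tb_day   = client_write_mbs / 1e6 * 86400          # MB/s -> TB/day
per_drive_tb_day = cluster_tb_day * replication * write_amp / num_osds

print("rated DWPD : %.2f" % dwpd_rating)
print("actual DWPD: %.2f" % (per_drive_tb_day / drive_tb))
print("rated TBW  : %.0f TB, written over %d years: %.0f TB"
      % (tbw_rating, warranty_years, per_drive_tb_day * 365 * warranty_years))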

> when we will set the total volumetry in stone), and it will be in 2U
> 24SSDs servers
How many servers are you thinking about?
Because the fact that you're willing to double the SSD size but not the 
number of servers suggests that you're thinking about a small number of
servers.
And while dense servers will save you space and money, more and smaller
servers are generally a better fit for Ceph, not least when considering
failure domains (typically a host): the fewer hosts you have, the larger the
chunk of data that has to be re-replicated when one of them goes down.
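
To put a rough number on that (pure arithmetic, the 200TB figure is just an
assumption):

# failure_domain.py - backfill load after losing one host (host = failure domain)
raw_data_tb = 200.0   # total data stored across all OSDs, incl. replicas

for hosts in (4, 6, 8, 12):
    lost_tb      = raw_data_tb / hosts     # roughly what the failed host held
    per_survivor = lost_tb / (hosts - 1)   # backfill landing on each survivor
    print("%2d hosts: ~%5.1f TB to backfill, ~%4.1f TB per surviving host"
          % (hosts, lost_tb, per_survivor))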

> Those server will probably be either Supermicro 2029U-E1CR4T or
> Supermicro 2028R-E1CR48L.
> I’ve read quite a lot of documentation regarding hardware choices, and
> I can’t find a ‘guideline’ for OSDs on SSD with colocated journal.
If this is a new cluster, that would be a collocated WAL/DB and Bluestore.
Never mind my misgivings about Bluestore; at this point in time you probably
don't want to deploy a new cluster with filestore, unless you have very
specific needs and know what you're doing.
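
For reference, with a Luminous-era ceph-volume the two layouts would look
roughly like this (a sketch only, device names are placeholders):

  # everything (data + WAL/DB) collocated on the SSD
  ceph-volume lvm create --bluestore --data /dev/sdX

  # data on the SSD, DB (and with it the WAL) on an NVMe partition
  ceph-volume lvm create --bluestore --data /dev/sdX --block.db /dev/nvme0n1pY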

> I was pointing for either dual ‘Xeon gold 6146’ or dual ‘Xeon 2699v4’
> for the cpus, depending on the chassis.
The first one is a much better fit in terms of the "a fast core for each
OSD" philosophy needed for low latency and high IOPS (2x 12 cores at 3.2GHz
base versus 2x 22 cores at 2.2GHz). The second is just overkill: 24 real
cores will do, and for extreme cases I'm sure I could still whip up a fio
setting that saturates even the 44 real cores of that setup.
Of course dual-CPU configurations like this come with a potential latency
penalty for NUMA misses.
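
Something along these lines (just an illustration, parameters made up), run
from a few clients against mapped RBD volumes, will happily eat OSD CPU with
small random writes:

  fio --name=smallrand --ioengine=libaio --direct=1 --rw=randwrite \
      --bs=4k --iodepth=32 --numjobs=8 --time_based --runtime=300 \
      --filename=/dev/rbd0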

Unfortunately Supermicro hasn't released my suggested Epyc-based Ceph
storage node (yet?).
What I was suggesting is a single-socket 1U (or a 2U double) with 10x 2.5"
bays, up to 2 of them NVMe-capable.
But even dual-CPU Epyc systems have a clear speed advantage over their Intel
counterparts when it comes to NUMA misses, thanks to the socket interconnect
(Infinity Fabric).

Do consider this alternative setup:
https://www.supermicro.com.tw/Aplus/system/1U/1123/AS-1123US-TR4.cfm
With either 8 SSDs plus 2 NVMes or 10 SSDs, and either
2x Epyc 7251 (adequate core ratio and speed, cheap) or
2x Epyc 7351 (massive overkill, but still about 1/4 of the Intel price tag).

The unreleased AS-2123US-TN24R25 with 2x Epyc 7351 might be a good fit as
well.

> For the network part, I was thinking of using two Dual port connectx4
> Lx from mellanox per servers.
> 
Going to what kind of network/switches?

> If anyone has some ideas/thoughts/pointers, I would be glad to hear them.
> 
RAM: you'll need a lot of it, even more so with Bluestore given its current
caching behavior.
I'd say 1GB of RAM per TB of storage as usual, plus an extra 1-2GB per OSD.
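
For a 24x 4TB node that rule of thumb works out roughly like this (pure
arithmetic, the base OS headroom is my assumption):

# ram_estimate.py - per-node RAM sizing, upper end of the 1-2GB/OSD range
osds_per_node = 24
tb_per_osd    = 4
base_gb       = 16    # OS, monitoring, headroom (assumption)

ram_gb = base_gb + osds_per_node * tb_per_osd + osds_per_node * 2
print("plan for at least %d GB of RAM per node" % ram_gb)   # -> 160

So in practice you'd be looking at something like 192GB of actual DIMMs per
node for the 4TB variant.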

> Regards,
> 
> Florian


-- 
Christian Balzer        Network/Systems Engineer                
ch...@gol.com           Rakuten Communications
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
