Hi James,

 

I can see where some of the confusion has arisen; hopefully I can put at least
some of it to rest. In the Tumblr post from Yahoo, the keyword to look out for
is “nodes”, which is distinct from individual hard drives (in Ceph, each drive
is usually an individual OSD). So you would have multiple OSDs per node.

 

My quick napkin math would suggest that they are using 54 storage nodes, each
holding 16 drives/OSDs (this doesn’t count the OS drives, which aren’t
specified in the post), as per the math below:

 

54 storage nodes providing 3.2PB of raw storage requires ~59.26TB of storage
per node

59.26TB / 12 = 4.94TB per OSD

59.26TB / 14 = 4.23TB per OSD

59.26TB / 16 = 3.70TB per OSD

 

Total OSDs per cluster = 864

EC Calculation: 8 / (8+3) = 72.73%

 

As they are using an 8/3 erasure coding configuration, that would provide an 
efficiency of 72.73% (see EC Calculation), so the usable capacity per storage 
cluster is around 2.33PB.
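
If you want to play with the numbers yourself, here is the same napkin math as
a quick Python sketch (the 54 nodes and 16 OSDs per node are my assumptions
above, not figures stated in the post):

    raw_pb = 3.2                       # raw capacity per cluster, from the Yahoo post
    nodes = 54                         # assumed storage nodes per cluster
    osds_per_node = 16                 # assumed drives/OSDs per node
    k, m = 8, 3                        # 8/3 erasure coding as described in the post

    raw_per_node_tb = raw_pb * 1000 / nodes            # ~59.26 TB per node
    raw_per_osd_tb = raw_per_node_tb / osds_per_node   # ~3.70 TB per OSD
    ec_efficiency = k / (k + m)                         # ~72.7%
    usable_pb = raw_pb * ec_efficiency                  # ~2.33 PB usable per cluster

    print(raw_per_node_tb, raw_per_osd_tb, ec_efficiency, usable_pb)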

 

I haven’t included the calculation for anything below 12 drives per node, as
while it is possible, I find the 16-drive configuration most probable. Ceph
CRUSH weight is shown in TiB, but most hard drives are marketed in TB because
it gives a higher number, so that would suggest 4TB drives are in use, each
providing roughly 3.64TiB of usable space. The math isn’t perfect here as you
can see, but I’d say it is a safe assumption that they have at least a few
higher-capacity drives in there, or a wider mix of standard commodity drive
sizes with 4TB simply being a decent average.
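
To make the TB vs TiB point concrete, a nominal 4TB drive is 4 × 10^12 bytes,
so in Python terms:

    4e12 / 2**40    # ≈ 3.64 TiB

which is roughly the raw CRUSH weight you would see reported for such a drive,
before any filesystem overhead.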

 

For object storage clusters, particularly with workloads of high volumes of
small objects, a standard OSD-per-node density is preferable, which hovers
between 10 and 16 OSDs per server depending on who you ask (some reading on the
subject courtesy of Red Hat:
https://www.redhat.com/cms/managed-files/st-ceph-storage-qct-object-storage-reference-architecture-f7901-201706-v2-en.pdf).
As Yahoo note that consistency and latency are important metrics for their
workload, they are also likely to use this density profile rather than
something higher. This has the added benefit of quicker recovery times in the
event of an individual OSD/host failure, which is behaviour they tuned quite
extensively.

 

For hashing algorithms and load balancing, I am not quite sure I understand
your question, but RGW, which implements object storage in Ceph, can be
configured with multiple zones/zonegroups/realms. It might be best to have a
read through the docs first:

http://docs.ceph.com/docs/luminous/radosgw/multisite/

 

Ceph is quite different from a SAN or DAS, and gives a great deal more
flexibility too. If you are unsure about getting started and need to hit the
ground running (i.e. with a multi-PB production system), I’d really recommend
engaging a reliable consultant or taking out professional support services for
it. Ceph is a piece of cake to manage when everything is working well, and very
often this will be the case for a long time, but you will really value good
planning and experience when you hit those rough patches.

 

Hope that helps,

 

Tom

 

 

From: ceph-users <[email protected]> On Behalf Of James Watson
Sent: 28 August 2018 21:05
To: [email protected]
Subject: [ceph-users] SAN or DAS for Production ceph

 

Dear cephers, 

 

I am new to the storage domain. 

Trying to get my head around the enterprise - production-ready setup. 

 

The following article helps a lot here: (Yahoo ceph implementation)

https://yahooeng.tumblr.com/tagged/object-storage

 

But a couple of questions:

 

What HDD would they have used here? NVMe / SATA /SAS etc (with just 52 storage 
node they got 3.2 PB of capacity !! )

I try to calculate a similar setup with HGST Ultrastar He12 (12TB and it's more 
recent ) and would need 86 HDDs that adds up to 1 PB only!!

 

How is the HDD drive attached is it DAS or a SAN (using Fibre Channel Switches, 
Host Bus Adapters etc)?

 

Do we need a proprietary hashing algorithm to implement multi-cluster based 
setup of ceph to contain CPU/Memory usage within the cluster when rebuilding 
happens during device failure?

 

If proprietary hashing algorithm is required to setup multi-cluster ceph using 
load balancer - then what could be the alternative setup we can deploy to 
address the same issue?

 

The aim is to design a similar architecture but with upgraded products and 
higher performance. - Any suggestions or thoughts are welcome 

 

 

 

Thanks in advance

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
