Hello all, 

We currently have a scenario I would like your opinion on.

Scenario: 
Currently we have a Ceph environment with 1 rack of hardware; this rack 
contains a couple of OSD nodes with 4T disks. In a few months' time we will 
deploy 2 more racks of OSD nodes; these nodes have 6T disks, and each new rack 
has one node more than the current one. 

Short overview: 
rack1: 4T OSD
rack2: 6T OSD
rack3: 6T OSD

We are currently playing with the idea of using the CRUSH map to make Ceph 
'rack aware' and to ensure data is replicated between racks. However, from the 
documentation I gathered that when you enforce replication between buckets, 
your maximum usable capacity is limited by the smallest bucket. My 
understanding: if we enforce that objects (size=3) are replicated across 3 
racks, then the moment the rack with the 4T OSDs is full we cannot store any 
more data. 
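To put numbers on it, using the simplified example map below (3 nodes per rack, 
one OSD each): rack1 weighs 3 x 4T = 12T while rack2 and rack3 weigh 3 x 6T = 
18T each, so with one replica forced into every rack the pool could hold 
roughly 12T before rack1 fills up, leaving around 6T per 6T rack unused. 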

Is this assumption correct?

The idea we are currently playing with: 

- Create 2 rack buckets
- Create a ruleset to place 2 object replicas in the 2x 6T rack buckets
- Create a ruleset to place 1 object replica across all the hosts.

This would result in 3 replicas of each object, and we would be sure that at 
least 2 of them are in different racks. In the unlikely event of a rack failure 
we would still have at least 1 or 2 replicas left.

Our idea is to have a CRUSH map and rule that look roughly like this: 
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9


      host r01-cn01 {
              id -1
              alg straw
              hash 0
              item osd.0 weight 4.00
      }

      host r01-cn02 {
              id -2
              alg straw
              hash 0
              item osd.1 weight 4.00
      }

      host r01-cn03 {
              id -3
              alg straw
              hash 0
              item osd.3 weight 4.00
      }

      host r02-cn04 {
              id -4
              alg straw
              hash 0
              item osd.4 weight 6.00
      }

      host r02-cn05 {
              id -5
              alg straw
              hash 0
              item osd.5 weight 6.00
      }

      host r02-cn06 {
              id -6
              alg straw
              hash 0
              item osd.6 weight 6.00
      }

      host r03-cn07 {
              id -7
              alg straw
              hash 0
              item osd.7 weight 6.00
      }

      host r03-cn08 {
              id -8
              alg straw
              hash 0
              item osd.8 weight 6.00
      }

      host r03-cn09 {
              id -9
              alg straw
              hash 0
              item osd.9 weight 6.00
      }

      rack r02 {
              id -10
              alg straw
              hash 0
              item r02-cn04 weight 6.00
              item r02-cn05 weight 6.00
              item r02-cn06 weight 6.00
      }      

      rack r03 {
              id -11
              alg straw
              hash 0
              item r03-cn07 weight 6.00
              item r03-cn08 weight 6.00
              item r03-cn09 weight 6.00
      }

      root 6t {
              id -12
              alg straw
              hash 0
              item r02 weight 18.00
              item r03 weight 18.00
      }

      rule one {
              ruleset 1
              type replicated
              min_size 1
              max_size 10
              step take 6t
              step chooseleaf firstn 2 type rack
              step chooseleaf firstn 1 type host
              step emit
      }
Is this the right approach, and would it cause any limitations in terms of 
performance or usability? Do you have suggestions? 
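
For comparison, if we read the CRUSH rule documentation correctly, the three 
bullets above could also be expressed as a single rule with two take/emit 
passes. This is only a rough sketch: it assumes we would additionally define a 
rack r01 bucket and a root (here called 'default') containing all three racks, 
neither of which exists in the map above. 

      rule two_plus_one {
              ruleset 2
              type replicated
              min_size 3
              max_size 3
              # 2 replicas, each in a different 6T rack
              step take 6t
              step chooseleaf firstn 2 type rack
              step emit
              # remaining replica on any host under the assumed 'default' root
              # (firstn -2 should mean pool size minus 2, i.e. 1 for size=3)
              step take default
              step chooseleaf firstn -2 type host
              step emit
      }

As far as we can tell this follows the same pattern as the ssd-primary example 
in the CRUSH documentation, just with 2 replicas in the first pass instead of 1. 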

Another interesting situation: we are going to move the hardware to new 
locations next year; the rack layout will change, and thus the CRUSH map will 
have to be altered. When changing the CRUSH map in a way that would, for 
example, turn the 2x 6T racks into 4 racks, would we need to take any special 
precautions?
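
As a made-up example of the kind of change we mean (bucket names and ids are 
placeholders, and the root weights would be adjusted accordingly), splitting 
r02 so that one host moves into a new rack r04 would look roughly like this in 
the decompiled map: 

      # r02-cn06 moved from r02 into the new rack r04
      rack r02 {
              id -10
              alg straw
              hash 0
              item r02-cn04 weight 6.00
              item r02-cn05 weight 6.00
      }

      rack r04 {
              id -13
              alg straw
              hash 0
              item r02-cn06 weight 6.00
      }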

Thank you for your answers, they are much appreciated! 

Rogier Dikkes
System Programmer Hadoop & HPC Cloud
SURFsara | Science Park 140 | 1098 XG Amsterdam
