If you force CRUSH to put a copy in each rack, then you will be limited by
the smallest rack. You can also run into some severe limitations if you try
to keep your copies to two racks; see the thread titled "CRUSH rule for 3
replicas across 2 hosts" for some of my explanation of this.
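To make that concrete, here is a rough back-of-the-envelope sketch in plain Python. The capacities are illustrative, derived from the single-OSD hosts in the map quoted below (3x4T, 3x6T, 3x6T); it is a simplification, not exact CRUSH behavior:

```python
# Illustrative raw capacity per rack, in TB (3x4T, 3x6T, 3x6T hosts).
racks = {"r01": 12.0, "r02": 18.0, "r03": 18.0}

# With a rule that puts one replica in every rack (size == number of
# racks), each rack must hold a full copy of all data, so usable data
# capacity is capped by the smallest rack.
usable = min(racks.values())
print(usable)  # 12.0 -- rack r01 fills up first

# The raw totals alone would suggest considerably more at size=3:
print(sum(racks.values()) / 3)  # 16.0
```

The gap between those two numbers is the cost of forcing a replica into the small rack.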

If I were you, I would install almost all of the new hardware and hold out
a few pieces. Get the new hardware up and running, then take down some of
the original hardware and relocate it into the other cabinets, so that the
older, lower-capacity nodes and the newer, higher-capacity nodes are spread
evenly across the cabinets. That would give you the best of both redundancy
and performance (not all PGs would have to have a replica on the
potentially slower hardware). It would also let you keep a replication
level of three and still survive the loss of a rack.
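After that kind of shuffle, each rack bucket in the decompiled CRUSH map would mix both generations. A sketch of one rebalanced rack (host names, weights, and the bucket id are illustrative, patterned on the layout quoted below):

```
rack r01 {
        id -20
        alg straw
        hash 0
        item r01-cn01 weight 4.00    # original 4T node
        item r02-cn04 weight 6.00    # new 6T node relocated here
        item r03-cn07 weight 6.00    # new 6T node relocated here
}
```

With every rack holding a similar mix and similar total weight, a plain replicated rule with `step chooseleaf firstn 0 type rack` keeps one replica per rack without the smallest rack becoming the ceiling.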

Another option, if you have the racks, is to spread the new hardware over 3
racks instead of 2, so that your cluster spans 4 racks. CRUSH will give
preference to the newer hardware (assuming the CRUSH weights reflect the
disk sizes) and you would no longer be limited by the older, smaller rack.
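As a rough model of why the fourth rack removes the ceiling (plain Python, illustrative numbers; real CRUSH placement with firstn is only approximately weight-proportional):

```python
# Four-rack layout: the old 3x4T rack plus the six new 6T nodes spread
# over three racks (two per rack). Capacities in TB; CRUSH weights are
# assumed to track raw capacity.
racks = {"r01": 12.0, "r02": 12.0, "r03": 12.0, "r04": 12.0}
size = 3  # replicas, each in a distinct rack, chosen from 4 racks

total = sum(racks.values())
# Approximation: each rack holds a share of all replica bytes roughly
# proportional to its weight, and fills when size * data * share == cap.
share = {r: w / total for r, w in racks.items()}
data_limit = min(racks[r] / (size * share[r]) for r in racks)
print(data_limit)  # 16.0 -- vs 12.0 when r01 had to hold a full copy
```

In other words, once no single rack is forced to hold a copy of everything, the usable capacity climbs back toward raw/size.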

On Thu, Apr 23, 2015 at 3:20 AM, Rogier Dikkes <[email protected]>
wrote:

> Hello all,
>
> At this moment we have a scenario that I would like your opinion on.
>
> Scenario:
> Currently we have a Ceph environment with 1 rack of hardware; this rack
> contains a couple of OSD nodes with 4T disks. In a few months' time we will
> deploy 2 more racks with OSD nodes; these nodes have 6T disks, and each
> rack has 1 node more.
>
> Short overview:
> rack1: 4T OSD
> rack2: 6T OSD
> rack3: 6T OSD
>
> At this moment we are playing around with the idea of using the CRUSH map
> to make Ceph 'rack aware' and ensure data is replicated between racks.
> However, from the documentation I gathered that when you enforce data
> replication between buckets, your max storage size will be that of the
> smallest bucket. My understanding: if we force the objects (size=3) to be
> replicated to 3 racks, then the moment the rack with the 4T OSDs is full we
> cannot store data anymore.
>
> Is this assumption correct?
>
> The current idea we play with:
>
> - Create 2 rack buckets
> - Create a ruleset that places 2 object replicas across the 2x 6T buckets
> - Create a ruleset that places 1 object replica across all the hosts.
>
> This would result in 3 replicas of the object, where we are sure that at
> least 2 of them are in different racks. In the unlikely event of a rack
> failure we would have at least 1 or 2 replicas left.
>
> Our idea is to have a crush rule with config that looks like:
>
> device 0 osd.0
> device 1 osd.1
> device 2 osd.2
> device 3 osd.3
> device 4 osd.4
> device 5 osd.5
> device 6 osd.6
> device 7 osd.7
> device 8 osd.8
> device 9 osd.9
>
>
>       host r01-cn01 {
>               id -1
>               alg straw
>               hash 0
>               item osd.0 weight 4.00
>       }
>
>       host r01-cn02 {
>               id -2
>               alg straw
>               hash 0
>               item osd.1 weight 4.00
>       }
>
>       host r01-cn03 {
>               id -3
>               alg straw
>               hash 0
>               item osd.3 weight 4.00
>       }
>
>       host r02-cn04 {
>               id -4
>               alg straw
>               hash 0
>               item osd.4 weight 6.00
>       }
>
>       host r02-cn05 {
>               id -5
>               alg straw
>               hash 0
>               item osd.5 weight 6.00
>       }
>
>       host r02-cn06 {
>               id -6
>               alg straw
>               hash 0
>               item osd.6 weight 6.00
>       }
>
>       host r03-cn07 {
>               id -7
>               alg straw
>               hash 0
>               item osd.7 weight 6.00
>       }
>
>       host r03-cn08 {
>               id -8
>               alg straw
>               hash 0
>               item osd.8 weight 6.00
>       }
>
>       host r03-cn09 {
>               id -9
>               alg straw
>               hash 0
>               item osd.9 weight 6.00
>       }
>
>       rack r02 {
>               id -10
>               alg straw
>               hash 0
>               item r02-cn04 weight 6.00
>               item r02-cn05 weight 6.00
>               item r02-cn06 weight 6.00
>       }
>
>       rack r03 {
>               id -11
>               alg straw
>               hash 0
>               item r03-cn07 weight 6.00
>               item r03-cn08 weight 6.00
>               item r03-cn09 weight 6.00
>       }
>
>       root 6t {
>               id -12
>               alg straw
>               hash 0
>               item r02 weight 18.00
>               item r03 weight 18.00
>       }
>
>       rule one {
>               ruleset 1
>               type replicated
>               min_size 1
>               max_size 10
>               step take 6t
>               step chooseleaf firstn 2 type rack
>               step chooseleaf firstn 1 type host
>               step emit
>       }
>
> Is this the right approach, and would it cause limitations in regard to
> performance or usability? Do you have suggestions?
>
> Another interesting situation we have now: we are going to move the
> hardware to new locations next year; the rack layout will change, and thus
> the CRUSH map will be altered. When changing the CRUSH map in a way that
> theoretically turns the 2x 6T racks into 4 racks, would we need to take any
> special actions into consideration?
>
> Thank you for your answers, they are much appreciated!
>
> Rogier Dikkes
> System Programmer Hadoop & HPC Cloud
> SURFsara | Science Park 140 | 1098 XG Amsterdam
>
>
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>