On 04/20/2015 01:46 PM, Robert LeBlanc wrote:
> 
> 
> On Mon, Apr 20, 2015 at 2:34 PM, Colin Corr <[email protected] 
> <mailto:[email protected]>> wrote:
> 
> 
> 
>     On 04/20/2015 11:02 AM, Robert LeBlanc wrote:
>     > We have a similar issue, but we wanted three copies across two racks. 
> Turns out that we increased size to 4 and left min_size at 2. We didn't want 
> to risk having fewer than two copies, and if we only had three copies, losing a
> rack would block I/O. Once we expand to a third rack, we will adjust our rule 
> and go to size 3. Searching the mailing list and docs proved difficult, so 
> I'll include my rule so that you can use it as a basis. You should be able to 
> just change rack to host and host to osd. If you want to keep only three 
> copies, the "extra" OSD chosen just won't be used as Gregory mentions. 
> Technically this rule should have "max_size 4", but I won't set a pool over 4 
> copies so I didn't change it here.
>     >
>     > If anyone has a better way of writing this rule (or one that would work 
> for both a two-rack and a 3+ rack configuration as mentioned above), I'd be 
> open to it. This is the first rule I've really written on my own.
>     >
>     > rule replicated_ruleset {
>     >         ruleset 0
>     >         type replicated
>     >         min_size 1
>     >         max_size 10
>     >         step take default
>     >         step choose firstn 2 type rack
>     >         step chooseleaf firstn 2 type host
>     >         step emit
>     > }
> 
>     Thank you Robert. Your example was very helpful. I didn't realize you 
> could nest the choose and chooseleaf steps together. I thought chooseleaf 
> effectively handled that for you already. This makes a bit more sense now.
> 
> 
> I'm still a little fuzzy on it myself, but not having an emit step between 
> choose and chooseleaf makes chooseleaf operate on the items chosen by choose 
> instead of picking new items from all available entities. I
> couldn't get crushtool --test --simulate to work properly to confirm 
> (http://tracker.ceph.com/issues/11224), but it is working properly in our 
> cluster. Just FYI, min_size and max_size do not change your pools; they only 
> specify which pool sizes the rule applies to. Technically, if the pool size 
> (replica size) is less than 2 or greater than 3, this rule would not be 
> selected.

Thanks for the help. Reading your comments and re-reading the documentation has 
helped me understand how the rule language works. I had a few misconceptions.
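To check my understanding, here is a toy sketch of the selection (plain Python, 
not real CRUSH: no hashing, weights, or failure-domain retries, and the 
bucket/OSD names are made up):

```python
# Toy model of:
#   step choose firstn 2 type rack
#   step chooseleaf firstn 2 type host
# with no emit in between. Not real CRUSH (no hashing or weights);
# it only shows that chooseleaf draws hosts from the racks already
# chosen by choose, not from all hosts in the map.

# Hypothetical hierarchy: rack -> host -> OSDs
crush_map = {
    "rack1": {"host1": ["osd.0", "osd.1"], "host2": ["osd.2", "osd.3"]},
    "rack2": {"host3": ["osd.4", "osd.5"], "host4": ["osd.6", "osd.7"]},
    "rack3": {"host5": ["osd.8", "osd.9"]},
}

def place(crush_map, pool_size):
    racks = list(crush_map)[:2]            # choose firstn 2 type rack
    candidates = []
    for rack in racks:
        hosts = list(crush_map[rack])[:2]  # chooseleaf firstn 2 type host,
        for host in hosts:                 # scoped to this rack only
            # descend from the chosen host to a leaf OSD
            candidates.append(crush_map[rack][host][0])
    # The rule yields up to 2 x 2 = 4 candidates; the pool's size
    # (replica count) determines how many are actually used.
    return candidates[:pool_size]

print(place(crush_map, 4))  # size 4: ['osd.0', 'osd.2', 'osd.4', 'osd.6']
print(place(crush_map, 3))  # size 3: the fourth candidate goes unused
```

With size 3 the "extra" OSD chosen just goes unused, which matches what 
Gregory described.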

Any thoughts as to what conditions would cause us to end up with more than the 
specified number of replicas? Is it for recovery scenarios or like a safety 
rail for flapping OSDs?

It would seem that the default min_size and max_size values (1 and 10) are 
sufficient for this rule, as your own rule demonstrated.

rule host_rule {
        ruleset 2
        type replicated
        min_size 1
        max_size 10
        step take default
        step choose firstn 2 type host
        step chooseleaf firstn 2 type osd
        step emit
}

>     My rule looks like this now:
>     rule host_rule {
>             ruleset 2
>             type replicated
>             min_size 2
>             max_size 3
>             step take default
>             step choose firstn 2 type host
>             step chooseleaf firstn 2 type osd
>             step emit
>     }
> 
>     And the cluster is reporting the pool as clean, finally. If I understand 
> correctly, we will now potentially have as many as 4 replicas of an object in 
> the pool, 2 on each host.
> 
> 
> You will only have 4 replicas if you set the size of your pool to 4, 
> otherwise if it is the default, it will be three. The rule will support up to 
> 4 replicas.
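
So if I follow, the number of replicas actually placed is just the smaller of 
the pool's size and the 2 x 2 = 4 candidates this rule can emit. A toy check 
(plain Python arithmetic, made-up numbers, not a real cluster query):

```python
# Toy arithmetic, not a cluster query: the host_rule emits at most
# 2 hosts x 2 OSDs = 4 candidate OSDs per PG, so the replicas
# actually placed are min(pool_size, 4).
def replicas_placed(pool_size, rule_max_candidates=4):
    return min(pool_size, rule_max_candidates)

print(replicas_placed(3))  # 3: the default size uses 3 of the 4 candidates
print(replicas_placed(4))  # 4: all candidates used, 2 per host
print(replicas_placed(5))  # 4: a size-5 pool would stay undersized
```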




_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
