On 04/20/2015 04:18 PM, Robert LeBlanc wrote:
> You usually won't end up with more than the "size" number of replicas, even 
> in a failure situation. Although technically more than "size" number of OSDs 
> may have the data (if the OSD comes back in service, the journal may be used 
> to quickly get the OSD back up to speed), these would not be active. 
> 
> For us using size 4 and min size 2 is so that we can lose an entire rack (2 
> copies) but not block I/O. Our configuration prevents four copies in one 
> rack. If we lose a rack and then an OSD in the surviving rack, write I/O to 
> those placement groups will block until the objects have been 
> replicated elsewhere in the rack, but it would not be more than 2 copies.
> 
> I hope I'm making sense and that my jabbering is useful.

Yes, it is helpful, thank you. My clarity level has been upgraded from mud to 
stained glass.

If I am following the logic of your rule correctly (I've sketched the full 
rule as I read it after the list):

1. If we have less than 2 replicas per rack, run this step:
   step choose firstn 2 type rack
2. If we have less than 2 replicas on our hosts in this rack, run this step:
   step chooseleaf firstn 2 type host
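
Putting those together, the whole rule I picture looks roughly like the 
sketch below. The rule name, ruleset number and min/max values are just 
placeholders on my part, not taken from your mail:

    # my guess at the full rule -- name, ruleset number and sizes are placeholders
    rule replicated_two_racks {
            ruleset 1
            type replicated
            min_size 2
            max_size 4
            step take default                  # start at the root of the hierarchy
            step choose firstn 2 type rack     # pick 2 racks
            step chooseleaf firstn 2 type host # pick OSDs under 2 distinct hosts per rack
            step emit
    }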

I still don't understand where exactly max_size comes into play, unless you 
have some elaborate chain of rules, such as mixing platter and SSD drives in 
the same pool. The documented example for that scenario is the only one I have 
found that uses max_size in a meaningful way.
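
If I understand that mechanism at all, the idea would be something like two 
rules sharing a ruleset, with the pool's replica count selecting between them 
via min_size/max_size. A rough, untested sketch of what I mean (rule names 
and the ruleset number are invented by me):

    # two rules in one ruleset; the pool's replica count picks which one applies
    rule small_pools {
            ruleset 3
            type replicated
            min_size 1
            max_size 3                         # used for pools with size 1-3
            step take default
            step chooseleaf firstn 0 type host # spread all copies across hosts
            step emit
    }

    rule big_pools {
            ruleset 3
            type replicated
            min_size 4                         # used for pools with size 4-10
            max_size 10
            step take default
            step choose firstn 2 type rack     # two racks...
            step chooseleaf firstn 2 type host # ...two hosts in each rack
            step emit
    }

If that is the intended use, then for a single rule like yours min_size and 
max_size would only need to bracket the pool sizes you actually run.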

Anyway, thanks for your help in translating from CRUSH to English.
