On 09/09/2014 14:21, Lei Dong wrote:
> Thanks loic!
> 
> Actually I've found that increasing choose_local_fallback_tries can 
> help (chooseleaf_tries helps less significantly), but I'm afraid that when an 
> osd failure happens and a new acting set is needed, it may again fail to find 
> enough racks. So I'm trying to find a more guaranteed way to handle osd 
> failure.
> 
> My profile is nothing special other than k=8 m=3. 

So your goal is to make it so losing 3 OSDs simultaneously does not mean 
losing data. By forcing each rack to hold at most 2 OSDs for a given object, 
you make it so losing a full rack does not mean losing data. Are these racks 
in the same room in the datacenter? In the event of a catastrophic failure 
that permanently destroys one rack, how realistic is it that the other racks 
are unharmed? If the rack is destroyed by fire and sits in a row with the six 
other racks, there is a very high chance that the other racks will also be 
damaged. Note that I am not a system architect nor a system administrator: I 
may be completely wrong ;-) If it turns out that the probability of a single 
rack failing entirely and independently of the others is negligible, it may 
not be necessary to write a complex ruleset; the default ruleset may be enough.
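
If you do keep the rack constraint, here is a rough sketch of the complete 
ruleset, combining the steps from your first rule with the set_chooseleaf_tries 
tunable from my previous message. This is untested; the rule name, ruleset 
number, and size bounds are placeholders for whatever fits your pool:

```
rule ec_k8m3_6racks {
        ruleset 1                          # placeholder id
        type erasure
        min_size 11
        max_size 11
        step set_chooseleaf_tries 10       # retry harder to avoid bad mappings
        step take default                  # root of the hierarchy
        step choose firstn 6 type rack     # one entry per rack, 6 racks total
        step chooseleaf indep 2 type osd   # 2 osds per rack
        step emit
}
```

Remember that changing set_chooseleaf_tries will move existing mappings, not 
just repair the failed ones.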

My 2cts
 
> 
> Thanks again!
> 
> Leidong
> 
> 
> 
> 
> 
>> On September 9, 2014, at 7:53 PM, "Loic Dachary" <[email protected]> wrote:
>>
>> Hi,
>>
>> It is indeed possible that mapping fails if there are just enough racks to 
>> match the constraint. And the probability of a bad mapping increases with 
>> the number of PGs, because more mappings are needed. You can tell crush to 
>> try harder with 
>>
>> step set_chooseleaf_tries 10
>>
>> Be careful though: increasing this number will change mappings. It will not 
>> just fix the bad mappings you're seeing, it will also change the mappings 
>> that succeeded with a lower value. Once you've set this parameter, it cannot 
>> be modified.
>>
>> Would you mind sharing the erasure code profile you plan to work with ?
>>
>> Cheers
>>
>>> On 09/09/2014 12:39, Lei Dong wrote:
>>> Hi ceph users:
>>>
>>> I want to create a customized crush rule for my EC pool (with replica_size 
>>> = 11) to distribute replicas into 6 different Racks. 
>>>
>>> I use the following rule at first:
>>>
>>> step take default                  # root
>>> step choose firstn 6 type rack     # 6 racks; I have exactly 6 racks
>>> step chooseleaf indep 2 type osd   # 2 osds per rack
>>> step emit
>>>
>>> It looks fine and works fine when the PG num is small. 
>>> But when the pg num increases, there are always some PGs which cannot take 
>>> all 6 racks. 
>>> It looks like “step choose firstn 6 type rack” sometimes returns only 5 
>>> racks.
>>> After some investigation, I think it may be caused by collisions between 
>>> choices.
>>>
>>> Then I come up with another solution to solve collision like this:
>>>
>>> step take rack0
>>> step chooseleaf indep 2 type osd
>>> step emit
>>> step take rack1
>>> ...
>>> (manually take every rack)
>>>
>>> This won’t cause rack collisions, because I specify each rack by name 
>>> first. But the problem is that an osd in rack0 will always be the primary 
>>> osd, because I choose from rack0 first.
>>>
>>> So the question is what is the recommended way to meet such a need 
>>> (distribute 11 replicas into 6 racks evenly in case of rack failure)?
>>>
>>>
>>> Thanks!
>>> LeiDong
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> [email protected]
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> -- 
>> Loïc Dachary, Artisan Logiciel Libre
>>

-- 
Loïc Dachary, Artisan Logiciel Libre

