Sam, Greg,

A simpler proposal is documented at:

    https://github.com/dachary/ceph/commit/ff11902bdc26aa35c70dd2f4d9de31f4cd207519#diff-5518964bc98a094a784ce2d17a5b0cc1R1

which is part of the proposed implementation of locally repairable codes:

    https://github.com/ceph/ceph/pull/1921

Hopefully it makes sense ;-)
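To summarize the idea in Python-ish pseudocode (the function names and the
list layout are mine, for illustration only, not the actual plugin code):
each layer of the description pairs a chunk mapping with a crush rule
fragment, and the ruleset is just "take root" followed by the non-empty
fragments, concatenated in layer order with no smart computation.

```python
# Illustrative sketch only; names and data layout are hypothetical,
# not the actual ceph LRC plugin API.

# Each layer: [chunk_mapping, crush_steps]. In a mapping, uppercase
# letters are data chunks, lowercase letters are coding chunks and
# '_' means the position does not participate in the layer.
layers = [
    ["_aAAA_aAA_", "set choose datacenter 2"],
    ["b_BBB_____", "set choose host 5"],
    ["_____cCCC_", ""],
    ["_____DDDDd", ""],
]

def ruleset_from_layers(layers):
    """Concatenate the crush step strings, skipping empty ones."""
    steps = ["take root"]
    steps.extend(crush for _, crush in layers if crush)
    return steps

def layer_positions(mapping):
    """Chunk positions participating in a layer: every non-'_' slot."""
    return [i for i, c in enumerate(mapping) if c != "_"]

print(ruleset_from_layers(layers))
# ['take root', 'set choose datacenter 2', 'set choose host 5']
print(layer_positions("_aAAA_aAA_"))
# [1, 2, 3, 4, 6, 7, 8]
```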

Cheers

On 09/06/2014 22:38, Samuel Just wrote:
> I'm finding that I don't really understand how the LRC specification
> works.  Is there a doc somewhere I can read?
> -Sam
> 
> On Mon, Jun 9, 2014 at 1:18 PM, Gregory Farnum <[email protected]> wrote:
>> On Fri, Jun 6, 2014 at 7:30 AM, Loic Dachary <[email protected]> wrote:
>>> Hi Andreas,
>>>
>>> On 06/06/2014 13:46, Andreas Joachim Peters wrote:
>>>> Hi Loic,
>>>> the basic implementation looks very clean.
>>>>
>>>> I have a few comments/ideas:
>>>>
>>>> - the reconstruction strategy using the three levels is certainly
>>>> efficient enough for standard cases, but it does not always guarantee
>>>> minimum decoding (in cases where one layer is not enough to reconstruct),
>>>> since your third algorithm just brute-forces reconstruction through all
>>>> layers until we have what we need ...
>>>
>>> The third strategy is indeed brute force. Do you think it is worth changing
>>> it to be minimal? It would be nice to quantify the percentage of cases it
>>> addresses. Do you know how to do that? It looks like a very small
>>> percentage, but there is no proof that it is small ;-)
>>>
>>>> - the whole LRC configuration actually does not describe the placement -
>>>> it still looks disconnected from the placement strategy / crush rules ...
>>>> wouldn't it make sense to have the crush rule implicit in the description,
>>>> or a function to derive it automatically from the LRC configuration?
>>>> Maybe you have already done this in another way and I didn't see it ...
>>>
>>> Good catch.
>>>
>>> What about this:
>>>
>>>       "  [ \"_aAAA_aAA_\", \"set choose datacenter 2\","
>>>       "    \"_aXXX_aXX_\" ],"
>>>       "  [ \"b_BBB_____\", \"set choose host 5\","
>>>       "    \"baXXX_____\" ],"
>>>       "  [ \"_____cCCC_\", \"\","
>>>       "    \"baXXXcaXX_\" ],"
>>>       "  [ \"_____DDDDd\", \"\","
>>>       "    \"baXXXcaXXd\" ],"
>>>
>>> Which translates into
>>>
>>> take root
>>> set choose datacenter 2
>>> set choose host 5
>>>
>>> In other words, the ruleset is created by concatenating the strings from 
>>> the description, without any kind of smart computation. It is up to the 
>>> person who creates the description to add the ruleset near a description 
>>> that makes sense. There is going to be minimal checking to make sure the 
>>> ruleset can actually be used to get the required number of chunks.
>>>
>>> It probably is very difficult and very confusing to automate the generation 
>>> of the ruleset. If it is implicit rather than explicit as above, the 
>>> operator will have to somehow understand and learn how it is computed to 
>>> make sure it does what is desired. With an explicit set of crush rules 
>>> loosely coupled to chunk mapping, the operator can read the crush 
>>> documentation instead of guessing.
>>
>> I think I'm missing some context for this discussion (maybe I haven't
>> been reading other threads closely enough); can you discuss this in
>> more detail?
>> Matching up CRUSH rulesets and the EC plugin formulas is very
>> important and demonstrated to be difficult, but I don't really
>> understand what you're suggesting here, which makes me think it's not
>> quite the right idea. ;)
>>
>>>
>>>> - should the plug-in have the ability to select reconstruction based on
>>>> proximity, or should this be up to the higher layer, which would provide
>>>> chunks in such a way that reconstruction selects the 'closest' layer? You
>>>> will better understand the relevance of the question in the next point ...
>>>>
>>>> - I remember we had this 3 data centre example with (8,4) where you can
>>>> reconstruct every object if 2 data centres are up. Another appealing
>>>> example, avoiding remote access when reading an object, is to have 2
>>>> data centres each holding a replica of e.g. (4,2) encoded objects. Can
>>>> your LRC configuration language describe storing the same chunk twice,
>>>> like __ABCCBA__?
>>>
>>> Unless I'm mistaken that would require the caller of the plugin to support 
>>> duplicate data chunks and provide a kind of proximity check. Since this is 
>>> not currently supported by the OSD logic, it is difficult to figure out how 
>>> an erasure code plugin could provide support for this use case.
>>
>> I haven't looked at the EC plugin interface at all, but I thought the
>> OSD told the plugin what chunks it could access, and the plugin tells
>> it which ones to fetch. So couldn't the plugin simply output duplicate
>> chunks, and not have the OSD retrieve both of them?
>> -Greg
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to [email protected]
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Loïc Dachary, Artisan Logiciel Libre
