Re: Pyramid erasure code description revisited

Loic Dachary Mon, 02 Jun 2014 11:50:24 -0700

Hi koleosfuscus,

A simpler proposal was made a few days ago. As you rightfully point out, the 
previous one was a bit complicated to understand ;-)


http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/19753

Cheers

On 02/06/2014 20:28, Koleos Fuskus wrote:
> Hi Loic,
> I am trying to understand your proposal on 
> http://pad.ceph.com/p/cdsgiant-pyramid-erasure-code
> Is the mapping specification a new feature on CRUSH to support Pyramid Codes? 
> I don't follow from line 72, when you are talking about "crush 
> multidatacenter mapping". 
> Adding a failure domain typically adds a new level in the pyramid using xor?
> 
> "** if one chunk is missing, will return that all chunks from the local 
> cluster are needed"
> If one chunk is missing, it recovers it using xor instead of jerasure?
> "** if two chunks are missing in the same local cluster, it will defer to the 
> global level"
> In this case it has the pyramid code doesn't help, does it?
> ** if two chunks are missing, each of them in a different local cluster, it 
> will return that it needs all chunks from both local cluster but will not 
> defer to the upper level
> 
> Best,
> koleos
> 
> 
> 
> On Monday, June 2, 2014 3:14 PM, Loic Dachary <[email protected]> wrote:
> Hi Andreas,
> 
> On 02/06/2014 14:20, Andreas Joachim Peters wrote:> Hi Loic, 
>>
>> I think this gives all the flexibility to define any possible combination 
>> for encoding ...
>>
>> When one constructs the steps one has just to be aware that the 'most local' 
>> encoding should happen in the end, right?
> 
> Yes. 
> 
>>
>> It would be usefule to have a tool which outputs then for each data aND 
>> parity chunk the achieved 'redundancy' and the overall volume and maximal 
>> reconstruction 'overhead'.
> 
> Right. I'm kind of hoping koleosfuscus (cc'ed) will be able to fit that into 
> the reliability model, but we've not discussed that yet. In any case you are 
> right, a small command line tool would be helpful. Something that would 
> explain: if you loose one of the chunks you need four to recover. If you lose 
> two you need all of them. That's more humanly readable and understandable 
> than the full description ;-)
> 
> Cheers
> 
>>
>> Cheers Andreas.
>>
>> ________________________________________
>> From: Loic Dachary [[email protected]]
>> Sent: 31 May 2014 19:10
>> To: Andreas Joachim Peters
>> Cc: Ceph Development
>> Subject: Pyramid erasure code description revisited
>>
>> Hi Andreas,
>>
>> After a few weeks and a fresh eye, I revisited the way pyramid erasure code 
>> could be described by the system administrator. Here is a proposal that is 
>> hopefully more intuitive than the one from the last CDS ( 
>> http://pad.ceph.com/p/cdsgiant-pyramid-erasure-code ).
>>
>> These are the steps to create all coding chunks. The upper case letters are 
>> data chunks and the lower case letters are coding chunks.
>>
>> "__ABC__DE_" data chunks placement
>>
>> Step 1
>> "__ABC__DE_"
>> "_yVWX_zYZ_" K=5, M=2
>> "_aABC_bDE_"
>>
>> Step 2
>> "_aABC_bDE_"
>> "z_XYZ_____" K=3, M=1
>> "caABC_bDE_"
>>
>> Step 3
>> "caABC_bDE_"
>> "_____zXYZ_" K=3, M=1
>> "caABCdbDE_"
>>
>> Step 4
>> "caABCdbDE_"
>> "_____WXYZz" K=4, M=1
>> "caABCdbDEe"
>>
>> The interpretation of Step 3 is as follows:
>>
>> Given the output of the previous step ( "caABC_bDE_" ), the bDE chunks are 
>> considered to be data chunks at this stage and they are marked with XYZ. A 
>> K=3, M=1 coding chunk is calculated and placed in the chunk marked with z ( 
>> "_____zXYZ_" ). The output of this coding step is the previous step plus the 
>> coding chunk that was just calculated, named d ( "caABCdbDE_" ).
>>
>> This gives the flexibility of deciding wether or not a coding chunk from a 
>> previous step is used as data to compute the coding chunk of the next step. 
>> It also allows for unbalanced steps such as step 4.
>>
>> For decoding, the steps are walked from the bottom up. If E is missing, it 
>> can be reconstructed from dbD.e in step 4 and the other steps are skipped 
>> because it was the only missing chunk. If AB are missing, all steps that 
>> have not be used to encode it are ignored, up to step 2 that will fail to 
>> recover them because M=1 and yeild to step 1 that will use a..CbDE 
>> successfully because M=2.
>>
>> Giving up the recursion and favor iteration seems to simplify how it can be 
>> explained. And I suspect the implementation is also simpler. What do you 
>> think ?
>>
>> Cheers
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to [email protected]
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> 

-- 
Loïc Dachary, Artisan Logiciel Libre

signature.asc
Description: OpenPGP digital signature

Re: Pyramid erasure code description revisited

Reply via email to