Hi Loic, I was reading through the code and I got the impression that they have wrapped something simple in a lot of mathematical formulas in this paper. Indeed, all c(i) = 1 ... which makes it simple ...
The code seems to work like this, e.g. for 10 data + 4 RS + 2 local parity chunks:

Data = (1,2,3,4,5,6,7,8,9,10)
(RS1,RS2,RS3,RS4) = RS(1,2,3,4,5,6,7,8,9,10)
P1 = 1 ^ 2 ^ 3 ^ 4 ^ 5
P2 = 6 ^ 7 ^ 8 ^ 9 ^ 10
P3 = P1 ^ P2 = RS1 ^ RS2 ^ RS3 ^ RS4 (no need to store that, since we store P1 and P2)

You can reconstruct a missing RS chunk with local parity like:

RS1 = P1 ^ P2 ^ RS2 ^ RS3 ^ RS4

At least this is what I got out of the code ... look at getSRCGroupNeighbors. What I don't know is whether P1 ^ P2 ^ P3 = 0 works with every encoding matrix or only with the particular one used here.

Cheers Andreas.

> On Mon, Jun 30, 2014 at 12:34 PM, Loic Dachary <[email protected]> wrote:
>
> Hi Andreas,
>
> TL;DR: which part of the code chooses the coefficients to maximize the fault tolerance of the code as suggested in the Xorbas paper ?
>
> If I understand correctly, the locality is computed (i.e. how many local parity chunks are created) with:
>
> https://github.com/madiator/HadoopUSC/blob/developUSC/src/contrib/raid/src/java/org/apache/hadoop/raid/SimpleRegeneratingCode.java#L84
>
> This is different from specifying it explicitly (https://github.com/dachary/ceph/commit/91b9cc2be5851f7668d593bb69b2f4f089ea585a#diff-5518964bc98a094a784ce2d17a5b0cc1R11)
>
> Then a RS encoding matrix is calculated
>
> https://github.com/madiator/HadoopUSC/blob/developUSC/src/contrib/raid/src/java/org/apache/hadoop/raid/SimpleRegeneratingCode.java#L100
>
> and will be used when encoding an object
>
> https://github.com/madiator/HadoopUSC/blob/developUSC/src/contrib/raid/src/java/org/apache/hadoop/raid/SimpleRegeneratingCode.java#L143
>
> then each local parity chunk is calculated using a simpler function (XOR)
>
> https://github.com/madiator/HadoopUSC/blob/developUSC/src/contrib/raid/src/java/org/apache/hadoop/raid/SimpleRegeneratingCode.java#L154
>
> When decoding, the equivalent of the Ceph minimum_to_decode method is used to figure out which chunks are needed to recover the lost chunks
>
> https://github.com/madiator/HadoopUSC/blob/developUSC/src/contrib/raid/src/java/org/apache/hadoop/raid/SimpleRegeneratingCode.java#L319
>
> During recovery, if only one chunk is missing, it is recovered with the simpler function (XOR)
>
> https://github.com/madiator/HadoopUSC/blob/developUSC/src/contrib/raid/src/java/org/apache/hadoop/raid/SimpleRegeneratingCode.java#L207
> https://github.com/madiator/HadoopUSC/blob/developUSC/src/contrib/raid/src/java/org/apache/hadoop/raid/SimpleRegeneratingCode.java#L224
> (both of which are combined recursively)
>
> and if there is a "conflict" (i.e. if local parity is not enough to recover), then the RS decoding function is used
>
> https://github.com/madiator/HadoopUSC/blob/developUSC/src/contrib/raid/src/java/org/apache/hadoop/raid/SimpleRegeneratingCode.java#L266
>
> I don't understand at all how local parity is related to RS parity, in the sense of "choosing the coefficients c(i) to maximize the fault tolerance of the code" as mentioned in the Xorbas paper : I must be missing something :-) I do understand how the code works though, which is troubling. It also is reassuring because the logic is very similar to what is proposed in https://github.com/ceph/ceph/pull/1921
>
> Cheers
>
> On 30/06/2014 11:26, Andreas Joachim Peters wrote:
> > Hi Loic,
> >
> > i think the best is to read along the sources. It is very readable!
> >
> > https://github.com/madiator/HadoopUSC/blob/developUSC/src/contrib/raid/src/java/org/apache/hadoop/raid/SimpleRegeneratingCode.java
> >
> > If there is a high interest in this, you could port the code from Java to C++ and use the GF-complete or Intel GF-multiplication implementation for the used GF multiplications.
> >
> > Cheers Andreas.
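[The decoding logic Loic walks through above (local XOR repair when a group has a single erasure, RS decoding otherwise) can be sketched as follows. This is a toy illustration of the decision, not the Java implementation; the chunk indices, group layout, and function name are made up for the example.]

```python
def recovery_plan(lost, groups):
    """Decide, per lost chunk, between local XOR repair and global RS repair.

    Roughly the decision that minimum_to_decode / getSRCGroupNeighbors
    make in the Java code, reduced to its simplest form.
    """
    lost = set(lost)
    plan = {}
    for chunk in lost:
        # find the local group (data chunks + their local parity) of this chunk
        group = next(g for g in groups if chunk in g)
        if len(lost & set(group)) == 1:
            # only erasure in its group: XOR of the surviving group
            # members (including the local parity) recovers it
            plan[chunk] = ("xor", [c for c in group if c != chunk])
        else:
            # several erasures in one group: fall back to RS decoding
            plan[chunk] = ("rs", None)
    return plan

# two local groups: data 0-4 with local parity 10, data 5-9 with local parity 11
groups = [[0, 1, 2, 3, 4, 10], [5, 6, 7, 8, 9, 11]]
print(recovery_plan([3], groups))     # {3: ('xor', [0, 1, 2, 4, 10])}
print(recovery_plan([3, 4], groups))  # two erasures in one group -> RS for both
```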
> >
> > ________________________________________
> > From: [email protected] [[email protected]] on behalf of Loic Dachary [[email protected]]
> > Sent: 30 June 2014 11:06
> > To: Koleos Fuscus
> > Cc: Ceph Development
> > Subject: Re: erasure code and coefficients
> >
> > Hi koleosfuscus,
> >
> > It clarifies it enough to raise a question : where can I read code (or an algorithm if not code) that chooses the coefficients needed to implement what is suggested in the Xorbas paper ?
> >
> > Cheers
> >
> > On 30/06/2014 10:18, Koleos Fuscus wrote:
> >> Hi Loic,
> >>
> >> I am happy to contribute with some clarifications. In fact, erasure/reliability concepts are not blocking my progress with the reliability model at ceph. It is the ceph model itself that has some parts not clear to me, and nobody had time yet to review the state model diagram that I published on the wiki. :(
> >> Anyway, regarding coefficients here is a bit of background.
> >> Coefficients are the numbers that multiply your variables inside an equation. In a toy example, to solve the equation ax^2+bx+c=0 you need to find the coefficients a,b,c that make the equation valid.
> >> In the context of Reed Solomon, the definition of coefficients is a bit more confusing. In the original design, the message x is interpreted as coefficients of a polynomial p. But in subsequent interpretations the message x is seen as the values of the polynomial p evaluated at the first k points a1..ak. Such interpretation is apparently a bit less efficient but desirable because you can construct a systematic code.
> >> In the context of xorbas, you are constructing a code on top of Reed Solomon.
> >> The codewords are seen as values, and the idea is to get coefficients c1..c10 that also satisfy s1+s2+s3=0 (take this as a missing introduction to my previous message)
> >>
> >> Cheers,
> >>
> >> koleosfuscus
> >>
> >> ________________________________________________________________
> >> "My reply is: the software has no known bugs, therefore it has not been updated."
> >> Wietse Venema
> >>
> >> On Sun, Jun 29, 2014 at 8:44 PM, Loic Dachary <[email protected]> wrote:
> >>> Hi koleosfuscus,
> >>>
> >>> Thanks for the explanation : it is very comforting to know that you understand this :-) At the risk of being thick, I must say that the very notion of "coefficient" eludes me. What are they ?
> >>>
> >>> Cheers
> >>>
> >>> On 29/06/2014 20:38, Koleos Fuscus wrote:
> >>>> Hello Loic,
> >>>> Dimakis (one of the authors of xorbas) is talking about coefficients because they want to find a way to reduce the storage overhead used with LRC. In the simple case used in Fig. 2, a RS (k=10, m=4) has 14/10 storage overhead but when using LRC, the overhead increases to 17/10 because you also need to store s1, s2 and s3. Basically, the idea is to find specific coefficients c1..c10 that permit obtaining s3 through s1 and s2. In other words, get some s1 and s2 that when xored together give s3. If you find such coefficients, you don't need to store s3 and the storage overhead of LRC is 1.6x instead of 1.7x.
> >>>>
> >>>> Dimakis said that for the Reed Solomon implementation used in HDFS RAID they can simply set all coefficients to the value '1' and use xor.
> >>>>
> >>>> This may not be the case for the Reed Solomon implemented by you (I understood it is the jerasure library by Plank), but of that I am not sure. I guess we need the help of a mathematician or at least check and compare both implementations.
> >>>>
> >>>> Finally, apparently for xorbas they only implemented the configuration RS(10,4) and not other combinations. Unfortunately, the wiki page of the project is empty http://wiki.apache.org/hadoop/ErasureCode and the main page says 'erasure coding under development'.
> >>>>
> >>>> I recommend you watch the xorbas presentation video http://smahesh.com/HadoopUSC/ (a very clear explanation of xorbas) and use the Dimakis wiki page to check the large collection of papers they have: http://storagewiki.ece.utexas.edu/
> >>>>
> >>>> Best,
> >>>>
> >>>> koleosfuscus
> >>>>
> >>>> ________________________________________________________________
> >>>> "My reply is: the software has no known bugs, therefore it has not been updated."
> >>>> Wietse Venema
> >>>>
> >>>> On Sun, Jun 29, 2014 at 11:30 AM, Loic Dachary <[email protected]> wrote:
> >>>>> Hi Andreas,
> >>>>>
> >>>>> In http://anrg.usc.edu/~maheswaran/Xorbas.pdf I get the idea of computing local coding chunks the way it is implemented in https://github.com/ceph/ceph/pull/1921 (i.e. delegating encoding / decoding to other plugins). However, there are theoretical aspects of the paper that I do not understand and I'm hoping you can shed some light on it. In particular, I don't know what "coefficients" are about. For instance in the context of Figure 2 caption : "The main theoretical challenge is to choose the coefficients c(i) to maximize the fault tolerance of the code."
> >>>>>
> >>>>> Would you recommend a paper to read to better understand this ? Also I'd like to understand what "coefficients" mean in the context of jerasure or if they do not apply.
> >>>>>
> >>>>> Thanks for your help :-)
> >>>>>
> >>>>> --
> >>>>> Loïc Dachary, Artisan Logiciel Libre
> >>>
> >>> --
> >>> Loïc Dachary, Artisan Logiciel Libre
> >
> > --
> > Loïc Dachary, Artisan Logiciel Libre
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> Loïc Dachary, Artisan Logiciel Libre
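[The construction Andreas sketches at the top of the thread — two XOR local parities over the halves of the data, with the third parity implied rather than stored — can be written out as a toy example. The byte values, chunk sizes, and the xor_chunks helper below are illustrative only; the real code works in GF(256) via the Java SimpleRegeneratingCode class.]

```python
from functools import reduce

def xor_chunks(chunks):
    """Byte-wise XOR of equal-length byte strings (illustrative helper)."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)

# ten toy data chunks of 4 bytes each
data = [bytes([i] * 4) for i in range(1, 11)]

# local parities over the two halves, as in Andreas's example
p1 = xor_chunks(data[:5])   # 1 ^ 2 ^ 3 ^ 4 ^ 5
p2 = xor_chunks(data[5:])   # 6 ^ 7 ^ 8 ^ 9 ^ 10
# p3 = p1 ^ p2 would be the parity of the RS chunks per Andreas's reading
# of the code; being implied, it never needs to be stored

# a single lost data chunk is recovered from its local group alone
lost = 2                                        # pretend data[2] is gone
survivors = [c for i, c in enumerate(data[:5]) if i != lost]
recovered = xor_chunks(survivors + [p1])
assert recovered == data[lost]
```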
