Hi,

On 30/06/2014 14:59, Andreas Joachim Peters wrote:
> Hi Loic,
>
> I was reading through the code and I have got the impression that they have
> formulated something simple with a lot of mathematical formulas in this
> paper. Indeed all ci = 1 ... which makes it simple ...
It makes things simpler to understand from my perspective, cool :-)

> The code seems to work like this, e.g. 10 + 2 + 4:
>
> Data = (1,2,3,4,5,6,7,8,9,10)
>
> (RS1,RS2,RS3,RS4) = RS(1,2,3,4,5,6,7,8,9,10)
>
> P1 = 1 ^ 2 ^ 3 ^ 4 ^ 5
> P2 = 6 ^ 7 ^ 8 ^ 9 ^ 10
> P3 = P1 ^ P2 = RS1 ^ RS2 ^ RS3 ^ RS4 (no need to store that, since we store
> P1 and P2)

Using my own words, if the local parities P1 and P2 are calculated using only
the data chunks, it follows that P1 ^ P2 can also be used as an implied parity
chunk for the RS parity chunks. But would that be true if P1 and P2 were
calculated using a random subset of the data + coding chunks? I.e. if
P1 = RS1 ^ 2 ^ 3 ^ 4 ^ 5, can P1 ^ P2 = P3 be used to recover any of 1, RS2,
RS3, RS4? In other words, does it make a difference that P3 is an implied
parity for the parity chunks, or can it be an implied parity for any
data/coding chunks involved in RS?

Cheers

> You can reconstruct a missing RS chunk with local parity like:
> RS1 = P1 ^ P2 ^ RS2 ^ RS3 ^ RS4
>
> At least this is what I got out of the code ... look at getSRCGroupNeighbors
>
> What I don't know is whether P1 ^ P2 ^ P3 = 0 works with every encoding
> matrix or only for the particular one used.
>
> Cheers Andreas.
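To make the quoted example concrete, here is a minimal sketch of the XOR
bookkeeping in plain Python. It is not the HadoopUSC or jerasure code:
xor_chunks and the chunk contents are made up, and the RS parity chunks are
fabricated so that the identity P1 ^ P2 = RS1 ^ RS2 ^ RS3 ^ RS4 holds, which
is precisely the assumption under discussion.

    # Byte-wise XOR over equally sized chunks, plus the two repairs discussed
    # above: a data chunk repaired inside its local group, and an RS chunk
    # repaired via the implied parity P3 = P1 ^ P2.

    def xor_chunks(*chunks):
        """XOR any number of equally sized byte strings."""
        out = bytearray(len(chunks[0]))
        for chunk in chunks:
            for i, b in enumerate(chunk):
                out[i] ^= b
        return bytes(out)

    # ten data chunks D1..D10 (8 bytes each, arbitrary contents)
    data = [bytes([d] * 8) for d in range(1, 11)]

    # local parities over the two halves, all coefficients equal to 1
    P1 = xor_chunks(*data[0:5])
    P2 = xor_chunks(*data[5:10])

    # placeholder RS parity chunks, fabricated so RS1^RS2^RS3^RS4 == P1^P2
    # (true for the Xorbas construction, per the discussion above)
    RS2, RS3, RS4 = bytes(8), bytes([0x11] * 8), bytes([0x22] * 8)
    RS1 = xor_chunks(P1, P2, RS2, RS3, RS4)

    # lose D3: repair it inside its local group with P1 and the four survivors
    assert xor_chunks(P1, data[0], data[1], data[3], data[4]) == data[2]

    # lose RS1: repair it from the implied parity, no Reed-Solomon decode needed
    assert xor_chunks(P1, P2, RS2, RS3, RS4) == RS1

Whether P1 ^ P2 ^ P3 = 0 survives an arbitrary encoding matrix is exactly the
open question above; the sketch only shows that, once the coefficients make it
hold, losing any single chunk of the RS group costs five chunk reads and an
XOR instead of a full RS decode.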
>>
>> On Mon, Jun 30, 2014 at 12:34 PM, Loic Dachary <[email protected]> wrote:
>>
>> Hi Andreas,
>>
>> TL;DR: which part of the code chooses the coefficients to maximize the
>> fault tolerance of the code as suggested in the Xorbas paper?
>>
>> If I understand correctly, the locality is computed (i.e. how many local
>> parity chunks are created) with:
>>
>> https://github.com/madiator/HadoopUSC/blob/developUSC/src/contrib/raid/src/java/org/apache/hadoop/raid/SimpleRegeneratingCode.java#L84
>>
>> This is different from specifying it explicitly
>> (https://github.com/dachary/ceph/commit/91b9cc2be5851f7668d593bb69b2f4f089ea585a#diff-5518964bc98a094a784ce2d17a5b0cc1R11)
>>
>> Then a RS encoding matrix is calculated
>>
>> https://github.com/madiator/HadoopUSC/blob/developUSC/src/contrib/raid/src/java/org/apache/hadoop/raid/SimpleRegeneratingCode.java#L100
>>
>> and will be used when encoding an object
>>
>> https://github.com/madiator/HadoopUSC/blob/developUSC/src/contrib/raid/src/java/org/apache/hadoop/raid/SimpleRegeneratingCode.java#L143
>>
>> then each local parity chunk is calculated using a simpler function (XOR)
>>
>> https://github.com/madiator/HadoopUSC/blob/developUSC/src/contrib/raid/src/java/org/apache/hadoop/raid/SimpleRegeneratingCode.java#L154
>>
>> When decoding, the equivalent of the Ceph minimum_to_decode method is
>> used to figure out which chunks are needed to recover the lost chunks
>>
>> https://github.com/madiator/HadoopUSC/blob/developUSC/src/contrib/raid/src/java/org/apache/hadoop/raid/SimpleRegeneratingCode.java#L319
>>
>> During recovery, if only one chunk is missing, it is recovered with the
>> simpler function (XOR)
>>
>> https://github.com/madiator/HadoopUSC/blob/developUSC/src/contrib/raid/src/java/org/apache/hadoop/raid/SimpleRegeneratingCode.java#L207
>> https://github.com/madiator/HadoopUSC/blob/developUSC/src/contrib/raid/src/java/org/apache/hadoop/raid/SimpleRegeneratingCode.java#L224
>> (both of which are combined recursively)
>>
>> and if there is a "conflict" (i.e. if local parity is not enough to
>> recover), then the RS decoding function is used
>>
>> https://github.com/madiator/HadoopUSC/blob/developUSC/src/contrib/raid/src/java/org/apache/hadoop/raid/SimpleRegeneratingCode.java#L266
>>
>> I don't understand at all how local parity is related to RS parity, in
>> the sense of "choosing the coefficients c(i) to maximize the fault
>> tolerance of the code" as mentioned in the Xorbas paper: I must be missing
>> something :-) I do understand how the code works though, which is
>> troubling. It is also reassuring because the logic is very similar to what
>> is proposed in https://github.com/ceph/ceph/pull/1921
>>
>> Cheers
>>
>> On 30/06/2014 11:26, Andreas Joachim Peters wrote:
>> > Hi Loic,
>> >
>> > I think the best is to read along the sources. It is very readable!
>> >
>> > https://github.com/madiator/HadoopUSC/blob/developUSC/src/contrib/raid/src/java/org/apache/hadoop/raid/SimpleRegeneratingCode.java
>> >
>> > If there is a high interest in this, you could port the code from
>> > Java to C++ and use the GF-complete or Intel GF-multiplication
>> > implementation for the used GF multiplications.
>> >
>> > Cheers Andreas.
>> >
>> > ________________________________________
>> > From: [email protected] [[email protected]] on behalf of
>> > Loic Dachary [[email protected]]
>> > Sent: 30 June 2014 11:06
>> > To: Koleos Fuscus
>> > Cc: Ceph Development
>> > Subject: Re: erasure code and coefficients
>> >
>> > Hi koleosfuscus,
>> >
>> > It clarifies it enough to raise a question: where can I read code (or
>> > an algorithm if not code) that chooses the coefficients needed to
>> > implement what is suggested in the Xorbas paper?
>> >
>> > Cheers
>> >
>> > On 30/06/2014 10:18, Koleos Fuscus wrote:
>> >> Hi Loic,
>> >>
>> >> I am happy to contribute with some clarifications. In fact,
>> >> erasure/reliability concepts are not blocking my progress with the
>> >> reliability model at Ceph. It is the Ceph model itself that has some
>> >> parts not clear to me, and nobody has had time yet to review the state
>> >> model diagram that I published on the wiki. :(
>> >> Anyway, regarding coefficients, here is a bit of background.
>> >> Coefficients are the numbers that multiply your variables inside an
>> >> equation. In a toy example, to solve the equation ax^2+bx+c=0 you need
>> >> to find the coefficients a,b,c that make the equation valid.
>> >> In the context of Reed Solomon, the definition of coefficients is a
>> >> bit more confusing. In the original design, the message x is
>> >> interpreted as the coefficients of a polynomial p. But in subsequent
>> >> interpretations the message x is seen as the values of the polynomial
>> >> p evaluated at the first k points a1..ak. Such an interpretation is
>> >> apparently a bit less efficient but desirable because you can
>> >> construct a systematic code.
>> >> In the context of Xorbas, you are constructing a code on top of Reed
>> >> Solomon. The codewords are seen as values, and the idea is to get
>> >> coefficients c1..c10 that also satisfy s1+s2+s3=0 (take this as a
>> >> missing introduction to my previous message)
>> >>
>> >> Cheers,
>> >>
>> >> koleosfuscus
>> >>
>> >> ________________________________________________________________
>> >> "My reply is: the software has no known bugs, therefore it has not
>> >> been updated."
>> >> Wietse Venema
>> >>
>> >>
>> >> On Sun, Jun 29, 2014 at 8:44 PM, Loic Dachary <[email protected]> wrote:
>> >>> Hi koleosfuscus,
>> >>>
>> >>> Thanks for the explanation: it is very comforting to know that you
>> >>> understand this :-) At the risk of being thick, I must say that the
>> >>> very notion of "coefficient" eludes me. What are they?
>> >>>
>> >>> Cheers
>> >>>
>> >>> On 29/06/2014 20:38, Koleos Fuscus wrote:
>> >>>> Hello Loic,
>> >>>> Dimakis (one of the authors of Xorbas) is talking about coefficients
>> >>>> because they want to find a way to reduce the storage overhead used
>> >>>> with LRC. In the simple case used in Fig. 2, a RS (k=10, m=4) has
>> >>>> 14/10 storage overhead, but when using LRC the overhead increases to
>> >>>> 17/10 because you also need to store s1, s2 and s3. Basically, the
>> >>>> idea is to find specific coefficients c1..c10 that make it possible
>> >>>> to obtain s3 from s1 and s2. In other words, get some s1 and s2 that
>> >>>> when xored together give s3. If you find such coefficients, you don't
>> >>>> need to store s3 and the storage overhead of LRC is 1.6x instead of
>> >>>> 1.7x.
>> >>>>
>> >>>> Dimakis said that for the Reed Solomon implementation used in HDFS
>> >>>> RAID they can simply set all coefficients to the value '1' and use
>> >>>> xor.
>> >>>>
>> >>>> This cannot be the case for the Reed Solomon implementation used by
>> >>>> you (I understood it is the jerasure library by Plank), but of that I
>> >>>> am not sure. I guess we need the help of a mathematician, or at least
>> >>>> to check and compare both implementations.
>> >>>>
>> >>>> Finally, apparently for Xorbas they only implemented the
>> >>>> configuration RS(10,4) and not other combinations. Unfortunately, the
>> >>>> wiki page of the project is empty
>> >>>> http://wiki.apache.org/hadoop/ErasureCode and the main page says
>> >>>> 'erasure coding under development'.
>> >>>>
>> >>>> I recommend you watch the Xorbas presentation video
>> >>>> http://smahesh.com/HadoopUSC/ (a very clear explanation of Xorbas)
>> >>>> and use the Dimakis wiki page to check the large collection of papers
>> >>>> they have: http://storagewiki.ece.utexas.edu/
>> >>>>
>> >>>> Best,
>> >>>>
>> >>>> koleosfuscus
>> >>>>
>> >>>> ________________________________________________________________
>> >>>> "My reply is: the software has no known bugs, therefore it has not
>> >>>> been updated."
>> >>>> Wietse Venema
>> >>>>
>> >>>>
>> >>>> On Sun, Jun 29, 2014 at 11:30 AM, Loic Dachary <[email protected]> wrote:
>> >>>>> Hi Andreas,
>> >>>>>
>> >>>>> In http://anrg.usc.edu/~maheswaran/Xorbas.pdf I get the idea of
>> >>>>> computing local coding chunks the way it is implemented in
>> >>>>> https://github.com/ceph/ceph/pull/1921 (i.e. delegating encoding /
>> >>>>> decoding to other plugins). However, there are theoretical aspects
>> >>>>> of the paper that I do not understand and I'm hoping you can shed
>> >>>>> some light on them. In particular, I don't know what "coefficients"
>> >>>>> are about. For instance, in the context of the Figure 2 caption:
>> >>>>> "The main theoretical challenge is to choose the coefficients c(i)
>> >>>>> to maximize the fault tolerance of the code."
>> >>>>>
>> >>>>> Would you recommend a paper to read to better understand this? Also
>> >>>>> I'd like to understand what "coefficients" mean in the context of
>> >>>>> jerasure or if they do not apply.
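Coming back to the 1.7x versus 1.6x figures quoted above, a back-of-the-envelope
check in plain Python (the names are mine, nothing here comes from Xorbas,
HadoopUSC or Ceph):

    # Storage overhead for RS(10,4) with locality 5, as in Fig. 2 of the paper:
    # s1 and s2 protect the two data halves, s3 protects the four RS parities.
    k = 10       # data chunks
    m = 4        # Reed-Solomon parity chunks
    s = 3        # local parities s1, s2, s3

    store_all_three = (k + m + s) / k      # 1.7x
    s3_implied      = (k + m + s - 1) / k  # 1.6x: s3 is never written to disk

    print(store_all_three, s3_implied)     # 1.7 1.6

The coefficient question is what makes the second figure legitimate: c1..c10
have to be chosen so that s1 + s2 + s3 = 0, and the claim above is that for
the encoding matrix used by HDFS RAID all of them can simply be 1, turning the
relation into the plain XOR s3 = s1 ^ s2.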
>> >>>>>
>> >>>>> Thanks for your help :-)
>> >>>>>
>> >>>>> --
>> >>>>> Loïc Dachary, Artisan Logiciel Libre
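Finally, to tie the recovery walkthrough quoted earlier together (a single
missing chunk per local group is repaired by XOR, anything more falls back to
Reed-Solomon), here is a sketch of that decision in plain Python. The group
layout and the repair_plan name are illustrative, not the
SimpleRegeneratingCode or Ceph minimum_to_decode API:

    # Each local group can repair one missing chunk by XOR; the implied parity
    # P3 = P1 ^ P2 plays that role for the RS parity chunks.
    GROUPS = {
        "P1": ["D1", "D2", "D3", "D4", "D5"],
        "P2": ["D6", "D7", "D8", "D9", "D10"],
        "P3": ["RS1", "RS2", "RS3", "RS4"],   # P3 itself is never stored
    }

    def repair_plan(missing, available):
        """Map each locally repairable missing chunk to the chunks to XOR.

        Anything not in the returned plan needs the Reed-Solomon decoder."""
        plan = {}
        for parity, members in GROUPS.items():
            lost_here = [c for c in members if c in missing]
            if len(lost_here) != 1:
                continue                  # nothing lost, or too much for XOR
            lost = lost_here[0]
            # the implied parity is computed as P1 ^ P2 instead of being read
            needed = ["P1", "P2"] if parity == "P3" else [parity]
            needed += [c for c in members if c != lost]
            if all(c in available for c in needed):
                plan[lost] = needed
        return plan

    stored = {f"D{i}" for i in range(1, 11)} | {"RS1", "RS2", "RS3", "RS4",
                                                "P1", "P2"}
    missing = {"RS1"}
    print(repair_plan(missing, stored - missing))
    # {'RS1': ['P1', 'P2', 'RS2', 'RS3', 'RS4']}

    missing = {"D1", "D2"}
    print(repair_plan(missing, stored - missing))
    # {} -> fall back to Reed-Solomon decoding

A group with two losses gets no local plan, which is the "conflict" case the
12:34 mail describes, and the point where the RS decoder has to be called.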
--
Loïc Dachary, Artisan Logiciel Libre