I'm finding that I don't really understand how the LRC specification
works. Is there a doc somewhere I can read?
-Sam

On Mon, Jun 9, 2014 at 1:18 PM, Gregory Farnum <[email protected]> wrote:
> On Fri, Jun 6, 2014 at 7:30 AM, Loic Dachary <[email protected]> wrote:
>> Hi Andreas,
>>
>> On 06/06/2014 13:46, Andreas Joachim Peters wrote:
>>> Hi Loic,
>>> the basic implementation looks very clean.
>>>
>>> I have few comments/ideas:
>>>
>>> - the reconstruction strategy using the three levels is certainly efficient
>>> enough for standard cases but does not guarantee always the minimum
>>> decoding (in cases where one layer is not enough to reconstruct) since your
>>> third algorithm is just brute-force to reconstruct everything through all
>>> layers until we have what we need ...
>>
>> The third strategy is indeed brute force. Do you think it is worth changing
>> to be minimal ? It would be nice to quantify the percent of cases it
>> addresses. Do you know how to do that ? It looks like a very small
>> percentage but there is no proof it is small ;-)
>>
>>> - the whole LRC configuration actually does not describe the placement - it
>>> still looks disconnected from the placement strategy/crush rules ...
>>> wouldn't it make sense to have the crush rule implicit in the description
>>> or a function to derive it automatically based on the LRC configuration?
>>> Maybe you have this already done in another way and I didn't see it ...
>>
>> Good catch.
>>
>> What about this:
>>
>> " [ \"_aAAA_aAA_\", \"set choose datacenter 2\","
>> " \"_aXXX_aXX_\" ],"
>> " [ \"b_BBB_____\", \"set choose host 5\","
>> " \"baXXX_____\" ],"
>> " [ \"_____cCCC_\", \"\","
>> " \"baXXXcaXX_\" ],"
>> " [ \"_____DDDDd\", \"\","
>> " \"baXXXcaXXd\" ],"
>>
>> Which translates into
>>
>> take root
>> set choose datacenter 2
>> set choose host 5
>>
>> In other words, the ruleset is created by concatenating the strings from the
>> description, without any kind of smart computation. It is up to the person
>> who creates the description to add the ruleset near a description that makes
>> sense.
>> There is going to be minimal checking to make sure the ruleset can
>> actually be used to get the required number of chunks.
>>
>> It probably is very difficult and very confusing to automate the generation
>> of the ruleset. If it is implicit rather than explicit as above, the
>> operator will have to somehow understand and learn how it is computed to
>> make sure it does what is desired. With an explicit set of crush rules
>> loosely coupled to chunk mapping, the operator can read the crush
>> documentation instead of guessing.
>
> I think I'm missing some context for this discussion (maybe I haven't
> been reading other threads closely enough); can you discuss this in
> more detail?
> Matching up CRUSH rulesets and the EC plugin formulas is very
> important and demonstrated to be difficult, but I don't really
> understand what you're suggesting here, which makes me think it's not
> quite the right idea. ;)
>
>>
>>> - should the plug-in have the ability to select reconstruction on
>>> proximity or this should be up-to the higher layer to provide chunks in a
>>> way that reconstruction would select the 'closest' layer? The relevance of
>>> the question you will understand better in the next point ....
>>>
>>> - I remember we had this 3 data centre example with (8,4) where you can
>>> reconstruct every object if 2 data centres are up. Another appealing
>>> example avoiding remote access when reading an object is that you have 2
>>> data centres having a replication of e.g. (4,2) encoded objects. Can you
>>> describe in your LRC configuration language to store the same chunk twice
>>> like __ABCCBA__ ?
>>
>> Unless I'm mistaken that would require the caller of the plugin to support
>> duplicate data chunks and provide a kind of proximity check. Since this is
>> not currently supported by the OSD logic, it is difficult to figure out how
>> an erasure code plugin could provide support for this use case.
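
For reference, the ruleset proposal quoted above ("the ruleset is created by
concatenating the strings from the description") could be sketched roughly as
below. This is only an illustration of that one sentence, not the plugin's
code: the function name build_ruleset and the root parameter are made up here,
and the third string of each layer is carried along untouched because the
thread does not say what it means.

```python
import json

# Layer description copied from the quoted proposal. Each entry is
# [chunk mapping, ruleset step (may be empty), third string]. The meaning
# of the third string is not explained in the thread, so it is ignored.
DESCRIPTION = json.loads("""
[
  ["_aAAA_aAA_", "set choose datacenter 2", "_aXXX_aXX_"],
  ["b_BBB_____", "set choose host 5",       "baXXX_____"],
  ["_____cCCC_", "",                        "baXXXcaXX_"],
  ["_____DDDDd", "",                        "baXXXcaXXd"]
]
""")


def build_ruleset(layers, root="root"):
    """Concatenate the non-empty ruleset steps in layer order.

    Hypothetical helper: no validation and no smart computation,
    matching the "without any kind of smart computation" description.
    """
    steps = ["take " + root]
    for _mapping, step, _third in layers:
        if step:  # layers with an empty string contribute nothing
            steps.append(step)
    return steps


ruleset = build_ruleset(DESCRIPTION)
print("\n".join(ruleset))
```

With the description from the thread this prints the same three steps Loic
lists: take root, set choose datacenter 2, set choose host 5.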
>
> I haven't looked at the EC plugin interface at all, but I thought the
> OSD told the plugin what chunks it could access, and the plugin tells
> it which ones to fetch. So couldn't the plugin simply output duplicate
> chunks, and not have the OSD retrieve both of them?
> -Greg
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
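
To make Greg's question above concrete, here is a rough sketch of "output
duplicate chunks, and not have the OSD retrieve both of them" for the
__ABCCBA__ layout Andreas mentions. Everything in it is an assumption made
for illustration: the function chunks_to_fetch, its parameters, and the idea
of passing a set of "local" positions as a stand-in for a proximity check are
not taken from the EC plugin interface.

```python
# Hypothetical layout from the thread: each letter is a chunk, stored twice;
# '_' marks an unused position.
LAYOUT = "__ABCCBA__"


def chunks_to_fetch(layout, available, local):
    """Pick one readable position for each distinct chunk letter.

    layout:    string with one character per position
    available: set of positions the caller says it can read
    local:     set of positions considered close (assumed proximity hint)

    Returns a dict mapping chunk letter -> chosen position. A local copy
    replaces a remote one; otherwise the first readable copy is kept.
    """
    chosen = {}
    for pos, name in enumerate(layout):
        if name == "_" or pos not in available:
            continue
        best = chosen.get(name)
        if best is None or (pos in local and best not in local):
            chosen[name] = pos
    return chosen


# All six copies readable, second data centre (positions 5-7) is local.
print(chunks_to_fetch(LAYOUT, {2, 3, 4, 5, 6, 7}, {5, 6, 7}))

# The local copy of B (position 6) is missing, so the remote one is used.
print(chunks_to_fetch(LAYOUT, {2, 3, 4, 5, 7}, {5, 6, 7}))
```

Whether the selection belongs in the plugin or in the caller is exactly the
open question in the thread; the sketch only shows that one copy per chunk
can be selected once duplicates and some proximity hint are visible.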
