Hi Andreas,

You should include the copyright holder. If you are a CERN employee it will 
probably look like this:

Copyright (C) 2013 CERN <[email protected]>
Author: Andreas Joachim Peters <[email protected]>

unless your contract specifies otherwise. If you are not an employee you should 
update to

Copyright (C) 2013 Andreas Joachim Peters <[email protected]>

unless there is a contract (freelance or ...) that specifies otherwise.

Cheers

On 02/10/2013 01:00, Andreas Joachim Peters wrote:
> Hi Loic, 
> 
> here is the patch implementing the basic pyramid code, adding local parity to 
> erasure encoding. I tried to keep the behaviour 100% identical to the 
> original version, except that I changed the alignment to 128-bit words. At 
> least your unit tests work ;-)
> 
> https://github.com/apeters1971/ceph/commit/b2de7af1a49dc98940d5685eab00a339bf81a0e5
> 
> in src: 
> 
> make unittest_erasure_code_pyramid_jerasure
> 
> ./unittest_erasure_code_pyramid_jerasure --gtest_filter=*.* 
> --log-to-stderr=true --object-size=64
> 
> It tests (8,2,2)
> 
> [ -TIMING- ] technique=cauchy_good      [           encode ] speed=1.840 [GB/s] latency=34.791 ms
> [ -TIMING- ] technique=cauchy_good      [        encode-lp ] speed=1.305 [GB/s] latency=49.057 ms
> [ -TIMING- ] technique=cauchy_good      [      encode-lp-3 ] speed=1.307 [GB/s] latency=48.956 ms
> [ -TIMING- ] technique=cauchy_good      [ encode-lp-crc32c ] speed=1.036 [GB/s] latency=61.752 ms
> [ -TIMING- ] technique=cauchy_good      [             reco ] speed=1.780 [GB/s] latency=35.959 ms
> [ -TIMING- ] technique=cauchy_good      [          reco-lp ] speed=4.348 [GB/s] latency=14.720 ms
> [ -TIMING- ] technique=cauchy_good      [        reco-lp-3 ] speed=1.256 [GB/s] latency=50.962 ms
> [ -TIMING- ] technique=cauchy_good      [   reco-lp-crc32c ] speed=2.300 [GB/s] latency=27.832 ms
> [ -TIMING- ] technique=liber8tion       [           encode ] speed=2.297 [GB/s] latency=27.865 ms
> [ -TIMING- ] technique=liber8tion       [        encode-lp ] speed=1.498 [GB/s] latency=42.731 ms
> [ -TIMING- ] technique=liber8tion       [      encode-lp-3 ] speed=1.505 [GB/s] latency=42.513 ms
> [ -TIMING- ] technique=liber8tion       [ encode-lp-crc32c ] speed=1.142 [GB/s] latency=56.018 ms
> [ -TIMING- ] technique=liber8tion       [             reco ] speed=2.238 [GB/s] latency=28.601 ms
> [ -TIMING- ] technique=liber8tion       [          reco-lp ] speed=4.399 [GB/s] latency=14.550 ms
> [ -TIMING- ] technique=liber8tion       [        reco-lp-3 ] speed=1.878 [GB/s] latency=34.070 ms
> [ -TIMING- ] technique=liber8tion       [   reco-lp-crc32c ] speed=2.307 [GB/s] latency=27.737 ms
> 
> Cheers Andreas.
> 
> 
> ________________________________________
> From: Loic Dachary [[email protected]]
> Sent: 27 September 2013 11:40
> To: Andreas Joachim Peters
> Cc: Ceph Development
> Subject: Re: CEPH Erasure Encoding + OSD Scalability
> 
> On 26/09/2013 23:49, Andreas Joachim Peters wrote:
>> Sure,
>> this text is clear, but it does not talk about the cost of reconstruction: 
>> selecting a parity chunk instead of a data chunk costs CPU and increases 
>> latency, but that is not reflected by the external cost parameter. E.g. with 
>> RS (3,2), i.e. 3 data and 2 parity chunks [0,1,2,3,4] with equal cost 
>> values, I would select [0,1,2] since it avoids computation; the retrieval 
>> cost for [2,3,4] would be the same, but the computational cost is higher.
> 
> The implementation already knows about the computational cost and is able to 
> figure out that [0,1,2] is going to be cheaper. It does not need input from 
> the caller, and the minimum_to_decode method (without the cost)
> https://github.com/ceph/ceph/blob/master/src/osd/ErasureCodePluginJerasure/ErasureCodeJerasure.cc#L45
> does this. If you want to read [0,1,2] and have [0,1,2,3,4] available, it will 
> return that you need to retrieve [0,1,2] and not [2,3,4], although both would 
> allow reconstructing the content of [0,1,2].
> 
>>
>> Now if [0] has for example the double cost compared to chunk [3], it is not 
>> clear to me if [1,2,3] is a better set than [0,1,2] ... is the meaning of a 
>> higher cost actually more a binary flag saying 'avoid to read this chunk if 
>> possible' ?
>>
>> Could you give a practical example when a chunk can have a higher cost in a 
>> CEPH setup and a rough range for the 'cost' parameter?
> 
> At the moment I can't, because it depends on the implementation of the erasure 
> code placement group and it's not complete yet. You are correct: the 
> interpretation of the cost by the plugin cannot be fully described without an 
> intimate knowledge of the implementation. It also means that if the 
> implementation of the caller changes, the semantics of the cost will change 
> and may require a different strategy.
> 
> Cheers
> 
>> Thanks Andreas.
>>
>>
>>
>>
>> ________________________________________
>> From: Loic Dachary [[email protected]]
>> Sent: 26 September 2013 21:18
>> To: Andreas Joachim Peters
>> Cc: Ceph Development
>> Subject: Re: CEPH Erasure Encoding + OSD Scalability
>>
>> [re-adding ceph-devel to the cc]
>>
>> On 26/09/2013 20:36, Andreas-Joachim Peters wrote:
>>> Hi Loic,
>>> today I forked the CEPH repository and will commit my changes to my GitHub 
>>> fork asap ... (I am not particularly familiar with GitHub).
>>> I was finalizing the minimum_to_decode function today with test cases (it 
>>> is more sophisticated in this case ...) ... I didn't fully get what the 
>>> 'with cost' function is supposed to do differently from the one without cost?
>>
>> I'd be happy to explain if
>> https://github.com/ceph/ceph/blob/master/src/osd/ErasureCodeInterface.h#L131
>> is unclear. Would you be so kind as to tell me what is confusing in the 
>> description ?
>>
>>>
>>>
>>> Cheers Andreas.
>>>
>>> On Wed, Sep 25, 2013 at 8:48 PM, Loic Dachary <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>>
>>>
>>>
>>>     On 25/09/2013 20:33, Andreas Joachim Peters wrote:
>>>     > Yes, sure. I actually thought the same in the meanwhile ... I have 
>>>     > some questions:
>>>     >
>>>     > Q: Can/should it stay in the framework of google test's or you would 
>>> prefer just a plain executable ?
>>>     >
>>>
>>>     A plain executable would make sense. A simple example from 
>>> src/test/Makefile.am:
>>>
>>>     ceph_test_trans_SOURCES = test/test_trans.cc
>>>     ceph_test_trans_LDADD = $(LIBOS) $(CEPH_GLOBAL)
>>>     bin_DEBUGPROGRAMS += ceph_test_trans
>>>
>>>
>>>     > I have added local parity support to your erasure class adding a new 
>>> argument: "erasure-code-lp" and
>>>     > two new methods:
>>>     >
>>>     > localparity_encode(...)
>>>     > localparity_decode(...)
>>>     >
>>>     > I made a more complex benchmark of (8,2) + 2 local parities (1^2^3^4, 
>>> 5^6^7^8) which benchmarks performance of encoding/decoding as speed & 
>>> effective write-latency for three cases (each for liberation & cauchy_good 
>>> codecs):
>>>     >
>>>     > 1 (8,2)
>>>     > 2 (8,2,lp=2)
>>>     > 3 (8,2,lp=2) + crc32c (blocks)
>>>     >
>>>     > and several failure scenarios ... single, double, triple disk 
>>> failures. Probably the best would be to make all these parameters configurable.
>>>
>>>     Great :-) Do you have a public git repository where I could clone this 
>>> & give it a try ?
>>>
>>>     > Q: For the local parity implementation .... shall I inherit from your 
>>> erasure plugin and overwrite the encode/decode method or you would consider 
>>> a patch to the original class?
>>>
>>>     It is a perfect timing for a patch to the original class.
>>>
>>>     > I have also a 128-bit XOR implementation for the local parities. This 
>>> will work with new gcc's & clang compilers ...
>>>     >
>>>     > Q: Which compilers/platforms are supported by CEPH? Is there a 
>>> minimal GCC version?
>>>
>>>     You can see all supported platforms here:
>>>
>>>     http://ceph.com/gitbuilder.cgi
>>>
>>>     I don't think the GCC version shows in the logs but you can probably 
>>> figure it out from the corresponding distribution.
>>>
>>>     > Q: is there some policy restricting comments within code? In general 
>>> I see very few or no comments within the code ..
>>>
>>>     :-) The mon code tends to be more heavily commented than the osd code 
>>> (IMO) but I'm not aware of any policy. When I feel the need to comment, I 
>>> write a unit test. If the unit test is difficult, I tend to comment to 
>>> clarify its purpose. The problem with comments is that they quickly become 
>>> obsolete and/or misleading. That being said, I don't think anyone will 
>>> object if you heavily comment your code.
>>>
>>>     Cheers
>>>
>>>     > Cheers Andreas.
>>>     >
>>>     >
>>>     >
>>>     >
>>>
>>>     --
>>>     Loïc Dachary, Artisan Logiciel Libre
>>>     All that is necessary for the triumph of evil is that good people do 
>>> nothing.
>>>
>>>
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>> All that is necessary for the triumph of evil is that good people do nothing.
>>
> 
> --
> Loïc Dachary, Artisan Logiciel Libre
> All that is necessary for the triumph of evil is that good people do nothing.
> 

-- 
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.
