On 08/07/2013 12:00, Andreas Joachim Peters wrote:
> Hi Loic,
> 
> I did the two mentioned benchmarks:
> 
> QFS (m+3) code runs at 300 MB/s ... not worthwhile (jerasure 390 MB/s).
> 
> I made a quick (3+2) encoding benchmark and this encodes at ~3 GB/s.
> 

Hi Andreas,

It looks like the simplest and fastest implementation there is :-) I understand
it only addresses 3+2, but it would make a fine default implementation /
example for the erasure coding plugin implementing the proposed API:
https://github.com/dachary/ceph/blob/wip-4929/doc/dev/osd_internals/erasure-code.rst#erasure-code-library-abstract-api
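
To make the 3+2 idea concrete, here is a minimal sketch of one well-known way
to build such a code: RAID-6 style P/Q parity over GF(2^8). The thread does not
say how the benchmarked 3+2 code works, so this is an assumption, not that
code, and it does not follow the proposed plugin API. All names (gf_mul,
gf_inv, encode, recover_two_data) are hypothetical.

```python
# Sketch only: a 3+2 code built as RAID-6 style P/Q parity over GF(2^8).
# This is NOT the benchmarked code and NOT the proposed plugin API;
# every name here is hypothetical.

COEFFS = (1, 2, 4)  # generator powers g^0, g^1, g^2 with g = 2


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Brute-force multiplicative inverse; fine for a 256-element field."""
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)


def encode(data):
    """Return (P, Q) parity chunks for three equal-sized data chunks."""
    d0, d1, d2 = data
    p = bytes(a ^ b ^ c for a, b, c in zip(d0, d1, d2))
    q = bytes(a ^ gf_mul(2, b) ^ gf_mul(4, c) for a, b, c in zip(d0, d1, d2))
    return p, q


def recover_two_data(data, p, q, x, y):
    """Rebuild lost data chunks x and y from the surviving chunk plus P and Q."""
    (s,) = [i for i in range(3) if i not in (x, y)]  # index of surviving chunk
    cx, cy, cs = COEFFS[x], COEFFS[y], COEFFS[s]
    inv = gf_inv(cx ^ cy)
    dx, dy = bytearray(), bytearray()
    for ds, pb, qb in zip(data[s], p, q):
        pxy = pb ^ ds                 # equals Dx ^ Dy
        qxy = qb ^ gf_mul(cs, ds)     # equals cx*Dx ^ cy*Dy
        bx = gf_mul(inv, qxy ^ gf_mul(cy, pxy))
        dx.append(bx)
        dy.append(pxy ^ bx)
    return bytes(dx), bytes(dy)
```

Losing one data chunk, or one data chunk plus one parity chunk, can be handled
with P or Q alone; the two-data-chunk case shown here is the one that needs
both. A real implementation would use lookup tables or SIMD instead of the
per-byte loop.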

Cheers

> Cheers Andreas.
> 
> ________________________________________
> From: Sage Weil [[email protected]]
> Sent: 08 July 2013 05:37
> To: Andreas Joachim Peters
> Cc: Loic Dachary; [email protected]
> Subject: RE: CEPH Erasure Encoding + OSD Scalability
> 
> On Sun, 7 Jul 2013, Andreas Joachim Peters wrote:
>> Considering the crc32c-intel code you added ... I would provide a
>> function which computes a crc32c checksum and detects whether it can use
>> SSE4.2 or has to fall back to the standard algorithm, e.g. if you run in
>> a virtual machine you need this emulation ...
> 
> The current code in master will do this detection by checking the cpu
> features; see
> 
>         https://github.com/ceph/ceph/blob/master/src/common/crc32c-intel.c#L74
> 
> If there is a better way to do this, I'd love to hear about it.  gcc 4.8
> just added a bunch of built-in functions to do this stuff cleanly, but
> it'll be quite a while before all of our build targets are on 4.8 or
> later.
> 
> sage
> 
> 
>>
>> Cheers Andreas.
>> ________________________________________
>> From: Loic Dachary [[email protected]]
>> Sent: 06 July 2013 22:47
>> To: Andreas Joachim Peters
>> Cc: [email protected]
>> Subject: Re: CEPH Erasure Encoding + OSD Scalability
>>
>> Hi Andreas,
>>
>> Since it looks like we're going to use jerasure-1.2, we will be able to try 
>> (C)RS using
>>
>> https://github.com/tsuraan/Jerasure/blob/master/src/cauchy.c
>> https://github.com/tsuraan/Jerasure/blob/master/src/cauchy.h
>>
>> Do you know of a better / faster implementation ? Is there a tradeoff 
>> between (C)RS and RS ?
>>
>> Cheers
>>
>> On 06/07/2013 15:43, Andreas-Joachim Peters wrote:
>>> Hi Loic,
>>> (C)RS stands for the Cauchy Reed-Solomon codes which are based on pure 
>>> parity operations, while the standard Reed-Solomon codes need more 
>>> multiplications and are slower.
>>>
>>> Considering the checksumming ... for comparison the CRC32 code from libz 
>>> runs on an 8-core Xeon at ~730 MB/s for small block sizes while the SSE4.2 
>>> CRC32C checksum runs at ~2 GB/s.
>>>
>>> Cheers Andreas.
>>>
>>>
>>>
>>>
>>> On Fri, Jul 5, 2013 at 11:23 PM, Loic Dachary <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>>
>>>     Hi Andreas,
>>>
>>>     On 04/07/2013 23:01, Andreas Joachim Peters wrote:
>>>     > Hi Loic,
>>>     > thanks for the responses!
>>>     >
>>>     > Maybe this is useful for your erasure code discussion:
>>>     >
>>>     > As an example, in our RS implementation we chunk a data block of e.g.
>>>     > 4M into 4 data chunks of 1M. Then we create 2 parity chunks.
>>>     >
>>>     > Data & parity chunks are split into 4k blocks and these 4k blocks get
>>>     > a CRC32C block checksum each (SSE4.2 CPU extension => MIT library or
>>>     > BTRFS). This creates 0.1% volume overhead (4 bytes per 4096 bytes) -
>>>     > nothing compared to the parity overhead ...
>>>     >
>>>     > You can now easily detect data corruption using the local checksums
>>>     > and avoid reading any parity information and (C)RS decoding if no
>>>     > corruption is detected. Moreover, CRC32C computation is distributed over
>>>     > several (in this case 4) machines while (C)RS decoding would run on a
>>>     > single machine where you assemble a block ... and CRC32C is faster than
>>>     > (C)RS decoding (with SSE4.2) ...
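
The per-block checksum scheme quoted above can be sketched as follows. This is
a minimal illustration, assuming a plain table-driven software CRC32C
(Castagnoli polynomial) rather than the SSE4.2 instruction, and assuming the
checksums are kept in a separate list. The names crc32c, block_checksums and
find_corrupt_blocks are hypothetical and are not taken from Ceph or from the
implementation described in the quote.

```python
# Sketch only: software CRC32C (Castagnoli), one 4-byte checksum per 4 KiB block.
# Hypothetical helper names; a production build would use the SSE4.2 instruction.

BLOCK_SIZE = 4096
_POLY = 0x82F63B78  # bit-reflected CRC32C (Castagnoli) polynomial


def _make_table():
    """Precompute the 256-entry lookup table for byte-wise CRC32C."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ _POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    """Return the CRC32C of data as an unsigned 32-bit integer."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def block_checksums(chunk: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Compute one checksum per block_size-byte block of a chunk."""
    return [crc32c(chunk[off:off + block_size])
            for off in range(0, len(chunk), block_size)]


def find_corrupt_blocks(chunk: bytes, checksums: list,
                        block_size: int = BLOCK_SIZE) -> list:
    """Return indices of blocks whose checksum no longer matches."""
    current = block_checksums(chunk, block_size)
    return [i for i, (got, want) in enumerate(zip(current, checksums))
            if got != want]
```

A reader would call find_corrupt_blocks on each data chunk first and only fall
back to parity reads and decoding when the returned list is not empty. With
4 bytes stored per 4096-byte block the overhead is 4/4096, roughly 0.1%.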
>>>
>>>     What does (C)RS mean ? (C)Reed-Solomon ?
>>>
>>>     > In our case we write this checksum information separately from the
>>>     > original data ... while in a block-based storage like CEPH it would
>>>     > probably be inlined in the data chunk.
>>>     > If an OSD detects that it runs on BTRFS or ZFS, one could automatically
>>>     > disable the CRC32C code.
>>>
>>>     Nice. I did not know that was built-in :-)
>>>
>>>     https://github.com/dachary/ceph/blob/wip-4929/doc/dev/osd_internals/erasure-code.rst#scrubbing
>>>
>>>     > (wouldn't CRC32C also be useful for normal CEPH block replication?)
>>>
>>>     I don't know the details of scrubbing but it seems CRC is already used
>>>     by deep scrubbing
>>>
>>>     https://github.com/ceph/ceph/blob/master/src/osd/PG.cc#L2731
>>>
>>>     Cheers
>>>
>>>     > As far as I know, with the RS codec we use, stripes can be missing
>>>     > (data = 0) in the decoding process, but you cannot inject corrupted
>>>     > stripes into it, so the block checksumming is important.
>>>     >
>>>     > Cheers Andreas.
>>>
>>>     --
>>>     Loïc Dachary, Artisan Logiciel Libre
>>>     All that is necessary for the triumph of evil is that good people do
>>>     nothing.
>>>
>>>
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>> All that is necessary for the triumph of evil is that good people do nothing.
>>
>>

-- 
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.
