Hi Andreas,

Since it looks like we're going to use jerasure-1.2, we will be able to try 
(C)RS using

https://github.com/tsuraan/Jerasure/blob/master/src/cauchy.c
https://github.com/tsuraan/Jerasure/blob/master/src/cauchy.h

Do you know of a better or faster implementation? Is there a tradeoff between
(C)RS and RS?
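For reference, a Cauchy coding matrix for k data and m coding chunks has entries 1/(x_i + y_j) over GF(2^w), where X and Y are disjoint sets of field elements. Every square submatrix of a Cauchy matrix is invertible, so stacking it under an identity matrix gives a code where any k of the k + m chunks can rebuild the data. The sketch below builds one in GF(2^8). The field polynomial 0x11D, the choice X = {0..m-1}, Y = {m..m+k-1}, and the helper names are assumptions for illustration, not a description of what cauchy.c does internally.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply in GF(2^8) modulo x^8+x^4+x^3+x^2+1 (0x11D, assumed polynomial),
 * using shift-and-add. */
static uint8_t gf256_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;                       /* addition in GF(2^8) is XOR */
        b >>= 1;
        /* multiply a by x, reducing modulo the field polynomial */
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1D : 0));
    }
    return p;
}

/* Inverse of a non-zero element: a^254 == a^-1 because the
 * multiplicative group has order 255. */
static uint8_t gf256_inv(uint8_t a)
{
    assert(a != 0);
    uint8_t r = 1;
    for (int i = 0; i < 254; i++)
        r = gf256_mul(r, a);
    return r;
}

/* Fill an m x k Cauchy matrix (row-major): entry (i, j) = 1 / (x_i + y_j)
 * with x_i = i and y_j = m + j. The sets are disjoint, so i ^ (m + j) != 0.
 * Requires m + k <= 256. */
static void cauchy_matrix(int m, int k, uint8_t *out)
{
    assert(m + k <= 256);
    for (int i = 0; i < m; i++)
        for (int j = 0; j < k; j++)
            out[i * k + j] = gf256_inv((uint8_t)(i ^ (m + j)));
}
```

With k = 4 and m = 2 (the layout discussed further down), `cauchy_matrix(2, 4, mat)` fills a 2 x 4 matrix whose entries, multiplied by the corresponding x_i + y_j, all give 1.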

Cheers

On 06/07/2013 15:43, Andreas-Joachim Peters wrote:
> Hi Loic,
> (C)RS stands for Cauchy Reed-Solomon codes, which are based purely on parity
> (XOR) operations, while standard Reed-Solomon codes need Galois-field
> multiplications and are slower.
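The "pure parity" point can be made concrete: multiplying by a fixed constant e in GF(2^8) is linear over GF(2), so it can be written as an 8x8 bit-matrix, and applying that matrix needs only XORs. Cauchy RS applies this idea to whole packets of data rather than single bytes. The sketch below only checks the per-byte identity; the helper names and the 0x11D field polynomial are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Reference GF(2^8) multiply (shift-and-add, polynomial 0x11D assumed). */
static uint8_t gf256_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        b >>= 1;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1D : 0));
    }
    return p;
}

/* Bit-matrix of "multiply by e": column c is the image of the basis
 * element x^c, i.e. e * (1 << c). */
static void mul_bitmatrix(uint8_t e, uint8_t cols[8])
{
    for (int c = 0; c < 8; c++)
        cols[c] = gf256_mul(e, (uint8_t)(1u << c));
}

/* Multiply v by e using XORs only: by linearity, e*v is the XOR of the
 * columns selected by the set bits of v. */
static uint8_t mul_by_xor(const uint8_t cols[8], uint8_t v)
{
    uint8_t r = 0;
    for (int c = 0; c < 8; c++)
        if (v & (1u << c))
            r ^= cols[c];
    return r;
}
```

In a packet-based encoder, each 1 bit of the bit-matrix selects a whole data packet to XOR into a coding packet, so the number of 1s in the matrix roughly determines the encoding cost.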
> 
> Considering the checksumming ... for comparison, the CRC32 code from libz
> runs on an 8-core Xeon at ~730 MB/s for small block sizes, while the SSE4.2
> CRC32C checksum runs at ~2 GByte/s.
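One detail behind these numbers: zlib's CRC32 and CRC32C use different polynomials, and the SSE4.2 `crc32` instruction implements only the Castagnoli (CRC32C) polynomial, so the two checksums are not interchangeable. A minimal bit-at-a-time reference (far slower than either implementation measured above; `crc32_reflected` is a hypothetical name) makes the difference visible through the standard check values for the string "123456789".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC-32 with a configurable (reflected) polynomial:
 *   0xEDB88320 -> CRC-32 as computed by zlib's crc32()
 *   0x82F63B78 -> CRC-32C (Castagnoli), the SSE4.2 crc32 polynomial */
static uint32_t crc32_reflected(uint32_t poly, const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;            /* initial value */
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1u) ? poly : 0u);
    }
    return crc ^ 0xFFFFFFFFu;              /* final XOR */
}
```

`crc32_reflected(0xEDB88320u, "123456789", 9)` yields 0xCBF43926, while `crc32_reflected(0x82F63B78u, "123456789", 9)` yields 0xE3069283.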
> 
> Cheers Andreas.
> 
> 
> 
> 
> On Fri, Jul 5, 2013 at 11:23 PM, Loic Dachary <[email protected]> wrote:
> 
>     Hi Andreas,
> 
>     On 04/07/2013 23:01, Andreas Joachim Peters wrote:
>     > Hi Loic,
>     > thanks for the responses!
>     >
>     > Maybe this is useful for your erasure code discussion:
>     >
>     > As an example, in our RS implementation we split a data block of e.g. 4M
>     > into 4 data chunks of 1M. Then we create 2 parity chunks.
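With k = 4 data chunks and m = 2 parity chunks, an RS code can lose any 2 of the 6 chunks and still rebuild the block, at 1.5x raw storage. A small sketch of the size arithmetic; the function names and the rounding of each chunk up to a multiple of 4096 bytes are assumptions, not taken from Andreas' implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: size of each of the k data chunks when a block of
 * obj_size bytes is split evenly, rounded up to a multiple of `align`
 * (the tail of the last chunk would be zero-padded). */
static size_t chunk_size(size_t obj_size, size_t k, size_t align)
{
    assert(k > 0 && align > 0);
    size_t per_chunk = (obj_size + k - 1) / k;        /* ceil(obj_size / k) */
    return (per_chunk + align - 1) / align * align;   /* round up to align */
}

/* Total bytes stored for k data chunks plus m parity chunks. */
static size_t stored_bytes(size_t obj_size, size_t k, size_t m, size_t align)
{
    return (k + m) * chunk_size(obj_size, k, align);
}
```

For a 4 MiB block this gives 1 MiB chunks and 6 MiB stored in total.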
>     >
>     > Data and parity chunks are split into 4k blocks, and each 4k block gets a
>     > CRC32C block checksum (SSE4.2 CPU extension => MIT library or BTRFS).
>     > This creates 0.1% volume overhead (4 bytes per 4096 bytes), which is
>     > nothing compared to the parity overhead ...
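A minimal sketch of the per-block checksumming described above, assuming 4096-byte blocks and a portable bitwise CRC32C in place of the SSE4.2 instruction or the MIT/BTRFS code; `checksum_chunk` and `csum_count` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CSUM_BLOCK 4096   /* checksummed block size from the description */

/* Bitwise software CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
 * Much slower than the SSE4.2 instruction, but portable. */
static uint32_t crc32c(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ ((crc & 1u) ? 0x82F63B78u : 0u);
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Number of checksum blocks in a chunk (the last one may be short). */
static size_t csum_count(size_t chunk_len)
{
    return (chunk_len + CSUM_BLOCK - 1) / CSUM_BLOCK;
}

/* One CRC-32C per 4 KiB block; `out` needs csum_count(len) entries. */
static void checksum_chunk(const uint8_t *chunk, size_t len, uint32_t *out)
{
    for (size_t i = 0; i < csum_count(len); i++) {
        size_t off = i * CSUM_BLOCK;
        size_t n = len - off < CSUM_BLOCK ? len - off : CSUM_BLOCK;
        out[i] = crc32c(chunk + off, n);
    }
}
```

A 1 MiB chunk gets 256 checksums, i.e. 1 KiB, which is where the ~0.1% figure comes from (4/4096 ≈ 0.098%).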
>     >
>     > You can now easily detect data corruption using the local checksums and
>     > avoid reading any parity information and running the (C)RS decoder if
>     > no corruption is detected. Moreover, the CRC32C computation is
>     > distributed over several (in this case 4) machines, while (C)RS decoding
>     > would run on the single machine where you assemble a block ... and
>     > CRC32C (with SSE4.2) is faster than (C)RS decoding ...
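The read path this enables can be sketched as follows: each machine verifies its own data chunk against the stored per-block CRC32C values, and only when some block fails does the client need to fetch parity and run the decoder, treating the failing chunk as missing. `verify_chunk` is a hypothetical helper, and the bitwise CRC32C stands in for a hardware-accelerated one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CSUM_BLOCK 4096

/* Bitwise software CRC-32C (reflected polynomial 0x82F63B78). */
static uint32_t crc32c(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ ((crc & 1u) ? 0x82F63B78u : 0u);
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Check a chunk against its stored per-block CRC-32C values.
 * Returns the number of corrupted blocks and records up to max_bad of
 * their indices in `bad`. A return of 0 means the chunk can be used
 * directly, with no parity read and no decoding. */
static size_t verify_chunk(const uint8_t *chunk, size_t len,
                           const uint32_t *stored, size_t *bad, size_t max_bad)
{
    size_t nbad = 0;
    for (size_t off = 0, i = 0; off < len; off += CSUM_BLOCK, i++) {
        size_t n = len - off < CSUM_BLOCK ? len - off : CSUM_BLOCK;
        if (crc32c(chunk + off, n) != stored[i]) {
            if (nbad < max_bad)
                bad[nbad] = i;
            nbad++;
        }
    }
    return nbad;
}
```

If `verify_chunk` returns 0 for all 4 data chunks, they can simply be concatenated; otherwise the affected chunk is handed to the decoder as an erasure.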
> 
>     What does (C)RS mean? (C)Reed-Solomon?
> 
>     > In our case we write this checksum information separately from the
>     > original data, while in block-based storage like CEPH it would probably
>     > be inlined in the data chunk.
>     > If an OSD detects that it is running on BTRFS or ZFS, which already
>     > checksum data blocks, one could automatically disable the CRC32C code.
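If the checksums were inlined in the data chunk, one possible layout (purely hypothetical, not an existing Ceph on-disk format) stores each 4-byte CRC32C right after the 4096 bytes it protects. The offset arithmetic then looks like this, assuming chunk lengths are a multiple of 4096; the function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define CSUM_BLOCK 4096u  /* data bytes per checksummed block */
#define CSUM_SIZE  4u     /* one CRC-32C per block */

/* Hypothetical inline layout: [4096 data][4 crc][4096 data][4 crc]...
 * Map a logical byte offset in the chunk to its physical offset. */
static uint64_t phys_offset(uint64_t logical)
{
    return logical + (logical / CSUM_BLOCK) * CSUM_SIZE;
}

/* Physical offset of the CRC protecting the block that contains `logical`. */
static uint64_t csum_offset(uint64_t logical)
{
    uint64_t block = logical / CSUM_BLOCK;
    return block * (CSUM_BLOCK + CSUM_SIZE) + CSUM_BLOCK;
}

/* Physical size of a chunk; assumes len is a multiple of CSUM_BLOCK. */
static uint64_t phys_size(uint64_t len)
{
    assert(len % CSUM_BLOCK == 0);
    return len + (len / CSUM_BLOCK) * CSUM_SIZE;
}
```

Storing checksums separately, as Andreas does, keeps data offsets unchanged, at the possible cost of a separate read for the checksums.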
> 
>     Nice. I did not know that was built-in :-)
>     https://github.com/dachary/ceph/blob/wip-4929/doc/dev/osd_internals/erasure-code.rst#scrubbing
> 
>     > (wouldn't CRC32C also be useful for normal CEPH block replication?)
> 
>     I don't know the details of scrubbing, but it seems CRC is already used
>     by deep scrubbing:
> 
>     https://github.com/ceph/ceph/blob/master/src/osd/PG.cc#L2731
> 
>     Cheers
> 
>     > As far as I know, with the RS codec we use you can have missing stripes
>     > (data = 0) in the decoding process, but you cannot inject corrupted
>     > stripes into it, so the block checksumming is important.
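The difference between a missing and a corrupted stripe can be shown with a toy code. The sketch below uses a single XOR parity stripe as a stand-in for RS (it is not Reed-Solomon, but it shares the property in question): a stripe known to be missing is rebuilt exactly, while a silently corrupted stripe is trusted by the decoder and poisons the rebuilt output. That is why corrupted stripes have to be caught by checksum first and then passed to the decoder as missing. All names and sizes here are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define K   3   /* data stripes (toy example) */
#define LEN 8   /* bytes per stripe */

/* Single XOR parity stripe: a stand-in for RS with m = 1, NOT Reed-Solomon. */
static void encode_parity(uint8_t data[K][LEN], uint8_t parity[LEN])
{
    memset(parity, 0, LEN);
    for (int s = 0; s < K; s++)
        for (int i = 0; i < LEN; i++)
            parity[i] ^= data[s][i];
}

/* Rebuild the stripe at index `missing` from the parity and the other
 * stripes. The decoder trusts every surviving stripe: it cannot notice
 * that one of them has been corrupted. */
static void rebuild(uint8_t data[K][LEN], const uint8_t parity[LEN], int missing)
{
    memcpy(data[missing], parity, LEN);
    for (int s = 0; s < K; s++)
        if (s != missing)
            for (int i = 0; i < LEN; i++)
                data[missing][i] ^= data[s][i];
}
```

Zeroing a stripe and calling `rebuild` restores it exactly; flipping a byte in another stripe before calling `rebuild` produces a wrong result with no error reported.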
>     >
>     > Cheers Andreas.
> 
>     --
>     Loïc Dachary, Artisan Logiciel Libre
>     All that is necessary for the triumph of evil is that good people do nothing.
> 
> 

-- 
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.
