Re: CEPH Erasure Encoding + OSD Scalability

Loic Dachary Sat, 21 Sep 2013 08:12:07 -0700

Hi Andreas,

It's probably too soon to be smart about reducing the number of copies, but 
you're right: this copy is not necessary. The following pull request gets rid 
of it:


https://github.com/ceph/ceph/pull/615

Cheers

On 20/09/2013 18:49, Loic Dachary wrote:
> Hi,
> 
> This is a first attempt at avoiding the unnecessary copy:
> 
> https://github.com/dachary/ceph/blob/03445a5926cd073c11cd8693fb110729e40f35fa/src/osd/ErasureCodePluginJerasure/ErasureCodeJerasure.cc#L66
> 
> I'm not sure how it could be made more readable / terse with bufferlist 
> iterators. Any kind of hint would be welcome :-)
> 
> Cheers
> 
> On 20/09/2013 17:36, Sage Weil wrote:
>> On Fri, 20 Sep 2013, Loic Dachary wrote:
>>> Hi Andreas,
>>>
>>> Great work on these benchmarks! It's definitely an incentive to improve as 
>>> much as possible. Could you push / send the scripts and sequence of 
>>> operations you've used? I'll reproduce this locally while getting rid of 
>>> the extra copy. It would be useful to capture that into a script that can 
>>> be conveniently run from the teuthology integration tests to check against 
>>> performance regressions.
>>>
>>> Regarding the 3P implementation, in my opinion it would be very valuable 
>>> for some people who prefer low CPU consumption. And I'm eager to see more 
>>> than one plugin in the erasure code plugin directory ;-)
>>
>> One way to approach this might be to make a bufferlist 'multi-iterator' 
>> that you give bufferlist::iterators and that gives you back a pair of 
>> pointers and a length for each contiguous segment.  This would capture the 
>> annoying iterator details and let the user focus on processing chunks that 
>> are as large as possible.
>>
>> sage
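
A minimal sketch of the multi-iterator idea described above, not Ceph code. It assumes a stand-in SegmentList (a vector of strings) instead of the real bufferlist, and the names Cursor, Run, MultiIterator and xor_all are hypothetical. Each call to next() returns one pointer per input plus the longest length that is contiguous in all inputs at once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for a bufferlist: an ordered list of separately allocated segments.
using SegmentList = std::vector<std::string>;

// Read position inside one SegmentList.
struct Cursor {
  const SegmentList* list;
  std::size_t seg = 0;  // index of the current segment
  std::size_t off = 0;  // byte offset inside that segment

  explicit Cursor(const SegmentList* l) : list(l) { assert(l != nullptr); }

  // Skip segments that are empty or fully consumed.
  void normalize() {
    while (seg < list->size() && off == (*list)[seg].size()) {
      ++seg;
      off = 0;
    }
  }
  bool done() const { return seg == list->size(); }
  const char* ptr() const { return (*list)[seg].data() + off; }
  std::size_t left() const { return (*list)[seg].size() - off; }
};

// One contiguous run: ptrs[i] points into input i, each valid for len bytes.
struct Run {
  std::vector<const char*> ptrs;
  std::size_t len = 0;
};

class MultiIterator {
 public:
  explicit MultiIterator(const std::vector<const SegmentList*>& lists) {
    for (const SegmentList* l : lists) cursors_.emplace_back(l);
  }

  // Fills `run` with the next run; returns false once any input is exhausted.
  bool next(Run& run) {
    run.ptrs.clear();
    run.len = 0;
    if (cursors_.empty()) return false;
    std::size_t len = static_cast<std::size_t>(-1);
    for (Cursor& c : cursors_) {
      c.normalize();
      if (c.done()) return false;
      len = std::min(len, c.left());
    }
    for (Cursor& c : cursors_) {
      run.ptrs.push_back(c.ptr());
      c.off += len;
    }
    run.len = len;
    return true;
  }

 private:
  std::vector<Cursor> cursors_;
};

// Example use: XOR two segmented buffers without flattening either one.
std::string xor_all(const SegmentList& a, const SegmentList& b) {
  std::vector<const SegmentList*> lists = {&a, &b};
  MultiIterator it(lists);
  std::string out;
  Run run;
  while (it.next(run))
    for (std::size_t i = 0; i < run.len; ++i)
      out.push_back(static_cast<char>(run.ptrs[0][i] ^ run.ptrs[1][i]));
  return out;
}
```

The caller only ever sees flat (pointer, length) runs, so the inner loop stays a plain array loop while the segment bookkeeping lives in one place.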
>>
>>
>>>
>>> Cheers
>>>
>>> On 20/09/2013 13:35, Andreas Joachim Peters wrote:
>>>> Hi Loic, 
>>>>
>>>> I now have some benchmarks on a Xeon 2.27 GHz 4-core with gcc 4.4 (-O2) 
>>>> for ENCODING based on the CEPH Jerasure port.
>>>> I measured objects from 128k to 512 MB with random contents (if you 
>>>> encode 1 GB objects you see slowdowns due to caching inefficiencies ...); 
>>>> otherwise results are stable for the given object sizes.
>>>>
>>>> I quote only the benchmark for ErasureCodeJerasureReedSolomonRAID6 (3,2); 
>>>> the others are significantly slower (2-3x). I compare it with my 
>>>> 3P(3,2,1) implementation, which provides the same redundancy level as 
>>>> RS-Raid6[3,2] (double disk failure) but uses more space (100% vs 66% 
>>>> overhead).
>>>>
>>>> The effect of out.c_str() is significant (it contributes a factor 2 
>>>> slowdown for the best jerasure algorithm for [3,2]).
>>>>
>>>> Averaged results for object size 4 MB:
>>>>
>>>> 1) Erasure CRS [3,2] - 2.6 ms buffer preparation (out.c_str()) - 2.4 ms 
>>>> encoding => ~780 MB/s
>>>> 2) 3P [3,2,1] - 0.005 ms buffer preparation (3P adjusts the padding in the 
>>>> algorithm) - 0.87 ms encoding => ~4.4 GB/s
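
To make the arithmetic behind these two results explicit, here is a small sketch. It assumes that "4 MB" means 4 MiB and that throughput is object size divided by preparation plus encoding time; the function names are hypothetical.

```cpp
#include <cassert>

// Sustained rate in MiB/s for `bytes` processed in prep_ms + encode_ms.
double throughput_mib_s(double bytes, double prep_ms, double encode_ms) {
  assert(prep_ms + encode_ms > 0.0);
  const double mib = bytes / (1024.0 * 1024.0);
  const double seconds = (prep_ms + encode_ms) / 1000.0;
  return mib / seconds;
}

// How much slower the whole call is than the encoding step alone.
double slowdown_factor(double prep_ms, double encode_ms) {
  assert(encode_ms > 0.0);
  return (prep_ms + encode_ms) / encode_ms;
}
```

With 2.6 ms + 2.4 ms this gives about 800 MiB/s (roughly 0.78 GiB/s) and a slowdown factor of about 2.08; with 0.005 ms + 0.87 ms it gives about 4571 MiB/s (roughly 4.46 GiB/s). Under the stated assumptions that is in line with the quoted ~780 MB/s and ~4.4 GB/s.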
>>>>
>>>> I think it pays off to avoid the copy in the encoding if it does not 
>>>> matter for the buffer handling upstream and pad only the last chunk.
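
A rough sketch of "pad only the last chunk", under simplifying assumptions: the input is already one contiguous buffer (a std::string stands in for it, which a real bufferlist need not be), and ChunkedObject is a hypothetical helper, not part of Ceph or jerasure. Chunks that lie fully inside the object are used in place; only the trailing bytes are copied and zero-padded.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits a contiguous object into k equally sized chunks for encoding.
// The object must outlive the ChunkedObject, which keeps a reference to it.
class ChunkedObject {
 public:
  ChunkedObject(const std::string& object, std::size_t k, std::size_t alignment)
      : object_(object), k_(k) {
    assert(k > 0 && alignment > 0);
    const std::size_t per_chunk = (object.size() + k - 1) / k;  // ceil(size / k)
    // Round the chunk size up to a multiple of the alignment.
    chunk_size_ = (per_chunk + alignment - 1) / alignment * alignment;
    // Number of chunks that are completely covered by the object.
    full_ = chunk_size_ == 0 ? k : std::min(k, object.size() / chunk_size_);
    // Copy only the bytes that do not fill a whole chunk; the rest stays zero.
    tail_.assign((k - full_) * chunk_size_, '\0');
    std::copy(object.data() + full_ * chunk_size_,
              object.data() + object.size(), tail_.begin());
  }

  std::size_t chunk_size() const { return chunk_size_; }
  std::size_t copied_bytes() const { return object_.size() - full_ * chunk_size_; }

  // Pointer to chunk i, valid for chunk_size() bytes.
  const char* chunk(std::size_t i) const {
    assert(i < k_);
    return i < full_ ? object_.data() + i * chunk_size_
                     : tail_.data() + (i - full_) * chunk_size_;
  }

 private:
  const std::string& object_;
  std::size_t k_;
  std::size_t chunk_size_ = 0;
  std::size_t full_ = 0;
  std::vector<char> tail_;
};
```

For a 10-byte object with k = 3 and alignment 1, the chunk size is 4, the first two chunks point into the original buffer, and only 2 bytes are copied into the padded third chunk.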
>>>>
>>>> The last thing I tested is how performance scales with the number of 
>>>> cores, running 4 tests in parallel:
>>>>
>>>> Jerasure (3,2) tops out at ~2.0 GB/s for a 4-core CPU (Xeon 2.27 GHz).
>>>> 3P(3,2,1) tops out at ~8 GB/s for a 4-core CPU (Xeon 2.27 GHz).
>>>>
>>>> I also implemented the decoding for 3P, but haven't yet tested all 
>>>> reconstruction cases. There is probably room for improvement using AVX 
>>>> support for XOR operations in both implementations.
>>>>
>>>> Before I invest more time, do you think it is useful to have this fast 3P 
>>>> algorithm for double disk failures with 100% space overhead? Because I 
>>>> believe that people will always optimize for space and would rather use 
>>>> something like (10,2) even if the performance degrades and CPU consumption 
>>>> goes up?!? Let me know, no problem in any case!
>>>>
>>>> Finally I tested some combinations for ErasureCodeJerasureReedSolomonRAID6:
>>>>
>>>> (3,2) (4,2) (6,2) (8,2) (10,2) they all run around 780-800 MB/s
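
For the space side of that trade-off, a small sketch, assuming the (k,m) notation means k data chunks plus m coding chunks; the function name is hypothetical.

```cpp
#include <cassert>

// Extra storage as a percentage of the payload: m coding chunks per k data chunks.
double space_overhead_percent(unsigned k, unsigned m) {
  assert(k > 0);
  return 100.0 * m / k;
}
```

Under that assumption the overhead is about 66.7% for (3,2), 50% for (4,2), about 33.3% for (6,2), 25% for (8,2) and 20% for (10,2), compared with the 100% quoted for 3P.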
>>>>
>>>> Cheers Andreas.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>> -- 
>>> Loïc Dachary, Artisan Logiciel Libre
>>> All that is necessary for the triumph of evil is that good people do 
>>> nothing.
>>>
>>>
> 

-- 
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.
