Maybe using http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html
is enough. fsbench looks overkill indeed. /me exploring options ;-)
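For what it's worth, hooking the gperftools CPU profiler into a standalone benchmark takes only a few lines. Below is a minimal sketch, assuming the program is linked with -lprofiler; the run_benchmark loop is just a placeholder for the real erasure-coding work, not code from the actual tool:

```cpp
// Minimal sketch: profile the benchmark body with the gperftools CPU profiler.
// Build:   g++ -O2 bench.cc -o bench -lprofiler
// Inspect: pprof ./bench bench.prof
#include <gperftools/profiler.h>
#include <cstdio>
#include <vector>

// Placeholder for the real erasure-coding benchmark body.
static char run_benchmark() {
  std::vector<char> a(50000000, 'a'), b(50000000, 'b');
  for (int round = 0; round < 10; round++)
    for (size_t i = 0; i < a.size(); i++)
      a[i] ^= b[i];
  return a[0];
}

int main() {
  ProfilerStart("bench.prof");    // samples are written to bench.prof
  char result = run_benchmark();
  ProfilerStop();                 // flush and close the profile
  printf("done (%d)\n", result);  // keep the result live so the loop is not optimized out
  return 0;
}
```

Alternatively, a binary linked against libprofiler can be profiled without code changes by setting CPUPROFILE=bench.prof in its environment.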
On 09/12/2013 17:45, Loic Dachary wrote:
> Hi,
>
> Mark Nelson suggested we use perf (linux-tools) for benchmarking. It looks
> like something that would help indeed: the benchmark program would only
> concern itself with doing some work according to the options and let
> performance be collected from the outside, using tools that are familiar to
> people doing benchmarking.
>
> What do you think?
>
> Cheers
>
> $ perf stat -e
>  Error: switch `e' requires a value
>
>  usage: perf stat [<options>] [<command>]
>
>     -e, --event <event>       event selector. use 'perf list' to list available events
>         --filter <filter>     event filter
>     -i, --no-inherit          child tasks do not inherit counters
>     -p, --pid <pid>           stat events on existing process id
>     -t, --tid <tid>           stat events on existing thread id
>     -a, --all-cpus            system-wide collection from all CPUs
>     -g, --group               put the counters into a counter group
>     -c, --scale               scale/normalize counters
>     -v, --verbose             be more verbose (show counter open errors, etc)
>     -r, --repeat <n>          repeat command and print average + stddev (max: 100, forever: 0)
>     -n, --null                null run - dont start any counters
>     -d, --detailed            detailed run - start a lot of events
>     -S, --sync                call sync() before starting a run
>     -B, --big-num             print large numbers with thousands' separators
>     -C, --cpu <cpu>           list of cpus to monitor in system-wide
>     -A, --no-aggr             disable CPU count aggregation
>     -x, --field-separator <separator>  print counts with custom separator
>     -G, --cgroup <name>       monitor event in cgroup name only
>     -o, --output <file>       output file name
>         --append              append to the output file
>         --log-fd <n>          log output to fd, instead of stderr
>         --pre <command>       command to run prior to the measured command
>         --post <command>      command to run after to the measured command
>     -I, --interval-print <n>  print counts at regular interval in ms (>= 100)
>         --per-socket          aggregate counts per processor socket
>         --per-core            aggregate counts per physical processor core
>
>
> On 12/11/2013 19:06, Loic Dachary wrote:
>> Hi Andreas,
>>
>> On 12/11/2013 02:11, Andreas Joachim Peters wrote:
>>> Hi Loic,
>>>
>>> I am finally doing the benchmark tool and I found a bunch of wrong
>>> parameter checks which can make the whole thing SEGV.
>>>
>>> All the RAID-6 codes have restrictions on the parameters but they are not
>>> correctly enforced for Liberation & Blaum-Roth codes in the CEPH wrapper
>>> class ... see text from PDF:
>>>
>>> "Minimal Density RAID-6 codes are MDS codes based on binary matrices which
>>> satisfy a lower-bound on the number of non-zero entries. Unlike Cauchy
>>> coding, the bit-matrix elements do not correspond to elements in GF(2^w).
>>> Instead, the bit-matrix itself has the proper MDS property. Minimal Density
>>> RAID-6 codes perform faster than Reed-Solomon and Cauchy Reed-Solomon codes
>>> for the same parameters. Liberation coding, Liber8tion coding, and
>>> Blaum-Roth coding are three examples of this kind of coding that are
>>> supported in jerasure.
>>>
>>> With each of these codes, m must be equal to two and k must be less than or
>>> equal to w. The value of w has restrictions based on the code:
>>>
>>> • With Liberation coding, w must be a prime number [Pla08b].
>>> • With Blaum-Roth coding, w + 1 must be a prime number [BR99].
>>> • With Liber8tion coding, w must equal 8 [Pla08a].
>>>
>>> ..."
>>>
>>> Will you add these fixes?
>>
>> Nice catch.
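As a side note, the restrictions quoted above could be enforced with checks roughly like the following. This is a sketch only; is_prime and check_minimal_density_params are made-up names for illustration, not code from the actual Ceph wrapper class:

```cpp
// Sketch only: how the Liberation / Blaum-Roth / Liber8tion restrictions
// quoted above could be checked before calling into jerasure.
#include <stdexcept>
#include <string>

// Trial-division primality test; sufficient for the small w values used here.
static bool is_prime(int n) {
  if (n < 2) return false;
  for (int d = 2; d * d <= n; d++)
    if (n % d == 0) return false;
  return true;
}

static void check_minimal_density_params(const std::string &technique,
                                         int k, int m, int w) {
  if (m != 2)
    throw std::invalid_argument("m must be equal to two");
  if (k > w)
    throw std::invalid_argument("k must be less than or equal to w");
  if (technique == "liberation" && !is_prime(w))
    throw std::invalid_argument("liberation: w must be a prime number");
  if (technique == "blaum_roth" && !is_prime(w + 1))
    throw std::invalid_argument("blaum_roth: w + 1 must be a prime number");
  if (technique == "liber8tion" && w != 8)
    throw std::invalid_argument("liber8tion: w must equal 8");
}
```

With checks of this kind in place, an unsupported k/m/w combination would be rejected up front instead of crashing inside jerasure.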
>> I created and assigned to myself:
>> http://tracker.ceph.com/issues/6754
>>>
>>> For the benchmark suite, it currently runs 308 different configurations for
>>> the 2 algorithms which make sense from the performance point of view and
>>> provides this output:
>>>
>>> # -----------------------------------------------------------------
>>> # Erasure Coding Benchmark - (C) CERN 2013 - [email protected]
>>> # Ram-Size=12614856704 Allocation-Size=100000000
>>> # -----------------------------------------------------------------
>>> # [ -BENCH- ] [ ] technique=memcpy speed=5.408 [GB/s] latency=9.245 ms
>>> # [ -BENCH- ] [ ] technique=d=a^b^c-xor speed=4.377 [GB/s] latency=17.136 ms
>>> # [ -BENCH- ] [001/304] technique=cauchy_good:k=05:m=2:w=8:lp=0:packet=00064:size=50000000 speed=1.308 [GB/s] latency=038 [ms] size-overhead=40 [%]
>>> ..
>>> ..
>>> # [ -BENCH- ] [304/304] technique=liberation:k=24:m=2:w=29:lp=2:packet=65536:size=50000000 speed=0.083 [GB/s] latency=604 [ms] size-overhead=16 [%]
>>> # -----------------------------------------------------------------
>>> # Erasure Code Performance Summary::
>>> # -----------------------------------------------------------------
>>> # RAM:                   12.61 GB
>>> # Allocation-Size         0.10 GB
>>> # -----------------------------------------------------------------
>>> # Byte Initialization:   29.35 MB/s
>>> # Memcpy:                 5.41 GB/s
>>> # Triple-XOR:             4.38 GB/s
>>> # -----------------------------------------------------------------
>>> # Fastest RAID6           2.72 GB/s  liber8tion:k=06:m=2:w=8:lp=0:packet=04096:size=50000000
>>> # Fastest Triple Failure  0.96 GB/s  cauchy_good:k=06:m=3:w=8:lp=0:packet=04096:size=50000000
>>> # Fastest Quadr. Failure  0.66 GB/s  cauchy_good:k=06:m=4:w=8:lp=0:packet=04096:size=50000000
>>> # -----------------------------------------------------------------
>>> # .................................................................
>>> # Top 1 RAID6   2.72 GB/s  liber8tion:k=06:m=2:w=8:lp=0:packet=04096:size=50000000
>>> # Top 2 RAID6   2.72 GB/s  liber8tion:k=06:m=2:w=8:lp=0:packet=16384:size=50000000
>>> # Top 3 RAID6   2.64 GB/s  liber8tion:k=06:m=2:w=8:lp=0:packet=65536:size=50000000
>>> # Top 4 RAID6   2.60 GB/s  liberation:k=07:m=2:w=7:lp=0:packet=16384:size=50000000
>>> # Top 5 RAID6   2.59 GB/s  liberation:k=05:m=2:w=7:lp=0:packet=04096:size=50000000
>>> # .................................................................
>>> # Top 1 Triple  0.96 GB/s  cauchy_good:k=06:m=3:w=8:lp=0:packet=04096:size=50000000
>>> # Top 2 Triple  0.94 GB/s  cauchy_good:k=06:m=3:w=8:lp=0:packet=16384:size=50000000
>>> # Top 3 Triple  0.93 GB/s  cauchy_good:k=06:m=3:w=8:lp=0:packet=65536:size=50000000
>>> # Top 4 Triple  0.89 GB/s  cauchy_good:k=07:m=3:w=8:lp=0:packet=04096:size=50000000
>>> # Top 5 Triple  0.87 GB/s  cauchy_good:k=05:m=3:w=8:lp=0:packet=04096:size=50000000
>>> # .................................................................
>>> # Top 1 Quadr.  0.66 GB/s  cauchy_good:k=06:m=4:w=8:lp=0:packet=04096:size=50000000
>>> # Top 2 Quadr.  0.65 GB/s  cauchy_good:k=07:m=4:w=8:lp=0:packet=04096:size=50000000
>>> # Top 3 Quadr.  0.64 GB/s  cauchy_good:k=06:m=4:w=8:lp=0:packet=16384:size=50000000
>>> # Top 4 Quadr.  0.64 GB/s  cauchy_good:k=05:m=4:w=8:lp=0:packet=04096:size=50000000
>>> # Top 5 Quadr.  0.64 GB/s  cauchy_good:k=06:m=4:w=8:lp=0:packet=65536:size=50000000
>>> # .................................................................
>>>
>>> It takes around 30 seconds on my box.
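For context, speed/latency figures like the ones in the report above are typically obtained from a simple timed loop. The following is a rough sketch of such a measurement using std::chrono, with encode_once as a placeholder for the real jerasure encode call; it is not code from the actual CERN tool:

```cpp
// Rough sketch: compute a "speed=... [GB/s] latency=... [ms]" line for one
// configuration by timing an encode-like operation over a fixed buffer.
#include <chrono>
#include <cstdio>
#include <vector>

// Placeholder work; the real tool would call the erasure code plugin here.
static void encode_once(std::vector<char> &buf) {
  for (size_t i = 1; i < buf.size(); i++)
    buf[i] ^= buf[i - 1];
}

int main() {
  const size_t size = 50000000;  // matches size=50000000 in the report above
  const int iterations = 10;
  std::vector<char> buf(size, 'x');

  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; i++)
    encode_once(buf);
  auto end = std::chrono::steady_clock::now();

  double seconds = std::chrono::duration<double>(end - start).count();
  double gigabytes = double(size) * iterations / 1e9;
  printf("speed=%.3f [GB/s] latency=%.0f [ms]\n",
         gigabytes / seconds, seconds / iterations * 1e3);
  return 0;
}
```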
>>
>> That looks great :-) If I understand correctly, it means
>> https://github.com/ceph/ceph/pull/740 will no longer have benchmarks as they
>> are moved to a separate program. Correct?
>>
>>> I will add a measurement of how the XOR and the 3 top algorithms scale with
>>> the number of cores and make the object-size configurable from the command
>>> line. Anything else?
>>
>> It would be convenient to run this from a "workunit" (i.e. a script in
>> ceph/qa/workunits/) so that it can later be run by teuthology integration
>> tests. That could be used to show regression.
>>
>>> Shall I add the possibility to test a single user-specified configuration via
>>> command line arguments?
>>
>> I would need to play with it to comment usefully.
>>
>> Cheers
>>
> -- Loïc Dachary, Artisan Logiciel Libre
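Regarding the single user-specified configuration question above, one simple option would be to accept on the command line the same technique:k=..:m=..:w=..:lp=..:packet=..:size=.. string that appears in the report. The parsing sketch below is purely illustrative (BenchConfig and parse_config are invented names, not part of the actual tool):

```cpp
// Sketch: parse a "technique:k=..:m=..:w=..:lp=..:packet=..:size=.." argument,
// e.g. "cauchy_good:k=06:m=3:w=8:lp=0:packet=04096:size=50000000".
#include <cstdio>
#include <map>
#include <sstream>
#include <string>

struct BenchConfig {
  std::string technique;
  std::map<std::string, long> params;  // k, m, w, lp, packet, size
};

static BenchConfig parse_config(const std::string &spec) {
  BenchConfig cfg;
  std::stringstream ss(spec);
  std::string field;
  std::getline(ss, cfg.technique, ':');   // first field is the technique name
  while (std::getline(ss, field, ':')) {  // remaining fields are key=value
    size_t eq = field.find('=');
    if (eq != std::string::npos)
      cfg.params[field.substr(0, eq)] = std::stol(field.substr(eq + 1));
  }
  return cfg;
}

int main(int argc, char **argv) {
  if (argc < 2) {
    fprintf(stderr, "usage: %s technique:k=..:m=..:w=..:lp=..:packet=..:size=..\n", argv[0]);
    return 1;
  }
  BenchConfig cfg = parse_config(argv[1]);
  printf("technique=%s k=%ld m=%ld w=%ld\n", cfg.technique.c_str(),
         cfg.params["k"], cfg.params["m"], cfg.params["w"]);
  return 0;
}
```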