Re: CEPH Erasure Encoding + OSD Scalability

Mark Nelson Mon, 09 Dec 2013 09:09:43 -0800

I will mention that this is a good tool if you want really detailed profiling or CPU counter data about what's going on. Other, more generic tools (i.e. ones that just read data from /proc, such as collectl, sar, etc.) may also be options.


Mark


On 12/09/2013 10:45 AM, Loic Dachary wrote:

Hi,

Mark Nelson suggested we use perf (linux-tools) for benchmarking. It looks 
like something that would indeed help: the benchmark program would only 
concern itself with doing the work described by its options, and performance 
data would be collected from the outside, using tools that are familiar to 
people doing benchmarking.

What do you think ?

Cheers

$ perf stat -e
   Error: switch `e' requires a value

  usage: perf stat [<options>] [<command>]

     -e, --event <event>   event selector. use 'perf list' to list available events
         --filter <filter>
                           event filter
     -i, --no-inherit      child tasks do not inherit counters
     -p, --pid <pid>       stat events on existing process id
     -t, --tid <tid>       stat events on existing thread id
     -a, --all-cpus        system-wide collection from all CPUs
     -g, --group           put the counters into a counter group
     -c, --scale           scale/normalize counters
     -v, --verbose         be more verbose (show counter open errors, etc)
     -r, --repeat <n>      repeat command and print average + stddev (max: 100, forever: 0)
     -n, --null            null run - dont start any counters
     -d, --detailed        detailed run - start a lot of events
     -S, --sync            call sync() before starting a run
     -B, --big-num         print large numbers with thousands' separators
     -C, --cpu <cpu>       list of cpus to monitor in system-wide
     -A, --no-aggr         disable CPU count aggregation
     -x, --field-separator <separator>
                           print counts with custom separator
     -G, --cgroup <name>   monitor event in cgroup name only
     -o, --output <file>   output file name
         --append          append to the output file
         --log-fd <n>      log output to fd, instead of stderr
         --pre <command>   command to run prior to the measured command
         --post <command>  command to run after to the measured command
     -I, --interval-print <n>
                           print counts at regular interval in ms (>= 100)
         --per-socket      aggregate counts per processor socket
         --per-core        aggregate counts per physical processor core
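With this approach the benchmark only does the work and `perf stat` collects counters from outside. A minimal Python sketch that assembles such a command line from options listed in the help text above (`-e`, `-r`, `-x`, `-o`); the event list, the default repeat count, and the `./ec_bench` binary name are assumptions for illustration, not part of any existing tool.

```python
from typing import List, Optional, Sequence

# Generic hardware events accepted by `perf stat -e`; check what your CPU
# supports with `perf list`. This particular selection is an assumption.
DEFAULT_EVENTS = ("cycles", "instructions", "cache-misses")


def perf_stat_argv(command: Sequence[str],
                   events: Sequence[str] = DEFAULT_EVENTS,
                   repeat: int = 5,
                   output: Optional[str] = None,
                   separator: Optional[str] = ",") -> List[str]:
    """Build a `perf stat` command line that measures `command` externally."""
    if not command:
        raise ValueError("command must not be empty")
    # perf accepts -r up to 100; 0 means "forever", which we do not want here.
    if not 1 <= repeat <= 100:
        raise ValueError("repeat must be between 1 and 100")
    argv = ["perf", "stat", "-e", ",".join(events), "-r", str(repeat)]
    if separator is not None:
        argv += ["-x", separator]  # separator-delimited, machine-readable counts
    if output is not None:
        argv += ["-o", output]     # write counters to a file instead of stderr
    argv.append("--")              # everything after this is the measured command
    argv.extend(command)
    return argv
```

The returned list can be handed to `subprocess.run()`, e.g. `perf_stat_argv(["./ec_bench", "--size", "50000000"])`. With `-x`, the averaged counters come out as delimited lines (on stderr, or in the `-o` file), which are easier to post-process than the default table.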


On 12/11/2013 19:06, Loic Dachary wrote:

Hi Andreas,

On 12/11/2013 02:11, Andreas Joachim Peters wrote:

Hi Loic,

I am finally writing the benchmark tool, and I found a bunch of incorrect parameter 
checks which can make the whole thing SEGV.

All the RAID-6 codes have restrictions on their parameters, but these are not correctly 
enforced for the Liberation & Blaum-Roth codes in the CEPH wrapper class... see this 
text from the PDF:

"Minimal Density RAID-6 codes are MDS codes based on binary matrices which 
satisfy a lower-bound on the number of non-zero entries. Unlike Cauchy coding, the 
bit-matrix elements do not correspond to elements in GF(2^w). Instead, the 
bit-matrix itself has the proper MDS property. Minimal Density RAID-6 codes perform 
faster than Reed-Solomon and Cauchy Reed-Solomon codes for the same parameters. 
Liberation coding, Liber8tion coding, and Blaum-Roth coding are three examples of 
this kind of coding that are supported in jerasure.

With each of these codes, m must be equal to two and k must be less than or 
equal to w. The value of w has restrictions based on the code:

• With Liberation coding, w must be a prime number [Pla08b].
• With Blaum-Roth coding, w + 1 must be a prime number [BR99].
• With Liber8tion coding, w must equal 8 [Pla08a].

...

Will you add these fixes?


Nice catch. I created the following issue and assigned it to myself:
http://tracker.ceph.com/issues/6754
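The restrictions quoted above translate directly into a parameter check that can reject bad configurations before they reach jerasure. A minimal sketch in Python (the Ceph wrapper itself is C++); the technique strings mirror the names used in the benchmark output, `blaum_roth` is an assumed spelling for Blaum-Roth, and the `k >= 1` test is an added sanity check that is not part of the quoted text.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; w values are small, so this is enough."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def check_minimal_density_params(technique: str, k: int, m: int, w: int) -> None:
    """Raise ValueError if (k, m, w) violates the Minimal Density RAID-6 rules."""
    if technique not in ("liberation", "blaum_roth", "liber8tion"):
        raise ValueError(f"not a minimal density technique: {technique}")
    # All three codes: m must be 2 and k must not exceed w.
    if m != 2:
        raise ValueError(f"{technique}: m must be 2, got {m}")
    if k < 1 or k > w:  # k >= 1 is an extra sanity check (assumption)
        raise ValueError(f"{technique}: need 1 <= k <= w, got k={k} w={w}")
    # Per-code restrictions on w.
    if technique == "liberation" and not is_prime(w):
        raise ValueError(f"liberation: w must be prime, got {w}")
    if technique == "blaum_roth" and not is_prime(w + 1):
        raise ValueError(f"blaum_roth: w + 1 must be prime, got w={w}")
    if technique == "liber8tion" and w != 8:
        raise ValueError(f"liber8tion: w must be 8, got {w}")
```

Every minimal density configuration in the benchmark output passes this check, e.g. `liberation` with k=24, w=29 (29 is prime) and `liber8tion` with k=6, w=8.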


The benchmark suite currently runs 308 different configurations of the 
2 algorithms that make sense from a performance point of view, and produces 
this output:


# -----------------------------------------------------------------
# Erasure Coding Benchmark - (C) CERN 2013 - [email protected]
# Ram-Size=12614856704 Allocation-Size=100000000
# -----------------------------------------------------------------
# [ -BENCH- ] [       ] technique=memcpy         speed=5.408 [GB/s] latency=9.245 ms
# [ -BENCH- ] [       ] technique=d=a^b^c-xor    speed=4.377 [GB/s] latency=17.136 ms
# [ -BENCH- ] [001/304] technique=cauchy_good:k=05:m=2:w=8:lp=0:packet=00064:size=50000000 speed=1.308 [GB/s] latency=038      [ms] size-overhead=40   [%]
..
..
# [ -BENCH- ] [304/304] technique=liberation:k=24:m=2:w=29:lp=2:packet=65536:size=50000000 speed=0.083 [GB/s] latency=604      [ms] size-overhead=16   [%]
# -----------------------------------------------------------------
# Erasure Code Performance Summary::
# -----------------------------------------------------------------
# RAM:                   12.61 GB
# Allocation-Size        0.10 GB
# -----------------------------------------------------------------
# Byte Initialization:   29.35 MB/s
# Memcpy:                5.41 GB/s
# Triple-XOR:            4.38 GB/s
# -----------------------------------------------------------------
# Fastest RAID6          2.72 GB/s liber8tion:k=06:m=2:w=8:lp=0:packet=04096:size=50000000
# Fastest Triple Failure 0.96 GB/s cauchy_good:k=06:m=3:w=8:lp=0:packet=04096:size=50000000
# Fastest Quadr. Failure 0.66 GB/s cauchy_good:k=06:m=4:w=8:lp=0:packet=04096:size=50000000
# -----------------------------------------------------------------
# .................................................................
# Top 1  RAID6          2.72 GB/s liber8tion:k=06:m=2:w=8:lp=0:packet=04096:size=50000000
# Top 2  RAID6          2.72 GB/s liber8tion:k=06:m=2:w=8:lp=0:packet=16384:size=50000000
# Top 3  RAID6          2.64 GB/s liber8tion:k=06:m=2:w=8:lp=0:packet=65536:size=50000000
# Top 4  RAID6          2.60 GB/s liberation:k=07:m=2:w=7:lp=0:packet=16384:size=50000000
# Top 5  RAID6          2.59 GB/s liberation:k=05:m=2:w=7:lp=0:packet=04096:size=50000000
# .................................................................
# Top 1  Triple         0.96 GB/s cauchy_good:k=06:m=3:w=8:lp=0:packet=04096:size=50000000
# Top 2  Triple         0.94 GB/s cauchy_good:k=06:m=3:w=8:lp=0:packet=16384:size=50000000
# Top 3  Triple         0.93 GB/s cauchy_good:k=06:m=3:w=8:lp=0:packet=65536:size=50000000
# Top 4  Triple         0.89 GB/s cauchy_good:k=07:m=3:w=8:lp=0:packet=04096:size=50000000
# Top 5  Triple         0.87 GB/s cauchy_good:k=05:m=3:w=8:lp=0:packet=04096:size=50000000
# .................................................................
# Top 1  Quadr.         0.66 GB/s cauchy_good:k=06:m=4:w=8:lp=0:packet=04096:size=50000000
# Top 2  Quadr.         0.65 GB/s cauchy_good:k=07:m=4:w=8:lp=0:packet=04096:size=50000000
# Top 3  Quadr.         0.64 GB/s cauchy_good:k=06:m=4:w=8:lp=0:packet=16384:size=50000000
# Top 4  Quadr.         0.64 GB/s cauchy_good:k=05:m=4:w=8:lp=0:packet=04096:size=50000000
# Top 5  Quadr.         0.64 GB/s cauchy_good:k=06:m=4:w=8:lp=0:packet=65536:size=50000000
# .................................................................
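Each configuration in the output above is printed as `technique:key=value:...`. A minimal sketch for parsing these strings and re-deriving two reported columns. Both formulas are inferences from the lines shown, not taken from the tool's source: treating `lp` as a count of extra (local) parity chunks, the two size-overhead values (40% and 16%) match `100 * (m + lp) / k` truncated, and the latencies match `size / speed`.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class BenchConfig:
    technique: str
    params: Dict[str, int]


def parse_config(spec: str) -> BenchConfig:
    """Parse strings like 'cauchy_good:k=05:m=2:w=8:lp=0:packet=00064:size=50000000'."""
    technique, *fields = spec.split(":")
    params: Dict[str, int] = {}
    for field in fields:
        key, sep, value = field.partition("=")
        if not sep:
            raise ValueError(f"malformed field {field!r} in {spec!r}")
        params[key] = int(value)  # zero-padded values such as '00064' parse fine
    return BenchConfig(technique, params)


def size_overhead_percent(cfg: BenchConfig) -> int:
    """Assumed formula: extra chunks (m + lp) relative to the k data chunks."""
    p = cfg.params
    return (100 * (p["m"] + p.get("lp", 0))) // p["k"]


def latency_ms(size_bytes: int, speed_gb_per_s: float) -> float:
    """Time to process size_bytes at the given speed, taking 1 GB = 1e9 bytes."""
    return size_bytes / (speed_gb_per_s * 1e9) * 1e3
```

For example, `parse_config("cauchy_good:k=05:m=2:w=8:lp=0:packet=00064:size=50000000")` gives an overhead of 40 and, at 1.308 GB/s, a latency of about 38 ms, matching the first benchmark line.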

It takes around 30 seconds on my box.



That looks great :-) If I understand correctly, it means 
https://github.com/ceph/ceph/pull/740 will no longer contain benchmarks, since they 
are moved to a separate program. Correct?

I will add a measurement of how the XOR and the top 3 algorithms scale with the 
number of cores, and make the object size configurable from the command line. 
Anything else?


It would be convenient to run this from a "workunit" (i.e. a script in 
ceph/qa/workunits/) so that it can later be run by teuthology integration tests. That 
could be used to detect regressions.
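Once the benchmark runs from a workunit, detecting a regression comes down to comparing a run against stored baseline numbers. A minimal sketch of that comparison only; how the baseline is stored, the 10% default tolerance, and the use of configuration strings as keys are all assumptions, and an actual workunit under ceph/qa/workunits/ would wrap this differently.

```python
from typing import Dict, List


def find_regressions(baseline: Dict[str, float],
                     current: Dict[str, float],
                     tolerance: float = 0.10) -> List[str]:
    """Report configurations whose speed (GB/s) dropped by more than
    `tolerance` (a fraction) relative to the baseline, or went missing."""
    if not 0 <= tolerance < 1:
        raise ValueError("tolerance must be in [0, 1)")
    regressions = []
    for config, base_speed in sorted(baseline.items()):
        speed = current.get(config)
        if speed is None:
            regressions.append(f"{config}: missing from current run")
        elif speed < base_speed * (1 - tolerance):
            regressions.append(f"{config}: {base_speed:.2f} -> {speed:.2f} GB/s")
    return regressions
```

A workunit could then fail whenever the returned list is non-empty. A relative tolerance is used because throughput varies between runs, so an exact comparison would flag noise as regressions.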

Shall I add the possibility to test a single user-specified configuration via 
command line arguments?
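Selecting a single configuration (and the object size) from the command line could look like the following minimal `argparse` sketch. The flag names are hypothetical; the technique choices come from this thread, and the defaults (w=8, packet=4096, size=50000000) are borrowed from the benchmark output.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI for running one erasure code benchmark configuration."""
    parser = argparse.ArgumentParser(
        description="Run a single erasure code benchmark configuration (sketch).")
    parser.add_argument("--technique", required=True,
                        choices=["cauchy_good", "liberation", "blaum_roth", "liber8tion"])
    parser.add_argument("-k", type=int, required=True, help="number of data chunks")
    parser.add_argument("-m", type=int, required=True, help="number of coding chunks")
    parser.add_argument("-w", type=int, default=8, help="word size")
    parser.add_argument("--packet", type=int, default=4096, help="packet size in bytes")
    parser.add_argument("--size", type=int, default=50_000_000,
                        help="object size in bytes")
    return parser
```

Calling `build_parser().parse_args()` in the benchmark's entry point would read `sys.argv`; the parsed values could then be validated before the configuration is run.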

I would need to play with it to comment usefully.

Cheers


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
