I will mention that this is a good tool if you want really detailed
profiling or CPU counter data about what's going on. Other, more generic
tools (i.e. ones that just read data from /proc, e.g. collectl, sar, etc.)
may also be options.
Mark
On 12/09/2013 10:45 AM, Loic Dachary wrote:
Hi,
Mark Nelson suggested we use perf (linux-tools) for benchmarking. It looks
like something that would indeed help: the benchmark program would only
concern itself with doing some work according to the options and let
performance data be collected from the outside, using tools that are familiar
to people doing benchmarking.
What do you think?
Cheers
$ perf stat -e
Error: switch `e' requires a value
usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available
events
--filter <filter>
event filter
-i, --no-inherit child tasks do not inherit counters
-p, --pid <pid> stat events on existing process id
-t, --tid <tid> stat events on existing thread id
-a, --all-cpus system-wide collection from all CPUs
-g, --group put the counters into a counter group
-c, --scale scale/normalize counters
-v, --verbose be more verbose (show counter open errors, etc)
-r, --repeat <n> repeat command and print average + stddev (max: 100,
forever: 0)
-n, --null null run - dont start any counters
-d, --detailed detailed run - start a lot of events
-S, --sync call sync() before starting a run
-B, --big-num print large numbers with thousands' separators
-C, --cpu <cpu> list of cpus to monitor in system-wide
-A, --no-aggr disable CPU count aggregation
-x, --field-separator <separator>
print counts with custom separator
-G, --cgroup <name> monitor event in cgroup name only
-o, --output <file> output file name
--append append to the output file
--log-fd <n> log output to fd, instead of stderr
--pre <command> command to run prior to the measured command
--post <command> command to run after to the measured command
-I, --interval-print <n>
print counts at regular interval in ms (>= 100)
--per-socket aggregate counts per processor socket
--per-core aggregate counts per physical processor core
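As a rough illustration of collecting measurements from the outside, the sketch below builds a perf stat command line from options shown in the usage text above (-e, -r, -x) and parses the separator-delimited counts. The helper names (build_perf_command, parse_perf_csv), the event list, the ./benchmark command and the sample counts are hypothetical. The field order (value, unit, event name) is an assumption that matches recent perf releases; older releases may lay the columns out differently.

```python
import shlex


def build_perf_command(command, events=("cycles", "instructions"), repeat=3):
    """Return an argv list that wraps `command` in `perf stat`."""
    return [
        "perf", "stat",
        "-x", ",",                # separator-delimited output
        "-r", str(repeat),        # repeat the command and average the counts
        "-e", ",".join(events),   # event selector (see `perf list`)
        "--",
    ] + shlex.split(command)


def parse_perf_csv(text):
    """Parse `perf stat -x,` output into {event: count}.

    Assumes each line starts with value,unit,event (an assumption about
    the perf version in use). Lines whose first field is not a number,
    such as "<not supported>", are skipped.
    """
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(",")
        if len(fields) < 3:
            continue
        try:
            value = float(fields[0])
        except ValueError:
            continue
        counts[fields[2]] = value
    return counts


# Made-up sample in the assumed layout, for illustration only.
sample = "1000,,cycles,500,100.00\n<not supported>,,instructions,0,100.00\n"
```

Since perf stat logs to stderr unless told otherwise, the argv list could be run with subprocess.run(argv, stderr=subprocess.PIPE, text=True) and the captured stderr handed to parse_perf_csv.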
On 12/11/2013 19:06, Loic Dachary wrote:
Hi Andreas,
On 12/11/2013 02:11, Andreas Joachim Peters wrote:
Hi Loic,
I am finally writing the benchmark tool and I found a bunch of wrong parameter
checks which can make the whole thing SEGV.
All the RAID-6 codes have restrictions on their parameters, but they are not
correctly enforced for the Liberation & Blaum-Roth codes in the Ceph wrapper
class. See this text from the PDF:
"Minimal Density RAID-6 codes are MDS codes based on binary matrices which
satisfy a lower-bound on the number of non-zero entries. Unlike Cauchy coding, the
bit-matrix elements do not correspond to elements in GF(2^w). Instead, the
bit-matrix itself has the proper MDS property. Minimal Density RAID-6 codes perform
faster than Reed-Solomon and Cauchy Reed-Solomon codes for the same parameters.
Liberation coding, Liber8tion coding, and Blaum-Roth coding are three examples of
this kind of coding that are supported in jerasure.
With each of these codes, m must be equal to two and k must be less than or
equal to w. The value of w has restrictions based on the code:
• With Liberation coding, w must be a prime number [Pla08b].
• With Blaum-Roth coding, w + 1 must be a prime number [BR99].
• With Liber8tion coding, w must equal 8 [Pla08a].
...
Will you add these fixes?
Nice catch. I created an issue and assigned it to myself:
http://tracker.ceph.com/issues/6754
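The restrictions quoted from the jerasure documentation can be expressed as a small check. This is only a sketch of the rules as quoted, not the code of the Ceph wrapper class; the function names and the "blaum_roth" spelling of the technique name are assumptions made for the example.

```python
def is_prime(n):
    """Trial-division primality test, sufficient for small w."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def check_raid6_parameters(technique, k, m, w):
    """Return a list of error strings; an empty list means valid."""
    errors = []
    # Rules shared by all three minimal density RAID-6 codes.
    if m != 2:
        errors.append("m must be equal to 2")
    if k > w:
        errors.append("k must be less than or equal to w")
    # Per-technique restriction on w.
    if technique == "liberation":
        if not is_prime(w):
            errors.append("w must be a prime number")
    elif technique == "blaum_roth":
        if not is_prime(w + 1):
            errors.append("w + 1 must be a prime number")
    elif technique == "liber8tion":
        if w != 8:
            errors.append("w must equal 8")
    else:
        raise ValueError("unknown technique: %r" % technique)
    return errors
```

Returning every violated rule, rather than stopping at the first, would let a caller report all problems with a configuration at once instead of crashing later.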
The benchmark suite currently runs 308 different configurations for the
2 algorithms which make sense from the performance point of view and provides
this output:
# -----------------------------------------------------------------
# Erasure Coding Benchmark - (C) CERN 2013 - [email protected]
# Ram-Size=12614856704 Allocation-Size=100000000
# -----------------------------------------------------------------
# [ -BENCH- ] [ ] technique=memcpy
speed=5.408 [GB/s] latency=9.245 ms
# [ -BENCH- ] [ ] technique=d=a^b^c-xor
speed=4.377 [GB/s] latency=17.136 ms
# [ -BENCH- ] [001/304]
technique=cauchy_good:k=05:m=2:w=8:lp=0:packet=00064:size=50000000
speed=1.308 [GB/s] latency=038 [ms] size-overhead=40 [%]
..
..
# [ -BENCH- ] [304/304]
technique=liberation:k=24:m=2:w=29:lp=2:packet=65536:size=50000000
speed=0.083 [GB/s] latency=604 [ms] size-overhead=16 [%]
# -----------------------------------------------------------------
# Erasure Code Performance Summary::
# -----------------------------------------------------------------
# RAM: 12.61 GB
# Allocation-Size 0.10 GB
# -----------------------------------------------------------------
# Byte Initialization: 29.35 MB/s
# Memcpy: 5.41 GB/s
# Triple-XOR: 4.38 GB/s
# -----------------------------------------------------------------
# Fastest RAID6 2.72 GB/s
liber8tion:k=06:m=2:w=8:lp=0:packet=04096:size=50000000
# Fastest Triple Failure 0.96 GB/s
cauchy_good:k=06:m=3:w=8:lp=0:packet=04096:size=50000000
# Fastest Quadr. Failure 0.66 GB/s
cauchy_good:k=06:m=4:w=8:lp=0:packet=04096:size=50000000
# -----------------------------------------------------------------
# .................................................................
# Top 1 RAID6 2.72 GB/s
liber8tion:k=06:m=2:w=8:lp=0:packet=04096:size=50000000
# Top 2 RAID6 2.72 GB/s
liber8tion:k=06:m=2:w=8:lp=0:packet=16384:size=50000000
# Top 3 RAID6 2.64 GB/s
liber8tion:k=06:m=2:w=8:lp=0:packet=65536:size=50000000
# Top 4 RAID6 2.60 GB/s
liberation:k=07:m=2:w=7:lp=0:packet=16384:size=50000000
# Top 5 RAID6 2.59 GB/s
liberation:k=05:m=2:w=7:lp=0:packet=04096:size=50000000
# .................................................................
# Top 1 Triple 0.96 GB/s
cauchy_good:k=06:m=3:w=8:lp=0:packet=04096:size=50000000
# Top 2 Triple 0.94 GB/s
cauchy_good:k=06:m=3:w=8:lp=0:packet=16384:size=50000000
# Top 3 Triple 0.93 GB/s
cauchy_good:k=06:m=3:w=8:lp=0:packet=65536:size=50000000
# Top 4 Triple 0.89 GB/s
cauchy_good:k=07:m=3:w=8:lp=0:packet=04096:size=50000000
# Top 5 Triple 0.87 GB/s
cauchy_good:k=05:m=3:w=8:lp=0:packet=04096:size=50000000
# .................................................................
# Top 1 Quadr. 0.66 GB/s
cauchy_good:k=06:m=4:w=8:lp=0:packet=04096:size=50000000
# Top 2 Quadr. 0.65 GB/s
cauchy_good:k=07:m=4:w=8:lp=0:packet=04096:size=50000000
# Top 3 Quadr. 0.64 GB/s
cauchy_good:k=06:m=4:w=8:lp=0:packet=16384:size=50000000
# Top 4 Quadr. 0.64 GB/s
cauchy_good:k=05:m=4:w=8:lp=0:packet=04096:size=50000000
# Top 5 Quadr. 0.64 GB/s
cauchy_good:k=06:m=4:w=8:lp=0:packet=65536:size=50000000
# .................................................................
It takes around 30 seconds on my box.
That looks great :-) If I understand correctly, it means
https://github.com/ceph/ceph/pull/740 will no longer have benchmarks as they
are moved to a separate program. Correct?
I will add a measurement of how the XOR and the top 3 algorithms scale with
the number of cores, and make the object size configurable from the command
line. Anything else?
It would be convenient to run this from a "workunit" (i.e. a script in
ceph/qa/workunits/) so that it can later be run by teuthology integration
tests. That could be used to detect regressions.
Shall I add the possibility to test a single user-specified configuration via
command line arguments?
I would need to play with it to comment usefully.
Cheers
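One way a single user-specified configuration could be passed on the command line is to reuse the technique strings printed in the benchmark output. The sketch below is only a guess at such an interface: the --config and --cores option names and both function names are hypothetical and do not describe the actual benchmark tool.

```python
import argparse


def parse_configuration(text):
    """Split 'technique:key=value:...' into a dict with integer values."""
    technique, *pairs = text.split(":")
    if not technique or "=" in technique:
        raise ValueError("configuration must start with a technique name")
    config = {"technique": technique}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError("expected key=value, got %r" % pair)
        config[key] = int(value)  # int() accepts zero-padded strings like "04096"
    return config


def make_parser():
    """Build a parser with the hypothetical --config and --cores options."""
    parser = argparse.ArgumentParser(
        description="run a single erasure code configuration")
    parser.add_argument(
        "--config", type=parse_configuration, required=True,
        help="for example liber8tion:k=06:m=2:w=8:lp=0:packet=04096:size=50000000")
    parser.add_argument(
        "--cores", type=int, default=1,
        help="number of cores to use")
    return parser
```

Accepting the same string format that the summary prints would make it possible to copy a line from a previous run and re-run just that configuration.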