Maybe using http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html
is enough. fsbench looks overkill indeed. /me exploring options ;-)
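For what it's worth, hooking the gperftools CPU profiler into a standalone benchmark takes only a few lines. Below is a minimal sketch, assuming the program is linked with -lprofiler; the run_benchmark loop is just a placeholder for the real erasure-coding work, not code from the actual tool:

```cpp
// Minimal sketch: profile the benchmark body with the gperftools CPU profiler.
// Build:   g++ -O2 bench.cc -o bench -lprofiler
// Inspect: pprof ./bench bench.prof
#include <gperftools/profiler.h>
#include <cstdio>
#include <vector>

// Placeholder for the real erasure-coding benchmark body.
static char run_benchmark() {
  std::vector<char> a(50000000, 'a'), b(50000000, 'b');
  for (int round = 0; round < 10; round++)
    for (size_t i = 0; i < a.size(); i++)
      a[i] ^= b[i];
  return a[0];
}

int main() {
  ProfilerStart("bench.prof");    // samples are written to bench.prof
  char result = run_benchmark();
  ProfilerStop();                 // flush and close the profile
  printf("done (%d)\n", result);  // keep the result live so the loop is not optimized out
  return 0;
}
```

Alternatively, a binary linked against libprofiler can be profiled without code changes by setting CPUPROFILE=bench.prof in its environment.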
On 09/12/2013 17:45, Loic Dachary wrote:
> Hi,
>
> Mark Nelson suggested we use perf (linux-tools) for benchmarking. It looks
> like something that would help indeed: the benchmark program would only
> concern itself with doing some work according to the options and let
> performance be collected from the outside, using tools that are familiar to
> people doing benchmarking.
>
> What do you think?
>
> Cheers
>
> $ perf stat -e
>  Error: switch `e' requires a value
>
>  usage: perf stat [<options>] [<command>]
>
>     -e, --event <event>       event selector. use 'perf list' to list available events
>         --filter <filter>     event filter
>     -i, --no-inherit          child tasks do not inherit counters
>     -p, --pid <pid>           stat events on existing process id
>     -t, --tid <tid>           stat events on existing thread id
>     -a, --all-cpus            system-wide collection from all CPUs
>     -g, --group               put the counters into a counter group
>     -c, --scale               scale/normalize counters
>     -v, --verbose             be more verbose (show counter open errors, etc)
>     -r, --repeat <n>          repeat command and print average + stddev (max: 100, forever: 0)
>     -n, --null                null run - dont start any counters
>     -d, --detailed            detailed run - start a lot of events
>     -S, --sync                call sync() before starting a run
>     -B, --big-num             print large numbers with thousands' separators
>     -C, --cpu <cpu>           list of cpus to monitor in system-wide
>     -A, --no-aggr             disable CPU count aggregation
>     -x, --field-separator <separator>  print counts with custom separator
>     -G, --cgroup <name>       monitor event in cgroup name only
>     -o, --output <file>       output file name
>         --append              append to the output file
>         --log-fd <n>          log output to fd, instead of stderr
>         --pre <command>       command to run prior to the measured command
>         --post <command>      command to run after to the measured command
>     -I, --interval-print <n>  print counts at regular interval in ms (>= 100)
>         --per-socket          aggregate counts per processor socket
>         --per-core            aggregate counts per physical processor core
>
>
> On 12/11/2013 19:06, Loic Dachary wrote:
>> Hi Andreas,
>>
>> On 12/11/2013 02:11, Andreas Joachim Peters wrote:
>>> Hi Loic,
>>>
>>> I am finally doing the benchmark tool and I found a bunch of wrong
>>> parameter checks which can make the whole thing SEGV.
>>>
>>> All the RAID-6 codes have restrictions on the parameters but they are not
>>> correctly enforced for Liberation & Blaum-Roth codes in the CEPH wrapper
>>> class ... see text from PDF:
>>>
>>> "Minimal Density RAID-6 codes are MDS codes based on binary matrices which
>>> satisfy a lower-bound on the number of non-zero entries. Unlike Cauchy
>>> coding, the bit-matrix elements do not correspond to elements in GF(2^w).
>>> Instead, the bit-matrix itself has the proper MDS property. Minimal Density
>>> RAID-6 codes perform faster than Reed-Solomon and Cauchy Reed-Solomon codes
>>> for the same parameters. Liberation coding, Liber8tion coding, and
>>> Blaum-Roth coding are three examples of this kind of coding that are
>>> supported in jerasure.
>>>
>>> With each of these codes, m must be equal to two and k must be less than or
>>> equal to w. The value of w has restrictions based on the code:
>>>
>>> • With Liberation coding, w must be a prime number [Pla08b].
>>> • With Blaum-Roth coding, w + 1 must be a prime number [BR99].
>>> • With Liber8tion coding, w must equal 8 [Pla08a].
>>>
>>> ..."
>>>
>>> Will you add these fixes?
>>
>> Nice catch.
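As a side note, the restrictions quoted above could be enforced with checks roughly like the following. This is a sketch only; is_prime and check_minimal_density_params are made-up names for illustration, not code from the actual Ceph wrapper class:

```cpp
// Sketch only: how the Liberation / Blaum-Roth / Liber8tion restrictions
// quoted above could be checked before calling into jerasure.
#include <stdexcept>
#include <string>

// Trial-division primality test; sufficient for the small w values used here.
static bool is_prime(int n) {
  if (n < 2) return false;
  for (int d = 2; d * d <= n; d++)
    if (n % d == 0) return false;
  return true;
}

static void check_minimal_density_params(const std::string &technique,
                                         int k, int m, int w) {
  if (m != 2)
    throw std::invalid_argument("m must be equal to two");
  if (k > w)
    throw std::invalid_argument("k must be less than or equal to w");
  if (technique == "liberation" && !is_prime(w))
    throw std::invalid_argument("liberation: w must be a prime number");
  if (technique == "blaum_roth" && !is_prime(w + 1))
    throw std::invalid_argument("blaum_roth: w + 1 must be a prime number");
  if (technique == "liber8tion" && w != 8)
    throw std::invalid_argument("liber8tion: w must equal 8");
}
```

With checks of this kind in place, an unsupported k/m/w combination would be rejected up front instead of crashing inside jerasure.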
>> I created and assigned to myself:
>> http://tracker.ceph.com/issues/6754
>>>
>>> For the benchmark suite, it currently runs 308 different configurations for
>>> the 2 algorithms which make sense from the performance point of view and
>>> provides this output:
>>>
>>> # -----------------------------------------------------------------
>>> # Erasure Coding Benchmark - (C) CERN 2013 - [email protected]
>>> # Ram-Size=12614856704 Allocation-Size=100000000
>>> # -----------------------------------------------------------------
>>> # [ -BENCH- ] [ ] technique=memcpy speed=5.408 [GB/s] latency=9.245 ms
>>> # [ -BENCH- ] [ ] technique=d=a^b^c-xor speed=4.377 [GB/s] latency=17.136 ms
>>> # [ -BENCH- ] [001/304] technique=cauchy_good:k=05:m=2:w=8:lp=0:packet=00064:size=50000000 speed=1.308 [GB/s] latency=038 [ms] size-overhead=40 [%]
>>> ..
>>> ..
>>> # [ -BENCH- ] [304/304] technique=liberation:k=24:m=2:w=29:lp=2:packet=65536:size=50000000 speed=0.083 [GB/s] latency=604 [ms] size-overhead=16 [%]
>>> # -----------------------------------------------------------------
>>> # Erasure Code Performance Summary::
>>> # -----------------------------------------------------------------
>>> # RAM:                   12.61 GB
>>> # Allocation-Size         0.10 GB
>>> # -----------------------------------------------------------------
>>> # Byte Initialization:   29.35 MB/s
>>> # Memcpy:                 5.41 GB/s
>>> # Triple-XOR:             4.38 GB/s
>>> # -----------------------------------------------------------------
>>> # Fastest RAID6           2.72 GB/s  liber8tion:k=06:m=2:w=8:lp=0:packet=04096:size=50000000
>>> # Fastest Triple Failure  0.96 GB/s  cauchy_good:k=06:m=3:w=8:lp=0:packet=04096:size=50000000
>>> # Fastest Quadr. Failure  0.66 GB/s  cauchy_good:k=06:m=4:w=8:lp=0:packet=04096:size=50000000
>>> # -----------------------------------------------------------------
>>> # .................................................................
>>> # Top 1 RAID6   2.72 GB/s  liber8tion:k=06:m=2:w=8:lp=0:packet=04096:size=50000000
>>> # Top 2 RAID6   2.72 GB/s  liber8tion:k=06:m=2:w=8:lp=0:packet=16384:size=50000000
>>> # Top 3 RAID6   2.64 GB/s  liber8tion:k=06:m=2:w=8:lp=0:packet=65536:size=50000000
>>> # Top 4 RAID6   2.60 GB/s  liberation:k=07:m=2:w=7:lp=0:packet=16384:size=50000000
>>> # Top 5 RAID6   2.59 GB/s  liberation:k=05:m=2:w=7:lp=0:packet=04096:size=50000000
>>> # .................................................................
>>> # Top 1 Triple  0.96 GB/s  cauchy_good:k=06:m=3:w=8:lp=0:packet=04096:size=50000000
>>> # Top 2 Triple  0.94 GB/s  cauchy_good:k=06:m=3:w=8:lp=0:packet=16384:size=50000000
>>> # Top 3 Triple  0.93 GB/s  cauchy_good:k=06:m=3:w=8:lp=0:packet=65536:size=50000000
>>> # Top 4 Triple  0.89 GB/s  cauchy_good:k=07:m=3:w=8:lp=0:packet=04096:size=50000000
>>> # Top 5 Triple  0.87 GB/s  cauchy_good:k=05:m=3:w=8:lp=0:packet=04096:size=50000000
>>> # .................................................................
>>> # Top 1 Quadr.  0.66 GB/s  cauchy_good:k=06:m=4:w=8:lp=0:packet=04096:size=50000000
>>> # Top 2 Quadr.  0.65 GB/s  cauchy_good:k=07:m=4:w=8:lp=0:packet=04096:size=50000000
>>> # Top 3 Quadr.  0.64 GB/s  cauchy_good:k=06:m=4:w=8:lp=0:packet=16384:size=50000000
>>> # Top 4 Quadr.  0.64 GB/s  cauchy_good:k=05:m=4:w=8:lp=0:packet=04096:size=50000000
>>> # Top 5 Quadr.  0.64 GB/s  cauchy_good:k=06:m=4:w=8:lp=0:packet=65536:size=50000000
>>> # .................................................................
>>>
>>> It takes around 30 seconds on my box.
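For context, speed/latency figures like the ones in the report above are typically obtained from a simple timed loop. The following is a rough sketch of such a measurement using std::chrono, with encode_once as a placeholder for the real jerasure encode call; it is not code from the actual CERN tool:

```cpp
// Rough sketch: compute a "speed=... [GB/s] latency=... [ms]" line for one
// configuration by timing an encode-like operation over a fixed buffer.
#include <chrono>
#include <cstdio>
#include <vector>

// Placeholder work; the real tool would call the erasure code plugin here.
static void encode_once(std::vector<char> &buf) {
  for (size_t i = 1; i < buf.size(); i++)
    buf[i] ^= buf[i - 1];
}

int main() {
  const size_t size = 50000000;  // matches size=50000000 in the report above
  const int iterations = 10;
  std::vector<char> buf(size, 'x');

  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; i++)
    encode_once(buf);
  auto end = std::chrono::steady_clock::now();

  double seconds = std::chrono::duration<double>(end - start).count();
  double gigabytes = double(size) * iterations / 1e9;
  printf("speed=%.3f [GB/s] latency=%.0f [ms]\n",
         gigabytes / seconds, seconds / iterations * 1e3);
  return 0;
}
```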
>>
>> That looks great :-) If I understand correctly, it means
>> https://github.com/ceph/ceph/pull/740 will no longer have benchmarks as they
>> are moved to a separate program. Correct?
>>
>>> I will add a measurement of how the XOR and the 3 top algorithms scale with
>>> the number of cores and make the object-size configurable from the command
>>> line. Anything else?
>>
>> It would be convenient to run this from a "workunit" (i.e. a script in
>> ceph/qa/workunits/) so that it can later be run by teuthology integration
>> tests. That could be used to show regression.
>>
>>> Shall I add the possibility to test a single user-specified configuration via
>>> command line arguments?
>>
>> I would need to play with it to comment usefully.
>>
>> Cheers
>>
> -- Loïc Dachary, Artisan Logiciel Libre
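Regarding the single user-specified configuration question above, one simple option would be to accept on the command line the same technique:k=..:m=..:w=..:lp=..:packet=..:size=.. string that appears in the report. The parsing sketch below is purely illustrative (BenchConfig and parse_config are invented names, not part of the actual tool):

```cpp
// Sketch: parse a "technique:k=..:m=..:w=..:lp=..:packet=..:size=.." argument,
// e.g. "cauchy_good:k=06:m=3:w=8:lp=0:packet=04096:size=50000000".
#include <cstdio>
#include <map>
#include <sstream>
#include <string>

struct BenchConfig {
  std::string technique;
  std::map<std::string, long> params;  // k, m, w, lp, packet, size
};

static BenchConfig parse_config(const std::string &spec) {
  BenchConfig cfg;
  std::stringstream ss(spec);
  std::string field;
  std::getline(ss, cfg.technique, ':');   // first field is the technique name
  while (std::getline(ss, field, ':')) {  // remaining fields are key=value
    size_t eq = field.find('=');
    if (eq != std::string::npos)
      cfg.params[field.substr(0, eq)] = std::stol(field.substr(eq + 1));
  }
  return cfg;
}

int main(int argc, char **argv) {
  if (argc < 2) {
    fprintf(stderr, "usage: %s technique:k=..:m=..:w=..:lp=..:packet=..:size=..\n", argv[0]);
    return 1;
  }
  BenchConfig cfg = parse_config(argv[1]);
  printf("technique=%s k=%ld m=%ld w=%ld\n", cfg.technique.c_str(),
         cfg.params["k"], cfg.params["m"], cfg.params["w"]);
  return 0;
}
```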