On Thu, 2011-04-07 at 08:03 -0500, Hyrum K Wright wrote:
> This is great work, though as you admit the results are a bit
> incomprehensible at this stage. Would it make sense to somehow
Hmm, I think stsp was a bit early to post; I'm still in the
sanity-checking phase ;)
E.g.,:
[[[
Timings for 5x5_1.6
N min max avg operation (unit is seconds)
6 330.544 561.406 425.169 TOTAL RUN
]]]
--> variation of 330s to 561s in 6 identical runs?
(later note: I just found that the random seeding might be in the wrong
place. I recently refactored a few things, and it should probably go
inside the N loop now.)
I have no idea about other loads on our VM. Maybe we could use the
user/sys output of 'time -p' instead of the real-world time measurements
it is taking right now, although my intention was to stick to actual
wall-clock time. Of course that works best on an otherwise idle system.
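FWIW, the user/sys numbers could also be taken from inside Python rather
than via 'time -p'. A minimal sketch, assuming a POSIX system (timed_cpu
is a hypothetical helper, not part of my script):

```python
import os
import subprocess
import sys

def timed_cpu(cmd):
    """Run cmd and return the user+sys CPU seconds consumed by children.

    Unlike wall-clock time, this is largely unaffected by other load on
    the host.
    """
    before = os.times()
    subprocess.call(cmd)
    after = os.times()
    # os.times() fields 2 and 3 are the cumulative user and sys time
    # of all terminated child processes
    return (after[2] - before[2]) + (after[3] - before[3])

cpu_seconds = timed_cpu([sys.executable, '-c', 'sum(range(100000))'])
```

The downside is that CPU time hides real-world I/O waits, which is
exactly the part I wanted to capture.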
danielsh, or anyone, do you have any idea at what times of day (or so)
the VM host is most likely to be idle for ~2 hours, so we can schedule
the tests then?
> combine the scripts to prevent the bifurcation of benchmark suites?
For the curious, I attached the current state of my benchmark scripts,
but I'm still editing.
> Do you have the script of commands, and the accompanying repo?
ATM my scripts are in an elego-private repo; they were initially going
to be just a quick test to get real timing results for an article about
1.7.
Actually, I think they have matured enough to be committed in tools/dev.
If Mark does the same we could have a collective look at them and
decide. I haven't actually seen Mark's results nor scripts yet. I have
posted mine before, older versions, a few months ago...
So where to put it?
tools/dev/benchmark/A <--- mark's benches
tools/dev/benchmark/B <--- my benches
???
I'm reluctant to label the dir with my name...
It'd be nice to find two descriptive words to distinguish between the
benchmarks, "by type".
In reply to CMike, an outline of what my test does:
I have a little Python magic I use to call svn subcommands, invoking the
command-line svn client from Python. Each call is timed, and its duration
is added to the set of N samples for that subcommand.
So it has no understanding of whether it got 'svn delete one/file.c' or
'svn delete huge/tree/*'; it simply adds another sample. So that's
almost madness.
But it goes on and runs the same identical sequence a number of times,
currently 6. Their values are averaged, and the min/max are retained to
have some sense of the real-world meaning of the average.
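The per-subcommand aggregation is nothing fancier than this (summarize
is an illustrative stand-in for what Timings.summary() in the attached
script computes per operation):

```python
def summarize(samples):
    """Collapse the N timings of one subcommand into (min, max, avg)."""
    return (min(samples), max(samples), sum(samples) / len(samples))

# e.g. the six TOTAL RUN wall-clock timings, in seconds
lo, hi, avg = summarize([330.544, 561.406, 400.0, 410.2, 425.0, 424.0])
```

The min/max pair is what tells you how much to trust the average.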
These 6 identical runs are done once for 1.6.x and once for trunk. They
are identical because I use a pseudo random number generator (to get
"organic" modifications) that is seeded with 0 for each run :P
Finally, the averages/mins/maxes for 1.6.x are compared to those of
trunk, showing a factor and a difference in seconds. (A factor of 10
would be negligible if the actual time added was only 2 microseconds...)
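Per table cell, that boils down to the following (compare is a
hypothetical condensation of the script's do_div/do_diff helpers,
including the guard against dividing by zero):

```python
def compare(old_seconds, new_seconds):
    """Return (factor, difference) as printed in the COMPARE table.

    factor > 1 or difference > 0 means the new client is slower.
    """
    factor = (new_seconds / old_seconds) if old_seconds else 0.0
    return factor, new_seconds - old_seconds

# hypothetical pair: 60s on 1.6.x vs. 90s on trunk
factor, diff = compare(60.0, 90.0)  # factor 1.5, i.e. +30 seconds
```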
Because of this "madness" I mentioned above (variation in single calls),
the *averages* are what's really interesting to look at. Even though an
average for 1.6 may be made up of (60 + 0.06)/2 while trunk's is
(90 + 0.01)/2, the worst case still shows up pretty visibly (a long
call taking even longer). You can stare at the min/max differences to
make up your mind as to whether the averages are actual timings in
seconds or unitless fantasy ratings. Yet I am pretty confident that they
compare the two clients adequately.
In stsp's post, every single run was printed out. The really interesting
places to look though are where it says "Results for"... and
"COMPARE"...
[[[
COMPARE 5x5_1.6 to 5x5_trunk
1.23|+0.45 means factor=1.23, difference in seconds = 0.45
factor < 1 or difference < 0 means '5x5_trunk' is faster than
'5x5_1.6'
min max avg operation
1.151|+49.7471 0.863|-76.9170 1.036|+15.1526 TOTAL RUN
1.322|+0.0026 0.631|-1.4086 0.938|-0.0019 add
1.864|+0.0113 0.413|-11.9194 0.513|-3.5088 checkout
0.625|-2.3169 0.533|-24.8236 0.769|-4.0490 commit
0.899|-0.0030 0.129|-0.7784 0.122|-0.3825 copy
44.382|+100.3599 16.604|+122.9722 24.856|+109.4408 delete
1.245|+1.6928 0.943|-1.2191 0.768|-3.3842 merge
1.258|+0.0023 0.263|-1.8166 1.189|+0.0028 mkdir
1.256|+0.0022 0.187|-2.3871 0.910|-0.0016 prop mod
1.237|+0.0021 0.153|-0.4198 0.480|-0.0174 propdel
1.267|+0.0020 0.267|-0.2795 1.544|+0.0050 proplist
1.267|+0.0023 0.091|-5.4157 1.018|+0.0003 propset
1.148|+0.0013 2.347|+0.0133 1.254|+0.0024 resolve
2.764|+0.5885 0.878|-0.3298 1.565|+0.5413 resolved
1.410|+0.0733 3.061|+1.9744 2.038|+0.4877 status
0.602|-3.3563 0.422|-8.9009 0.470|-6.4426 switch
1.622|+0.0934 0.302|-21.6808 0.421|-3.9173 update
]]]
An interpretation could be: 'svn delete' currently sucks for some reason
(24 times slower???), as well as 'svn status' (twice as slow, but then
again it only added half a second to a run that was >400 seconds long,
on average). The rest looks quite encouraging. IIRC, when I ran this a
few months back I had higher slowness factor numbers than this (and,
notably, a much saner 'svn delete' factor, so there must be some newly
introduced speed bug in trunk's 'svn delete' -- or in my script).
Interesting to note is that the *variation* in time it takes for a run
became somewhat smaller using trunk. So trunk's timings seem to be
slightly more homogeneous than 1.6.x's (looking at the TOTAL min/max).
Now you may note that there are several types of these, i.e. 5x5, 1x100
and 100x1. 5x5 means there is a WC depth of 5 directory levels, each one
holding 5 subdirs and 5 files. 1x100 is a flat directory with 100 dir
and file entries. 100x1 is a cascade of single subdirs nested 100
directory levels deep, also with one file on each subdir level.
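The resulting node counts follow from create_tree() in the attached
script, which puts `spread` files in every directory and recurses into
`spread` subdirs while levels remain (tree_size is only an illustration
of that arithmetic, not part of the script):

```python
def tree_size(levels, spread):
    """Node counts for a create_tree()-style recursion."""
    # one dir at the top, spread**i dirs at each deeper level
    dirs = sum(spread ** i for i in range(levels))
    files = spread * dirs
    return dirs, files

# 5x5 yields 781 directories and 3905 files; 100x1 yields 100 of each
```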
The 100x1 case is actually where the 1.6 client sucks very badly -- it's
visible in timing, but even worse memory-wise. Watching `top` while it
runs reveals a memory footprint approaching gigabyte scale.
Consequently, this is where trunk really really ROCKS, because it does
the same job with a few MB of RAM, in a fraction of the time.
I'm playing with the thought of making graphical charts of these
numbers, but that might be overkill :)
Any ideas/opinions about adding to tools/dev/ would be appreciated.
Finally, the interesting part that shows what the script tests:
[[[
run_cmd(['svnadmin', 'create', repos])
svn('checkout', file_url, wc)
trunk = j(wc, 'trunk')
create_tree(trunk, levels, spread)
add(trunk)
st(wc)
ci(wc)
up(wc)
propadd_tree(trunk, 0.5)
ci(wc)
up(wc)
st(wc)
trunk_url = file_url + '/trunk'
branch_url = file_url + '/branch'
svn('copy', '-mm', trunk_url, branch_url)
st(wc)
up(wc)
st(wc)
svn('checkout', trunk_url, wc2)
st(wc2)
modify_tree(wc2, 0.5)
st(wc2)
ci(wc2)
up(wc2)
up(wc)
svn('switch', branch_url, wc2)
modify_tree(wc2, 0.5)
st(wc2)
ci(wc2)
up(wc2)
up(wc)
modify_tree(trunk, 0.5)
st(wc)
ci(wc)
up(wc2)
up(wc)
svn('merge', '--accept=postpone', trunk_url, wc2)
st(wc2)
svn('resolve', '--accept=mine-conflict', wc2)
st(wc2)
svn('resolved', '-R', wc2)
st(wc2)
ci(wc2)
up(wc2)
up(wc)
svn('merge', '--accept=postpone', '--reintegrate', branch_url, trunk)
st(wc)
svn('resolve', '--accept=mine-conflict', wc)
st(wc)
svn('resolved', '-R', wc)
st(wc)
ci(wc)
up(wc2)
up(wc)
svn('delete', j(wc, 'branch'))
ci(wc)
up(wc2)
up(wc)
]]]
Note that I still have a few plans for tweaks I'm not discussing because
this mail is long enough.
~Neels
#!/usr/bin/env python
"""
usage: benchmark.py run <run_file> <levels> <spread> [N]
benchmark.py show <run_file>
benchmark.py compare <run_file1> <run_file2>
Test data is written to run_file.
If a run_file exists, data is added to it.
<levels> is the number of directory levels to create
<spread> is the number of child trees spreading off each dir level
If <N> is provided, the run is repeated N times.
"""
import os, sys, time
import tempfile
from datetime import datetime, timedelta
from subprocess import Popen, PIPE, call
import random
import shutil
import cPickle
VERBOSE = False
DEFAULT_TIMINGS_PATH = './benchmark_py_last_run.py-pickle'
timings = None
def run_cmd(cmd, stdin=None, shell=False):
if shell:
printable_cmd = 'CMD: ' + cmd
else:
printable_cmd = 'CMD: ' + ' '.join(cmd)
if VERBOSE:
print printable_cmd
if stdin:
stdin_arg = PIPE
else:
stdin_arg = None
p = Popen(cmd, stdin=stdin_arg, stdout=PIPE, stderr=PIPE, shell=shell)
stdout,stderr = p.communicate(input=stdin)
if VERBOSE:
if (stdout):
print "STDOUT: [[[\n%s]]]" % ''.join(stdout)
if (stderr):
print "STDERR: [[[\n%s]]]" % ''.join(stderr)
return stdout,stderr
def timedelta_to_seconds(td):
return ( float(td.seconds)
+ float(td.microseconds) / (10**6)
+ td.days * 24 * 60 * 60 )
class Timings:
def __init__(self, *ignore_svn_cmds):
self.timings = {}
self.current_name = None
self.tic_at = None
self.ignore = ignore_svn_cmds
self.name = None
def tic(self, name):
if name in self.ignore:
return
self.toc()
self.current_name = name
self.tic_at = datetime.now()
def toc(self):
if self.current_name and self.tic_at:
toc_at = datetime.now()
self.submit_timing(self.current_name,
timedelta_to_seconds(toc_at - self.tic_at))
self.current_name = None
self.tic_at = None
def submit_timing(self, name, seconds):
times = self.timings.get(name)
if not times:
times = []
self.timings[name] = times
times.append(seconds)
def summary(self):
s = []
if self.name:
s.append('Timings for %s' % self.name)
s.append(' N min max avg operation (unit is seconds)')
names = sorted(self.timings.keys())
for name in names:
timings = self.timings.get(name)
if not name or not timings: continue
s.append('%5d %7.3f %7.3f %7.3f %s' % (
len(timings),
min(timings),
max(timings),
reduce(lambda x,y: x + y, timings) / len(timings),
name))
return '\n'.join(s)
def compare_to(self, other):
selfname = self.name
if not selfname:
selfname = 'unnamed'
othername = other.name
if not othername:
othername = 'the other'
s = ['COMPARE %s to %s'%(othername, selfname),
' 1.23|+0.45 means factor=1.23, difference in seconds = 0.45',
' factor < 1 or difference < 0 means \'%s\' is faster than \'%s\''
% (self.name, othername)]
s.append(' min max avg operation')
def do_div(a, b):
if b:
return float(a) / float(b)
else:
return 0.0
def do_diff(a, b):
return float(a) - float(b)
def min_max_avg(ttimings):
return ( min(ttimings),
max(ttimings),
reduce(lambda x,y: x + y, ttimings) / len(ttimings) )
names = sorted(self.timings.keys())
for name in names:
timings = self.timings.get(name)
other_timings = other.timings.get(name)
if not other_timings:
continue
min_me, max_me, avg_me = min_max_avg(timings)
min_other, max_other, avg_other = min_max_avg(other_timings)
s.append('%-16s %-16s %-16s %s' % (
'%7.3f|%+7.4f' % (
do_div(min_me, min_other),
do_diff(min_me, min_other)
),
'%7.3f|%+7.4f' % (
do_div(max_me, max_other),
do_diff(max_me, max_other)
),
'%7.3f|%+7.4f' % (
do_div(avg_me, avg_other),
do_diff(avg_me, avg_other)
),
name))
return '\n'.join(s)
def add(self, other):
for name, other_times in other.timings.items():
my_times = self.timings.get(name)
if not my_times:
my_times = []
self.timings[name] = my_times
my_times.extend(other_times)
j = os.path.join
_create_count = 0
def next_name(prefix):
global _create_count
_create_count += 1
return '_'.join((prefix, str(_create_count)))
def create_tree(in_dir, levels, spread=5):
    try:
        os.mkdir(in_dir)
    except OSError:
        # the directory may already exist
        pass
    for i in range(spread):
        # files
        fn = j(in_dir, next_name('file'))
        f = open(fn, 'w')
        f.write('This is %s\n' % fn)
        f.close()
        # dirs
        if levels > 1:
            dn = j(in_dir, next_name('dir'))
            create_tree(dn, levels - 1, spread)
def svn(*args):
    global timings
    name = args[0]
    cmd = ['svn']
    cmd.extend(args)
    if VERBOSE:
        print 'svn cmd: ' + ' '.join(cmd)
    timings.tic(name)
    try:
        p = Popen(cmd, stdin=None, stdout=PIPE, stderr=PIPE, shell=False)
        stdout, stderr = p.communicate()
    finally:
        timings.toc()
    if VERBOSE:
        if stdout:
            print "STDOUT: [[[\n%s]]]" % stdout
        if stderr:
            print "STDERR: [[[\n%s]]]" % stderr
    return stdout, stderr
def add(*args):
return svn('add', *args)
def ci(*args):
return svn('commit', '-mm', *args)
def up(*args):
return svn('update', *args)
def st(*args):
return svn('status', *args)
_chars = [chr(x) for x in range(ord('a'), ord('z') +1)]
def randstr(length=8):
    return ''.join([random.choice(_chars) for i in range(length)])
def _copy(path):
dest = next_name(path + '_copied')
svn('copy', path, dest)
def _move(path):
dest = path + '_moved'
svn('move', path, dest)
def _propmod(path):
so, se = svn('proplist', path)
propnames = [line.strip() for line in so.strip().split('\n')[1:]]
# modify?
if len(propnames):
svn('ps', propnames[len(propnames) / 2], randstr(), path)
# del?
if len(propnames) > 1:
svn('propdel', propnames[len(propnames) / 2], path)
def _propadd(path):
# set a new one.
svn('propset', randstr(), randstr(), path)
def _mod(path):
if os.path.isdir(path):
return _propmod(path)
f = open(path, 'a')
f.write('\n%s\n' % randstr())
f.close()
def _add(path):
if os.path.isfile(path):
return _mod(path)
if random.choice((True, False)):
# create a dir
svn('mkdir', j(path, next_name('new_dir')))
else:
# create a file
new_path = j(path, next_name('new_file'))
f = open(new_path, 'w')
f.write(randstr())
f.close()
svn('add', new_path)
def _del(path):
svn('delete', path)
_mod_funcs = (_mod, _add, _propmod, _propadd, )#_copy,) # _move, _del)
def modify_tree(in_dir, fraction):
child_names = os.listdir(in_dir)
for child_name in child_names:
if child_name[0] == '.':
continue
if random.random() < fraction:
path = j(in_dir, child_name)
random.choice(_mod_funcs)(path)
for child_name in child_names:
if child_name[0] == '.': continue
path = j(in_dir, child_name)
if os.path.isdir(path):
modify_tree(path, fraction)
def propadd_tree(in_dir, fraction):
for child_name in os.listdir(in_dir):
if child_name[0] == '.': continue
path = j(in_dir, child_name)
if random.random() < fraction:
_propadd(path)
if os.path.isdir(path):
propadd_tree(path, fraction)
def run(levels, spread, N):
    global timings
    for i in range(N):
        # re-seed per run, so that all N runs make identical modifications
        random.seed(0)
        base = tempfile.mkdtemp()
        try:
            repos = j(base, 'repos')
            wc = j(base, 'wc')
            wc2 = j(base, 'wc2')
            file_url = 'file://%s' % repos
            so, se = run_cmd(['which', 'svn'])
            if not so:
                print "Can't find svn."
                exit(1)
            print '\nRunning svn benchmark in', base
            print 'dir levels: %s; new files and dirs per leaf: %s; run %d of %d' % (
                levels, spread, i + 1, N)
            so, se = svn('--version')
            print ', '.join(so.split('\n')[:2])
            started = datetime.now()
            try:
                run_cmd(['svnadmin', 'create', repos])
                svn('checkout', file_url, wc)
                trunk = j(wc, 'trunk')
                create_tree(trunk, levels, spread)
                add(trunk)
                st(wc)
                ci(wc)
                up(wc)
                propadd_tree(trunk, 0.5)
                ci(wc)
                up(wc)
                st(wc)
                trunk_url = file_url + '/trunk'
                branch_url = file_url + '/branch'
                svn('copy', '-mm', trunk_url, branch_url)
                st(wc)
                up(wc)
                st(wc)
                svn('checkout', trunk_url, wc2)
                st(wc2)
                modify_tree(wc2, 0.5)
                st(wc2)
                ci(wc2)
                up(wc2)
                up(wc)
                svn('switch', branch_url, wc2)
                modify_tree(wc2, 0.5)
                st(wc2)
                ci(wc2)
                up(wc2)
                up(wc)
                modify_tree(trunk, 0.5)
                st(wc)
                ci(wc)
                up(wc2)
                up(wc)
                svn('merge', '--accept=postpone', trunk_url, wc2)
                st(wc2)
                svn('resolve', '--accept=mine-conflict', wc2)
                st(wc2)
                svn('resolved', '-R', wc2)
                st(wc2)
                ci(wc2)
                up(wc2)
                up(wc)
                svn('merge', '--accept=postpone', '--reintegrate', branch_url, trunk)
                st(wc)
                svn('resolve', '--accept=mine-conflict', wc)
                st(wc)
                svn('resolved', '-R', wc)
                st(wc)
                ci(wc)
                up(wc2)
                up(wc)
                svn('delete', j(wc, 'branch'))
                ci(wc)
                up(wc2)
                up(wc)
            finally:
                stopped = datetime.now()
                print '\nDone with svn benchmark in', (stopped - started)
                timings.submit_timing('TOTAL RUN',
                                      timedelta_to_seconds(stopped - started))
                # rename 'ps' timings to 'prop mod'
                if timings.timings.get('ps'):
                    has = timings.timings.get('prop mod')
                    if not has:
                        has = []
                        timings.timings['prop mod'] = has
                    has.extend(timings.timings['ps'])
                    del timings.timings['ps']
                print timings.summary()
        finally:
            shutil.rmtree(base)
def read_from_file(file_path):
f = open(file_path, 'rb')
try:
instance = cPickle.load(f)
instance.name = os.path.basename(file_path)
finally:
f.close()
return instance
def write_to_file(file_path, instance):
f = open(file_path, 'wb')
cPickle.dump(instance, f)
f.close()
def usage():
print __doc__
if __name__ == '__main__':
    if len(sys.argv) > 1 and 'compare'.startswith(sys.argv[1]):
        if len(sys.argv) < 4:
            usage()
            exit(1)
        p1, p2 = sys.argv[2:4]
        t1 = read_from_file(p1)
        t2 = read_from_file(p2)
        print t1.summary()
        print '---'
        print t2.summary()
        print '---'
        print t2.compare_to(t1)
    elif len(sys.argv) > 1 and 'combine'.startswith(sys.argv[1]):
        if len(sys.argv) < 5:
            usage()
            exit(1)
        dest = sys.argv[-1]
        paths = sys.argv[2:-1]
        total = Timings('--version')
        for path in paths:
            t = read_from_file(path)
            total.add(t)
        print total.summary()
        write_to_file(dest, total)
    elif len(sys.argv) > 1 and 'run'.startswith(sys.argv[1]):
        try:
            timings_path = sys.argv[2]
            levels = int(sys.argv[3])
            spread = int(sys.argv[4])
            if len(sys.argv) > 5:
                N = int(sys.argv[5])
            else:
                N = 1
        except:
            usage()
            raise
        print '\n\nHi, going to run a Subversion benchmark series of %d runs...' % N
        if os.path.isfile(timings_path):
            print 'Going to add results to existing file', timings_path
            timings = read_from_file(timings_path)
        else:
            print 'Going to write results to new file', timings_path
            timings = Timings('--version')
        run(levels, spread, N)
        write_to_file(timings_path, timings)
    elif len(sys.argv) > 1 and 'show'.startswith(sys.argv[1]):
        if len(sys.argv) < 3:
            usage()
            exit(1)
        for timings_path in sys.argv[2:]:
            timings = read_from_file(timings_path)
            print '---\n%s' % timings_path
            print timings.summary()
    else:
        usage()