Hi all,
answering several replies to my OP:
Bert Huijben wrote:
-----Original Message-----
From: Stefan Fuhrmann [mailto:stefanfuhrm...@alice-dsl.de]
Sent: Wednesday, May 12, 2010 12:25
To: d...@apr.apache.org
Subject: First SVN performance data
Hi there,
as I promised, I'm going to conduct some in-depth analysis and
comprehensive SVN performance testing.
That is a very time-consuming process.
I think this belongs on d...@subversion instead of d...@apr; moving to that
list.
I'd hit the wrong button :/
"Export" has been chosen to eliminate problems with client-side w/c
performance.
Note that this also eliminates delta transfers, which are very common in
these wc operations. Subversion tries to optimize transfers by only
transferring binary updates, but you eliminated that.
From what I have seen, most time is spent (in that order) in
* wire compression (i.e. before sending data via svn://)
* de-compression while getting data from FSFS
* MD5
Due to the skip-delta storage, I would assume constructing
delta (r1, r2) could be more expensive than delta (r, 0).
But I haven't checked that, yet.
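To make the skip-delta point concrete, here is a toy Python sketch of the classic clear-lowest-bit scheme (roughly what notes/skip-deltas describes; FSFS's actual base selection differs in detail, so treat this as an illustration of the idea only):

# Toy illustration only: version N is deltified against N with its lowest
# set bit cleared, so rebuilding one full text walks O(log N) deltas, but
# delta(r1, r2) first needs *both* full texts reconstructed.
def skip_delta_chain(n):
    """Versions that must be combined to rebuild version n."""
    chain = []
    while n > 0:
        chain.append(n)
        n &= n - 1          # clear the lowest set bit -> the delta base
    chain.append(0)         # version 0 is stored as a self-contained text
    return chain

if __name__ == "__main__":
    print(skip_delta_chain(54))   # [54, 52, 48, 32, 0]
    # delta(r1, r2) means reconstructing two such chains before any
    # new delta can even be computed:
    print(len(skip_delta_chain(54)) + len(skip_delta_chain(37)))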
I would assume that transferring full files as deltas is not the most
efficient way to transfer these full files. No surprises there. (And
transferring deltas will hopefully be faster too after this work)
With wire compression disabled, the transfer takes about
25% of the wall clock time.
In the performance measurements I did just before releasing 1.6 (on
checkout, update, merge) I found that most time was spent on client-I/O in
the working copy and on locking, but all of this was removed from your
testset as you are only testing the subversion fs (filesystem) and IO
performance.
While I'm certain that these components can use more than a bit of
optimization to enhance the scalability of Subversion, I'm not sure if these
are the true performance problems of subversion as noted in our user
feedback.
There is an APR patch that reduces the I/O overhead,
especially under Windows. But there has been no feedback:
https://issues.apache.org/bugzilla/show_bug.cgi?id=49085
However, even without that patch the client-side file I/O
takes only 50% of the available wall clock time (see
user+sys vs. real). So, writing the data twice plus some
DB update (written once!) would still be feasible within
the wall clock time-frame set by the server performance.
If you are looking at just 'svn export' you completely eliminate the
reporting phase from an update editor (the client->server reporting), which
is one of the major components of svn update and svn status --update. You
just leave the server->client data transfer, the in-memory property handling
and writing out the possibly translated files on the client.
Client-side reporting has never been a real problem
for me because TSVN crawls the folder tree from
time to time keeping the cache hot. Provided wc-ng
will not be a major regression, the actual data download
remains the bottleneck.
Michael Pilato and Hyrum Wright interviewed some enterprise users earlier
this year and wrote some reports which indicated that the network latency
and working copy performance were the true bottlenecks. If I look at
^/subversion/trunk/notes/feedback, I see checkout, log, merging as primary
performance issues and this matches the performance issues I see in my day
to day use of repositories on the other side of the world.
Same here. That is why my focus is on c/o performance.
With the w/c currently undergoing major refactoring,
the server-side is the only "tangible" link in the chain.
For TSVN, I already solved the log performance issue
for most practical purposes ;)
Many of the performance improvements are also relevant
for client code: APR (file I/O), APR-Util (MD5), zlib,
svn libs file access overhead.
Theoretically svn
checkout should be pretty similar to the svn export you're testing, but
Subversion completely implements this as updating from r0 to the specified
revision. Most of the time in these operations is spent on the client
waiting for disk (libsvn_wc administration, moving and rewriting files) and
on network-IO caused by latency (not by throughput).
O.k. I will have a look at the update editor performance.
My strategy is as follows:
* eliminate "friction" within the server code, use high-volume
requests as a benchmark
-> everything else should automatically be faster than this "worst case"
+ improved SNR for further analysis
* gather data and ideas for fs-ng along the way
* find an I/O & network setup that copes with the backend performance
-> clear advice to users on how to maximize performance,
if that is crucial to them
* extend optimization efforts to w/c
* optimize I/O patterns to bring other setups closer to the "best case"
(On the Slik Windows buildbot, which runs all tests on a NTFS ramdrive, I
see that the tests are CPU bound these days... But that is after eliminating
all disk IO during the tests. When I run them on a normal workstation disk I
see different results)
Hopefully, SSDs and ample memory will remedy the
I/O performance disparity.
Bolstridge, Andrew <andy.bolstridge_at_intergraph.com> wrote:
> From: Stefan Fuhrmann [mailto:stefanfuhrmann_at_alice-dsl.de]
>
> * SVN servers tend to be CPU-limited
> (we already observed that problem @ our company
> with SVN 1.4)
Lovely figures, but I'm guessing CPU will be more of a bottleneck when
you run on a server with 24Gb RAM and 4 SSDs in RAID-0 configuration.
The server will allocate so much to cache that everything is
super-quick, and what disk IO there is, will also be super quick.
If you look at the bottom end of the attachment, you will see
the actual amount of RAM used during the tests. It says that
the amount of memory your server should have is roughly
the size of your working copy.
Also, an SSD RAID is *slower* than an SSD directly hooked
to a simple controller chip. I built my workstation a while
ago and had to compensate for the crappy flash controllers
used at that time -> proper RAID controller w/ memory.
Do you have benchmarks on a more 'representative' server? Say, a dual
core with
2Gb RAM and 4 HDD disks in RAID5 running as a VM image, and SVN running
over http?
I could use my old workstation but it doesn't have a RAID5.
At our company, we run SVN 1.4 servers in a VM (1G phys.
RAM) accessing a SAN. Since many developers work on
the same projects, the VM has proven to be CPU-bound.
But maybe the following numbers would interest you. I hooked
a cheap USB harddisk to my notebook and compared it to
its internal SSD drive. Tests were executed on WinXP 32
using the TSVN repository:
             flat       sharded    packed
usb 1st run  2:11.641   2:39.468   1:32.343
usb 2nd run  0:03.921   0:03.578   0:03.375
ssd 1st run  0:07.031   0:09.656   0:05.859
ssd 2nd run  0:03.750   0:03.406   0:03.296
normalized by runtime for "packed"
             flat    sharded   packed
usb 1st run  1.42    1.73      1.00
usb 2nd run  1.16    1.06      1.00
ssd 1st run  1.20    1.65      1.00
ssd 2nd run  1.14    1.03      1.00
normalized by fastest setup
             flat    sharded   packed
usb 1st run  39.94   48.38     28.02
usb 2nd run  1.19    1.09      1.02
ssd 1st run  2.13    2.93      1.78
ssd 2nd run  1.14    1.03      1.00
From these numbers, you can see:
* use either packed 1.6 format or flat 1.4 format
* your storage location must be cachable on the server
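For reference, the timings above come from repeated "svn export" runs against otherwise identical repositories. A minimal Python sketch of that kind of harness might look like the following; the repository URLs and export target are placeholders, and whether the first run is really "cold" depends on the OS file cache, not on the script:

# Timing-harness sketch -- URLs/paths are placeholders; "1st" vs. "2nd" run
# only differ in whether the OS file cache is already warm.
import shutil, subprocess, time

REPOS = {
    "flat":    "svn://localhost/repos-flat",
    "sharded": "svn://localhost/repos-sharded",
    "packed":  "svn://localhost/repos-packed",
}

def time_export(url, target="wc-export"):
    shutil.rmtree(target, ignore_errors=True)
    start = time.time()
    subprocess.run(["svn", "export", "-q", url, target], check=True)
    return time.time() - start

results = {name: [time_export(url) for _ in range(2)]   # cold run, then warm run
           for name, url in REPOS.items()}

baseline = results["packed"]                            # normalize as in the tables above
for name, (first, second) in results.items():
    print(f"{name:8s} 1st {first:7.3f}s ({first / baseline[0]:.2f}x)"
          f"  2nd {second:7.3f}s ({second / baseline[1]:.2f}x)")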
Johan Corveleyn wrote:
> * SVN servers tend to be CPU-limited
If you're using an NFS connected SAN, it's a whole different ballgame.
Ok, that may not be the fastest way to run an SVN server, but I think
it's often set up that way in a lot of companies, for various
practical reasons. I'd like SVN to be fast in this setup as well (as
are many other server applications).
At some point in future, you will have to decide upon new
server hardware. It is important to understand that SVN
is a database server and as such has specific needs. You
probably spend large amounts of money on your backup
solution, including safe / vault, service contracts etc.
If that is the case, the following rationales could guide your
decisions:
* a backup solution that handles "terror bytes" of data
within the defined backup / restore window is expensive.
If keeping your data safe is worth a certain amount of
money, you should be allowed to spend at least the same
amount of money for making the data usable.
* only a few repositories are actually "hot". Document
and binary repositories usually are not, and/or they work
reasonably well straight from disk I/O (due to item size).
* RAM is ~32G/1kEUR to 256G/10kEUR. The latter
is enough for >10 KDE-sized projects, i.e. several
1000 developers.
* Flash is ~512G/1kEUR. Most companies won't need
more than 1T for their hot repositories. Use it as a
write-through cache and copy new revisions to the SAN
(see the post-commit sketch below).
IOW, spend 5 grand on a pizza box and make hundreds
of users happy. Those users also happen to be very expensive
and critical to your company's success.
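As a sketch of the write-through idea: serve all reads from local flash and mirror each new revision to the SAN from the post-commit hook. Everything below is illustrative only; the hook arguments are the standard repository path and revision, but the SAN mount point is a placeholder and the layout assumes a sharded, unpacked FSFS repository.

#!/usr/bin/env python3
# post-commit hook sketch: reads are served from local flash, each new
# revision gets mirrored to the SAN. Illustrative only -- the SAN path is a
# placeholder and the layout assumes a sharded, unpacked FSFS repository
# (db/revs/<shard>/<rev>); packed shards would need extra handling.
import os, shutil, sys

SAN_MIRROR = "/mnt/san/svn-mirror"          # placeholder path

def mirror_rev(repos, rev, shard_size=1000):
    shard = str(int(rev) // shard_size)
    for subdir in ("revs", "revprops"):
        src = os.path.join(repos, "db", subdir, shard, rev)
        dst_dir = os.path.join(SAN_MIRROR, os.path.basename(repos), subdir, shard)
        os.makedirs(dst_dir, exist_ok=True)
        shutil.copy2(src, dst_dir)

if __name__ == "__main__":
    repos_path, revision = sys.argv[1], sys.argv[2]   # standard post-commit args
    mirror_rev(repos_path, revision)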
> * packed repositories are ~20% faster than non-packed,
> non-sharded
Also, I don't see those ~20% in your test numbers (more like ~5%), but
maybe you have other numbers that show this difference?
It's 2.5 vs. 2.0 secs for "export hot data from 1.7",
i.e. roughly 20%. In the numbers above it is somewhere
between 15 and 40%. The key factor is probably the file
system being used (credential checking etc.).
> * optimal file cache size is roughly /trunk size
> (plus branch diffs, but that is yet to be quantified)
> * "cold" I/O from a low-latency source takes 2 .. 3 times
> as long as from cached data
Ok, but unless you can get almost the entire repository in cache,
that's not very useful IMHO. In my tests, I mainly focused on the
"first run", because I want my server to be fast with a cold cache.
Because that's most likely the performance that my users will get.
It's a busy repository, with different users hitting different parts
of the repository all the time. I just don't think there will be a lot
of cache hits during a normal working day.
You don't need the entire repository to be in cache
- just as my numbers show. Also, in teams with more
than one person working on the same development
line, data is written once but read many times.
Think about continuous integration, for instance. I
began investigating that issue when load peaks caused
checkout times to grow from 8 minutes to over an hour.
Adding a second CPU almost eliminated the effect.
Also, if the test with cached data is 2-3 times faster than from the
SSD RAID-0, that's another indication to me that there's a lot of time
spent in I/O. And where there's a lot of time spent, there is
potentially a lot of time to be saved by optimizing.
I certainly will take a closer look at I/O. However, it
seems to me that at least the amount of data being
transferred is close to optimal (judging from the amount
of data being processed by zlib).
From the numbers above, you can see that a not-top-notch
SSD directly hooked to the controller is much faster than
the supposedly fast RAID0 configuration. It is all about
latency - not so much OS overhead like "open file".
> * a fully patched 1.7 server is twice as fast as 1.6.9
>
> "Export" has been chosen to eliminate problems
> with client-side w/c performance.
I mainly focused on log and blame (and checkout/update to a lesser
degree), so that may be one of the reasons why we're seeing it
differently :-) . I suppose the numbers, bottlenecks, ... totally
depend on the use case (as well as the hardware/network setup).
The log performance issue has been solved more or less
in TSVN. In 1.7, we also brought the UI up to speed
with the internals: even complex full-text searches over
millions of changes are (almost) interactive.
> If I look at
> ^/subversion/trunk/notes/feedback, I see checkout, log, merging as primary
> performance issues and this matches the performance issues I see in my day
> to day use of repositories on the other side of the world.
Ok, so you agree log is one of the important performance issues. That
one is very much I/O bound on the server (as I described before,
opening and closing rev files multiple times).
To speed up log on the server side, you need to maintain
an index. That's certainly not going to happen before fs-ng.
Otherwise, you will always end up reading every revision file.
Only exception: log on the repo root with no changed path
listing.
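To make the index idea concrete, here is a minimal sketch of a changed-path index; the schema and names are mine and nothing like this exists in the 1.x code base, it merely shows why "svn log PATH" could become an index lookup instead of a scan over every rev file:

# Sketch of a changed-path index -- schema and names are hypothetical.
import sqlite3

db = sqlite3.connect("log-index.db")
db.execute("""CREATE TABLE IF NOT EXISTS changed_paths (
                  path TEXT NOT NULL,
                  revision INTEGER NOT NULL
              )""")
db.execute("CREATE INDEX IF NOT EXISTS ix_path ON changed_paths (path, revision)")

def record_commit(revision, changed_paths):
    """Called once per new revision (e.g. from a post-commit step)."""
    db.executemany("INSERT INTO changed_paths VALUES (?, ?)",
                   [(p, revision) for p in changed_paths])
    db.commit()

def revisions_touching(path_prefix):
    """Revisions that changed anything under path_prefix, newest first --
    the question 'svn log PATH' currently answers by reading rev files."""
    rows = db.execute("SELECT DISTINCT revision FROM changed_paths "
                      "WHERE path = ? OR path LIKE ? ORDER BY revision DESC",
                      (path_prefix, path_prefix.rstrip('/') + '/%'))
    return [r[0] for r in rows]

# Example: record_commit(42, ["/trunk/README", "/trunk/src/main.c"])
#          revisions_touching("/trunk/src")   -> [42]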
-- Stefan^2.