On Wed, May 12, 2010 at 8:07 PM, Stefan Fuhrmann <stefanfuhrm...@alice-dsl.de> wrote:
> Bottom line:
> * SVN servers tend to be CPU-limited
>   (we already observed that problem @ our company
>   with SVN 1.4)

Like Andrew Bolstridge said, if you have superfast I/O like you do in your test setup (4 SSD's in RAID-0), I suppose it's normal that I/O isn't the bottleneck anymore. But if SVN requires such storage solutions to be fast, and that's the only way you can bring an SVN server to stress the CPU, that's a strong indication to me that it's very I/O sensitive.

If you're using an NFS connected SAN, it's a whole different ballgame. Ok, that may not be the fastest way to run an SVN server, but I think it's often set up that way in a lot of companies, for various practical reasons. I'd like SVN to be fast in this setup as well (as are many other server applications).

> * packed repositories are ~20% faster than non-packed,
>   non-sharded

I think a better comparison would be between packed and non-packed (but still sharded). And preferably same server version. Just to focus on the difference between packed and non-packed (that's what I did in my tests).

Also, I don't see those ~20% in your test numbers (more like ~5%), but maybe you have other numbers that show this difference? I must say that, when I tested on a server with SSD disk, I also saw something like 5% improvement. OTOH, on my server with NFS/SAN, I saw a ~5% performance decrease (maybe that's because of some extra file opens, which are costly in this setup). So again, it depends a lot on the storage part of the setup. All in all, packing is not really a big win for performance (but it may be better for backups and such).

> * optimal file cache size is roughly /trunk size
>   (plus branch diffs, but that is yet to be quantified)
> * "cold" I/O from a low-latency source takes 2 .. 3 times
>   as long as from cached data

Ok, but unless you can get almost the entire repository in cache, that's not very useful IMHO. In my tests, I mainly focused on the "first run", because I want my server to be fast with a cold cache. Because that's most likely the performance that my users will get.
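
As a rough way to read the 2 .. 3x cold-versus-cached figure quoted above: if one assumes (and this is only an assumption for illustration, not something measured in either test) that a fully cached run is pure CPU time and that all the extra time in a cold run is spent waiting on I/O, then the I/O share of a cold run is 1 - 1/ratio. The function name `io_share` below is made up for this sketch.

```python
def io_share(cold_to_warm_ratio: float) -> float:
    """Estimate the fraction of a cold run spent waiting on I/O.

    Assumption (illustrative only): the warm, fully cached run is pure
    CPU time, and everything a cold run adds on top of that is I/O wait.
    """
    if cold_to_warm_ratio < 1.0:
        raise ValueError("a cold run is assumed to be no faster than a warm one")
    return 1.0 - 1.0 / cold_to_warm_ratio


# The quoted range: cold I/O takes 2 .. 3 times as long as cached data.
for ratio in (2.0, 3.0):
    print(f"cold/warm = {ratio:.0f}x -> I/O share of cold run = {io_share(ratio):.0%}")
# prints:
# cold/warm = 2x -> I/O share of cold run = 50%
# cold/warm = 3x -> I/O share of cold run = 67%
```

Under that simplified model, roughly half to two thirds of a cold run would be I/O wait, even on fast storage. Real runs overlap CPU and I/O, so treat this as a back-of-the-envelope estimate only.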

It's a busy repository, with different users hitting different parts of the repository all the time. I just don't think there will be a lot of cache hits during a normal working day.

Also, if the test with cached data is 2-3 times faster than from the SSD RAID-0, that's another indication to me that there's a lot of time spent in I/O. And where there's a lot of time spent, there is potentially a lot of time to be saved by optimizing.

> * a fully patched 1.7 server is twice as fast as 1.6.9
>
> "Export" has been chosen to eliminate problems
> with client-side w/c performance.

I mainly focused on log and blame (and checkout/update to a lesser degree), so that may be one of the reasons why we're seeing it differently :-). I suppose the numbers, bottlenecks, ... totally depend on the use case (as well as the hardware/network setup).

That said, I'm very happy that you are working on optimizing the code, and I certainly encourage you to keep going. All performance improvements are extremely welcome, I think.

I'll try to look up my old test numbers again, and post them here. It might be interesting to compare notes :-).

Some more answers to Bert's post below ...

On Thu, May 13, 2010 at 4:31 PM, Bert Huijben <b...@vmoo.com> wrote:
> Michael Pilato and Hyrum Wright interviewed some enterprise users earlier
> this year and wrote some reports which indicated that the network latency
> and working copy performance were the true bottlenecks.

Let's assume WC performance will soon be a solved problem thanks to the great development efforts going on now :-). So we're totally ignoring that. Whether or not network latency is important depends heavily on the situation, but I can understand it's a big bottleneck for the larger companies out there (it's not a problem in our case, all devs on a gigabit LAN).

> If I look at
> ^/subversion/trunk/notes/feedback, I see checkout, log, merging as primary
> performance issues and this matches the performance issues I see in my day
> to day use of repositories on the other side of the world.

Ok, so you agree log is one of the important performance issues. That one is very much I/O bound on the server (as I described before, opening and closing rev files multiple times).

Cheers,
--
Johan