One of WANdisco's customers brought the performance of "svn lock *" to my attention. There is a network issue over HTTP and a filesystem issue in the repository.

When locking multiple files over HTTP the client sends a separate LOCK request for each file and the round trip delays add up. The bandwidth overhead of all the HTTP headers is also high: one request per path is not an efficient way to transport paths.

On the repository side creating or removing a lock involves writing an index file for each parent directory in addition to handling the lock file itself. Locking N separate files in a directory '/A/B/C' involves writing the index files for '/', '/A', '/A/B' and '/A/B/C' N times each, as well as handling the N lock files. To lock N files at depth D we do O(N*D) writes but only modify O(N+D) distinct files; it doesn't scale very well.

We already pass multiple paths into svn_ra_lock so we could address part of the network problem by rewriting some serf code to make it pipeline the LOCK requests. That would have the advantage of working with older servers, but to solve all the problems we need to make HTTP more like the svn protocol: send a single request (perhaps POST instead of LOCK?) for the repository root and pass all the paths in the body of the request.

Once we have all the paths arriving at the server in one request we can add new FS APIs to lock/unlock multiple paths, then sort the paths and write each index file only once.

I'm not quite sure how hooks would behave. We would need to run all the pre-lock hooks first and some could fail. We could drop the paths that fail and pass the rest to the FS layer, or perhaps fail the whole operation if any pre-lock fails. Either way the FS layer may fail to lock some of the paths for various reasons (nonexistent, already locked, etc.) and so the final set of locks could be smaller than the set of paths. Finally we run post-lock for the subset of paths that are locked.

There is also an FS atomicity issue to consider.
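
To put rough numbers on the index file arithmetic above, here is a small Python sketch. It is only a counting model, not Subversion code: the function names are made up for illustration, it assumes absolute repository paths, and it assumes one write per index file plus one per lock file, as described above.

```python
from posixpath import dirname


def index_dirs(path):
    """Directories whose index file is touched when locking `path`:
    every ancestor from the parent directory up to '/'."""
    if not path.startswith('/'):
        raise ValueError('expected an absolute repository path')
    dirs = []
    d = dirname(path)
    while True:
        dirs.append(d)
        if d == '/':
            break
        d = dirname(d)
    return dirs


def writes_per_path(paths):
    """One-path-at-a-time model: each lock rewrites every ancestor
    index file and writes its own lock file."""
    return sum(len(index_dirs(p)) + 1 for p in paths)


def writes_batched(paths):
    """Hypothetical multi-path model: each distinct index file is
    written once, plus one lock file per path."""
    distinct = set()
    for p in paths:
        distinct.update(index_dirs(p))
    return len(distinct) + len(paths)


paths = ['/A/B/C/file%d' % i for i in range(100)]
print(writes_per_path(paths))  # 500
print(writes_batched(paths))   # 104
```

For 100 files in '/A/B/C' the model gives 500 writes one path at a time against 104 when batched, which is the O(N*D) versus O(N+D) difference.
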
The current single path API can be interrupted between writing index files and handling the lock file, but the result is that the single path is either locked or unlocked, with any "broken" index files being invisible to the user (although the "broken" index files may cause more work for the server). A multiple path API could result in user-visible changes to a subset of the paths. I think that would be OK but it needs a bit more thought.

I've also noticed that we don't fsync any files when writing locks into the repository. I'm not sure if this is deliberate or not, but if we were to start calling fsync then the filesystem issue would become more important.

--
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*