Hi,

I'm experiencing a file descriptor leak in the Subversion 1.6.x branch. I 
encounter it with the 1.6.1 build included with Mac OS X 10.6, a 1.6.11 build 
from MacPorts, and a build of the 1.6.x branch. The bug is present in neither 
1.5.7 nor trunk.

The bug is somewhat subtle, and the circumstances causing it are fairly 
complex. In the course of running a test suite, we repeatedly open repositories 
using the ‘file’ protocol, log their history and fetch the contents of all 
revisions. The suite fails after about a hundred tests, once the process has 
exhausted its file descriptors. The output of ‘lsof’ for the process shows 216 
open references to ‘rep-cache.db’ files.

As mentioned, the circumstances causing the bug are fairly complex. I see this 
in hgsubversion,[1] a Mercurial plugin for Subversion interoperability. Both 
Mercurial and hgsubversion are written in Python, and historically, 
hgsubversion has used the SWIG bindings for Subversion. Unfortunately, we have 
found the SWIG bindings to leak like a sieve; it is not uncommon for the 
conversion of a large repository to use several gigabytes of memory, or even 
to exhaust the address space in a 32-bit environment. As an effort to fix this, 
I've been writing a Subvertpy backend for hgsubversion. Subvertpy[2] is a set 
of alternate Python bindings for Subversion that expose a much simpler API 
and, most importantly, handle memory allocation internally rather than 
exposing it to the Python environment.

So far, the results are good; converting a test repository (cvs2svn) over the 
HTTP or svn protocols is slower, but uses significantly less memory. (Some of 
that overhead might just be the cost of deallocating more; you never know…) The 
file protocol, however, appears to leak somewhat: it uses 27% less memory than 
the SWIG bindings, but twice as much CPU time. (Please note that I haven't 
tested this with anything other than 1.6.) The file protocol is
the main protocol used by the hgsubversion test suite. Whereas leaking one file 
descriptor per repository is insignificant during common use, our test suite 
opens hundreds of repositories in a single process.

Considering the many packages involved, it's not easy to determine which one 
might be buggy. A few observations:

* While hgsubversion leaks memory with the SWIG bindings, it doesn't leak file 
descriptors. This suggests that hgsubversion itself isn't the cause of *this* 
leak.
* It is quite possible that the source of the leak is in Subvertpy. In order to 
get hgsubversion working with it, I had to add a few missing wrapper APIs. 
However…
* Neither hgsubversion nor Subvertpy contains any logic related to the various 
repository access methods. If either were the cause of the bug, it would 
presumably affect all access methods and not just one.
* From a brief inspection of the Subversion source code, it appears that the 
‘rep-cache.db’ is an implementation detail deep in Subversion. It's odd that 
this file remains open throughout the lifetime of the process. If the source of 
the leak were higher up in the chain, wouldn't other files in the repository 
remain open as well?
* Finally, the leak isn't present when using Subversion 1.5.7 or 1.7.x. 
Subvertpy uses slightly different code paths for 1.5.7, but for 1.7.x, the 
code used is exactly the same. hgsubversion requires Subversion 1.5 or later 
and uses the same code paths regardless of the underlying Subversion version.

I haven't been able to reproduce this outside our test suite. Opening a 
repository directly doesn't cause ‘rep-cache.db’ to be opened, nor does 
obtaining a log of all revisions. hgsubversion has two modes for fetching 
revisions: replay and diff-based. Hacking the tests to use only one of them 
merely changes how many repositories are processed before descriptors run out. 
For reference, I've attached the output of ‘lsof’ on a process running our 
test suite, both unfiltered and filtered for readability.
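
The reproduction attempts above boil down to repeating one operation and 
watching whether the descriptor count grows. A generic harness for that might 
look like the Python sketch below. Both ‘operation’ and ‘count_open’ are 
hypothetical callables supplied by the caller (for instance, one that opens a 
repository over file:// and fetches its revisions, and one that counts open 
‘rep-cache.db’ references); they are not existing hgsubversion or Subvertpy 
APIs.

```python
from typing import Callable, List


def descriptor_growth(
    operation: Callable[[], None],
    count_open: Callable[[], int],
    iterations: int = 10,
) -> List[int]:
    """Run `operation` repeatedly, recording `count_open()` before and after each run.

    Returns the baseline count followed by one count per iteration.
    """
    counts = [count_open()]
    for _ in range(iterations):
        operation()
        counts.append(count_open())
    return counts


def grows_every_iteration(counts: List[int]) -> bool:
    """True if the recorded count strictly increases at every step."""
    return len(counts) > 1 and all(b > a for a, b in zip(counts, counts[1:]))
```

Running it with different operations (opening a repository, logging it, 
fetching revisions via replay or via diffs) would narrow down which step, if 
any, leaves ‘rep-cache.db’ open.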

So, what to do now? I've discussed this with the Subvertpy author, Jelmer 
Vernooij, and he's at a loss as to what might cause this other than a bug in 
Subversion. I'd like to be able to diagnose this further, but so far, I haven't 
been able to get Subversion to open ‘rep-cache.db’ outside the test suite. So I 
ask you guys: Do you think this is a bug in Subversion, or somewhere else? Do 
you have any hints on what I can do to diagnose this further? If it is a bug in 
Subversion, could it be fixed in the 1.6.x branch?

(I've Cc'ed this mail to Augie Fackler and Jelmer Vernooij, the maintainers of 
hgsubversion and Subvertpy, respectively.)

[1] http://code.google.com/p/hgsubversion/
[2] http://samba.org/~jelmer/subvertpy/

--

Dan Villiom Podlaski Christiansen
dan...@gmail.com

Attachment: hgsubversion-subvertpy-rebuildmeta-leak-filtered.txt.bz2
Description: BZip2 compressed data

Attachment: hgsubversion-subvertpy-rebuildmeta-leak-full.txt.bz2
Description: BZip2 compressed data
