Well, after a week of blood, sweat and tears, I finally tracked down the
    memory leak that's been slowly chipping away at my sanity for the past
    year.

    It's a pretty severe leak that affects the Python SWIG bindings: once
    allocated, an svn_merge_range_t object can never be freed.  This is
    due to a missing Py_DECREF call in swigutil_py.c's convert_rangelist
    method.

    The range lists can be huge, too, which makes this quite a nasty leak.
    I was seeing up to 750KB of RSS being eaten up on each invocation of
    svn_mergeinfo_parse() against non-trivial, real-world svn:mergeinfo
    data (such as that on /subversion/trunk, for example).

    Any of the mergeinfo methods that return a range list (which almost all
    do) are affected.  I haven't looked at the Perl or Ruby bindings in any
    capacity.

    Patch against trunk attached.

    Regards,

        Trent.

[[

Fix a memory leak in convert_rangelist by *always* Py_DECREF'ing the object
returned by convert_to_swigtype (a SWIG wrapper around svn_merge_range_t)
once we've established it's not NULL.

This is required because PyList_Append takes ownership of references passed
in (i.e. calls Py_INCREF).  Prior to this fix, the reference counts of any
svn_merge_range_t objects would always be off-by-one, preventing them from
ever being garbage collected, having dire effects on the memory usage of
long-running programs calling svn_mergeinfo_parse() on real-world data.

Patch by: Trent Nelson <tr...@snakebite.org>
Tested on: FreeBSD, OS X, Windows.

* subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c:
  (convert_rangelist): Make sure we call Py_DECREF on the object returned
   from convert_to_swigtype.  PyList_New might return NULL; check for that.
   Add 'apr_array_header_t *' and 'swig_type_info *' casts where necessary
   to suppress compiler warnings.

* subversion/bindings/swig/python/tests/mergeinfo.py:
  (get_svn_merge_range_t_objects): New helper method that returns a list of
   any 'svn_merge_range_t' objects being tracked by the garbage collector,
   which we use to detect memory leaks.
  (SubversionMergeinfoTestCase): Add two new tests to aid in detecting memory
   leaks.  They essentially test the same thing in two different ways; if one
   fails, the other *should* fail.  (Of course, if one fails and the other does
   not -- that's just as valuable, diagnostically, and points to an issue with
   the 'automatic pool memory management' logic.)
    (test_mergeinfo_leakage__incorrect_range_t_refcounts): New test.  Verify
     svn_merge_range_t objects returned from svn_mergeinfo_parse() have the
     correct refcounts set.
    (test_mergeinfo_leakage__lingering_range_t_objects_after_del): New test.
     Verify that there are no lingering svn_merge_range_t objects after we
     explicitly delete the results returned from svn_mergeinfo_parse().

]]

[[
Index: subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c
===================================================================
--- subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c        
(revision 1234786)
+++ subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c        
(working copy)
@@ -653,15 +653,27 @@
 static PyObject *convert_rangelist(void *value, void *ctx, PyObject *py_pool)
 {
   int i;
+  int result;
+  PyObject *obj;
   PyObject *list;
-  apr_array_header_t *array = value;
+  apr_array_header_t *array = (apr_array_header_t *)value;
 
   list = PyList_New(0);
+  if (list == NULL)
+    return NULL;
+
   for (i = 0; i < array->nelts; i++)
     {
       svn_merge_range_t *range = APR_ARRAY_IDX(array, i, svn_merge_range_t *);
-      if (PyList_Append(list, convert_to_swigtype(range, ctx, py_pool)) == -1)
+
+      obj = convert_to_swigtype(range, (swig_type_info *)ctx, py_pool);
+      if (obj == NULL)
         goto error;
+
+      result = PyList_Append(list, obj);
+      Py_DECREF(obj);
+      if (result == -1)
+        goto error;
     }
   return list;
  error:
Index: subversion/bindings/swig/python/tests/mergeinfo.py
===================================================================
--- subversion/bindings/swig/python/tests/mergeinfo.py  (revision 1234786)
+++ subversion/bindings/swig/python/tests/mergeinfo.py  (working copy)
@@ -18,7 +18,7 @@
 # under the License.
 #
 #
-import unittest, os
+import unittest, os, sys, gc, itertools
 from svn import core, repos, fs
 import utils
 
@@ -29,6 +29,15 @@
     self.start = start
     self.end = end
 
+def get_svn_merge_range_t_objects():
+  """Returns a list 'svn_merge_range_t' objects being tracked by the
+     garbage collector, used for detecting memory leaks."""
+  return [
+    o for o in gc.get_objects()
+      if hasattr(o, '__class__') and
+        o.__class__.__name__ == 'svn_merge_range_t'
+  ]
+
 class SubversionMergeinfoTestCase(unittest.TestCase):
   """Test cases for mergeinfo"""
 
@@ -116,6 +125,54 @@
       }
     self.compare_mergeinfo_catalogs(mergeinfo, expected_mergeinfo)
 
+  def test_mergeinfo_leakage__incorrect_range_t_refcounts(self):
+    """Ensure that the ref counts on svn_merge_range_t objects returned by
+       svn_mergeinfo_parse() are correct."""
+    # When reference counting is working properly, each svn_merge_range_t in
+    # the returned mergeinfo will have a ref count of 1...
+    mergeinfo = core.svn_mergeinfo_parse(self.TEXT_MERGEINFO1)
+    for (path, rangelist) in mergeinfo.items():
+      # ....and now 2 (incref during iteration of rangelist)
+
+      for (r, i) in zip(rangelist, itertools.count()):
+        # ....and now 3 (incref during iteration of each range object)
+
+        refcount = sys.getrefcount(r)
+        # ....and finally, 4 (getrefcount() also increfs)
+        expected = 4
+
+        # Note: if path and index are not '/trunk' and 0 respectively, then
+        # only some of the range objects are leaking, which is, as far as
+        # leaks go, even more impressive.
+        self.assertEquals(refcount, expected, (
+          "Memory leak!  Expected a ref count of %d for svn_merge_range_t "
+          "object, but got %d instead (path: %s, index: %d).  Probable "
+          "cause: incorrect Py_INCREF/Py_DECREF usage in libsvn_swig_py/"
+          "swigutil_py.c." % (expected, refcount, path, i)))
+
+    del mergeinfo
+    gc.collect()
+
+  def test_mergeinfo_leakage__lingering_range_t_objects_after_del(self):
+    """Ensure that there are no svn_merge_range_t objects being tracked by
+       the garbage collector after we explicitly `del` the results returned
+       by svn_mergeinfo_parse().  We disable gc before the svn_mergeinfo_parse
+       call and force an explicit collection cycle straight after the `del`;
+       if our reference counts are correct, the allocated svn_merge_range_t
+       objects will be garbage collected and thus, not appear in the list of
+       objects returned by gc.get_objects()."""
+    gc.disable()
+    mergeinfo = core.svn_mergeinfo_parse(self.TEXT_MERGEINFO1)
+    del mergeinfo
+    gc.collect()
+    lingering = get_svn_merge_range_t_objects()
+    self.assertEquals(lingering, list(), (
+      "Memory leak!  Found lingering svn_merge_range_t objects left over from "
+      "our call to svn_mergeinfo_parse(), even though we explicitly deleted "
+      "the returned mergeinfo object.  Probable cause: incorrect Py_INCREF/"
+      "Py_DECREF usage in libsvn_swig_py/swigutil_py.c.  Lingering objects:\n"
+      "%s" % lingering))
+
   def inspect_mergeinfo_dict(self, mergeinfo, merge_source, nbr_rev_ranges):
     rangelist = mergeinfo.get(merge_source)
     self.inspect_rangelist_tuple(rangelist, nbr_rev_ranges)
]]
Fix a memory leak in convert_rangelist by *always* Py_DECREF'ing the object
returned by convert_to_swigtype (a SWIG wrapper around svn_merge_range_t)
once we've established it's not NULL.

This is required because PyList_Append takes ownership of references passed
in (i.e. calls Py_INCREF).  Prior to this fix, the reference counts of any
svn_merge_range_t objects would always be off-by-one, preventing them from
ever being garbage collected, having dire effects on the memory usage of
long-running programs calling svn_mergeinfo_parse() on real-world data.

Patch by: Trent Nelson <tr...@snakebite.org>
Tested on: FreeBSD, OS X, Windows.

* subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c:
  (convert_rangelist): Make sure we call Py_DECREF on the object returned
   from convert_to_swigtype.  PyList_New might return NULL; check for that.
   Add 'apr_array_header_t *' and 'swig_type_info *' casts where necessary
   to suppress compiler warnings.

* subversion/bindings/swig/python/tests/mergeinfo.py:
  (get_svn_merge_range_t_objects): New helper method that returns a list of
   any 'svn_merge_range_t' objects being tracked by the garbage collector,
   which we use to detect memory leaks.
  (SubversionMergeinfoTestCase): Add two new tests to aid in detecting memory
   leaks.  They essentially test the same thing in two different ways; if one
   fails, the other *should* fail.  (Of course, if one fails and the other does
   not -- that's just as valuable, diagnostically, and points to an issue with
   the 'automatic pool memory management' logic.)
    (test_mergeinfo_leakage__incorrect_range_t_refcounts): New test.  Verify
     svn_merge_range_t objects returned from svn_mergeinfo_parse() have the
     correct refcounts set.
    (test_mergeinfo_leakage__lingering_range_t_objects_after_del): New test.
     Verify that there are no lingering svn_merge_range_t objects after we
     explicitly delete the results returned from svn_mergeinfo_parse().
Index: subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c
===================================================================
--- subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c        
(revision 1234786)
+++ subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c        
(working copy)
@@ -653,15 +653,27 @@
 static PyObject *convert_rangelist(void *value, void *ctx, PyObject *py_pool)
 {
   int i;
+  int result;
+  PyObject *obj;
   PyObject *list;
-  apr_array_header_t *array = value;
+  apr_array_header_t *array = (apr_array_header_t *)value;
 
   list = PyList_New(0);
+  if (list == NULL)
+    return NULL;
+
   for (i = 0; i < array->nelts; i++)
     {
       svn_merge_range_t *range = APR_ARRAY_IDX(array, i, svn_merge_range_t *);
-      if (PyList_Append(list, convert_to_swigtype(range, ctx, py_pool)) == -1)
+
+      obj = convert_to_swigtype(range, (swig_type_info *)ctx, py_pool);
+      if (obj == NULL)
         goto error;
+
+      result = PyList_Append(list, obj);
+      Py_DECREF(obj);
+      if (result == -1)
+        goto error;
     }
   return list;
  error:
Index: subversion/bindings/swig/python/tests/mergeinfo.py
===================================================================
--- subversion/bindings/swig/python/tests/mergeinfo.py  (revision 1234786)
+++ subversion/bindings/swig/python/tests/mergeinfo.py  (working copy)
@@ -18,7 +18,7 @@
 # under the License.
 #
 #
-import unittest, os
+import unittest, os, sys, gc, itertools
 from svn import core, repos, fs
 import utils
 
@@ -29,6 +29,15 @@
     self.start = start
     self.end = end
 
+def get_svn_merge_range_t_objects():
+  """Returns a list 'svn_merge_range_t' objects being tracked by the
+     garbage collector, used for detecting memory leaks."""
+  return [
+    o for o in gc.get_objects()
+      if hasattr(o, '__class__') and
+        o.__class__.__name__ == 'svn_merge_range_t'
+  ]
+
 class SubversionMergeinfoTestCase(unittest.TestCase):
   """Test cases for mergeinfo"""
 
@@ -116,6 +125,54 @@
       }
     self.compare_mergeinfo_catalogs(mergeinfo, expected_mergeinfo)
 
+  def test_mergeinfo_leakage__incorrect_range_t_refcounts(self):
+    """Ensure that the ref counts on svn_merge_range_t objects returned by
+       svn_mergeinfo_parse() are correct."""
+    # When reference counting is working properly, each svn_merge_range_t in
+    # the returned mergeinfo will have a ref count of 1...
+    mergeinfo = core.svn_mergeinfo_parse(self.TEXT_MERGEINFO1)
+    for (path, rangelist) in mergeinfo.items():
+      # ....and now 2 (incref during iteration of rangelist)
+
+      for (r, i) in zip(rangelist, itertools.count()):
+        # ....and now 3 (incref during iteration of each range object)
+
+        refcount = sys.getrefcount(r)
+        # ....and finally, 4 (getrefcount() also increfs)
+        expected = 4
+
+        # Note: if path and index are not '/trunk' and 0 respectively, then
+        # only some of the range objects are leaking, which is, as far as
+        # leaks go, even more impressive.
+        self.assertEquals(refcount, expected, (
+          "Memory leak!  Expected a ref count of %d for svn_merge_range_t "
+          "object, but got %d instead (path: %s, index: %d).  Probable "
+          "cause: incorrect Py_INCREF/Py_DECREF usage in libsvn_swig_py/"
+          "swigutil_py.c." % (expected, refcount, path, i)))
+
+    del mergeinfo
+    gc.collect()
+
+  def test_mergeinfo_leakage__lingering_range_t_objects_after_del(self):
+    """Ensure that there are no svn_merge_range_t objects being tracked by
+       the garbage collector after we explicitly `del` the results returned
+       by svn_mergeinfo_parse().  We disable gc before the svn_mergeinfo_parse
+       call and force an explicit collection cycle straight after the `del`;
+       if our reference counts are correct, the allocated svn_merge_range_t
+       objects will be garbage collected and thus, not appear in the list of
+       objects returned by gc.get_objects()."""
+    gc.disable()
+    mergeinfo = core.svn_mergeinfo_parse(self.TEXT_MERGEINFO1)
+    del mergeinfo
+    gc.collect()
+    lingering = get_svn_merge_range_t_objects()
+    self.assertEquals(lingering, list(), (
+      "Memory leak!  Found lingering svn_merge_range_t objects left over from "
+      "our call to svn_mergeinfo_parse(), even though we explicitly deleted "
+      "the returned mergeinfo object.  Probable cause: incorrect Py_INCREF/"
+      "Py_DECREF usage in libsvn_swig_py/swigutil_py.c.  Lingering objects:\n"
+      "%s" % lingering))
+
   def inspect_mergeinfo_dict(self, mergeinfo, merge_source, nbr_rev_ranges):
     rangelist = mergeinfo.get(merge_source)
     self.inspect_rangelist_tuple(rangelist, nbr_rev_ranges)

Reply via email to