https://issues.apache.org/jira/browse/SVN-4667

I am currently contracting for WANdisco to help a customer whose merges use excessive amounts of RAM. One such merge will not complete with 4 GB of RAM available, but completes with 5 GB.

The branches involved have subtree mergeinfo on over 3500 files; each file's mergeinfo refers to about 350 branches on average, with just over one revision range per mergeinfo line. The average path length is under 100 bytes.

This already seems far too much memory for the size of the data set: 3500 files times ~350 lines is roughly 1.2 million mergeinfo lines, on the order of 150 MB as raw text, yet the merge needs more than 4 GB of RAM. And the data set is growing.

Issue #4667 is about reducing the amount of RAM Subversion uses on this data set. Another way to approach the problem is to reduce the amount of subtree mergeinfo by changing workflow practices; that approach is also being investigated but is out of scope for this issue, except to note that the tools "svn-mergeinfo-normalizer" and "svn-clean-mergeinfo.pl" both also fail to run to completion in the available RAM.

The reproduction recipe I'm using so far is attached to the issue. It generates a repository with N branches (N=300, for example), each changing a unique file, and merges each branch to trunk so that trunk ends up with N files carrying subtree mergeinfo, each referring to up to N branches (N/2 on average).

I can then run test merges, with debugging prints in them, to view the memory increase:

# this runs a merge from trunk to branch,
# with WC directory 'A' switched to a branch:
$ (cd obj-dir/subversion/tests/cmdline/svn-test-work/working_copies/mergeinfo_tests-14/ && \
  svn revert -q -R A/ && \
  svn merge -q ^/A A)
DBG: merge.c:12587: using 8+3 MB; increase +2 MB
DBG: merge.c:12418: using 8+25 MB; increase +21 MB
DBG: merge.c:12455: using 8+34 MB; increase +9 MB
DBG: merge.c:9378: using 8+37 MB; increase +3 MB
DBG: merge.c:9378: using 8+43 MB; increase +6 MB

I don't know how representative this repro-test is of the customer's use case, but it provides a starting point.

Monitoring the memory usage (RSS on Linux) of the 'svn' process (see the issue for code used), I find:

original: baseline 8 MB (after process started) + growth of 75 MB
after r1776742: baseline 8 MB + growth of 50 MB
after r1776788: baseline 8 MB + growth of 43 MB
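
For reference, here is a minimal sketch of how such a debugging print can sample RSS on Linux. The actual code used is attached to the issue, so take this only as an illustration; the function name is made up:

#include <stdio.h>

/* Return this process's resident set size in MB, by scanning the
   "VmRSS:" line of /proc/self/status, or -1 on failure. */
static long
get_rss_mb(void)
{
  FILE *fp = fopen("/proc/self/status", "r");
  char line[256];
  long rss_kb = -1;

  if (!fp)
    return -1;
  while (fgets(line, sizeof(line), fp))
    if (sscanf(line, "VmRSS: %ld kB", &rss_kb) == 1)
      break;
  fclose(fp);
  return rss_kb < 0 ? -1 : rss_kb / 1024;
}

A print such as

  fprintf(stderr, "DBG: %s:%d: using %ld MB\n", __FILE__, __LINE__, get_rss_mb());

then gives output along the lines of the "DBG:" lines shown above.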

Those two commits introduce subpools so that temporary mergeinfo is discarded after use. There are no doubt more opportunities to tighten memory usage with subpools. This approach might be very useful, but it seems unlikely to deliver the order-of-magnitude (or order-of-complexity) reduction that will probably be needed.
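
The general shape of those changes is Subversion's standard "iterpool" idiom. As a sketch of the pattern (not the actual diff of either revision, and with an invented function):

#include <apr_tables.h>
#include "svn_error.h"
#include "svn_pools.h"

/* Process each subtree's mergeinfo in a scratch subpool that is
   cleared on every iteration, so temporary parse results are
   discarded per item instead of accumulating for the whole merge. */
static svn_error_t *
process_subtrees(const apr_array_header_t *subtrees,
                 apr_pool_t *scratch_pool)
{
  apr_pool_t *iterpool = svn_pool_create(scratch_pool);
  int i;

  for (i = 0; i < subtrees->nelts; i++)
    {
      svn_pool_clear(iterpool);
      /* ... read, parse and use one subtree's mergeinfo, making all
         temporary allocations in ITERPOOL ... */
    }

  svn_pool_destroy(iterpool);
  return SVN_NO_ERROR;
}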

I would like to try a different approach. We currently read, parse and store all the mergeinfo, whereas I believe the merge algorithm is only interested in mergeinfo that refers to one of exactly two branches ('source' and 'target') in a typical merge; it never searches the 'graph' of merge ancestry beyond those two branches. We should be able to read, parse and store only the mergeinfo we need.
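
To illustrate, here is roughly what that could look like at the point where a subtree's svn:mergeinfo property is parsed. The helper is my invention, not existing API, and a real fix would ideally teach the parser itself to skip irrelevant lines rather than filtering after a full parse into a scratch pool:

#include <string.h>
#include <apr_hash.h>
#include <apr_strings.h>
#include "svn_error.h"
#include "svn_types.h"
#include "svn_mergeinfo.h"

/* Crude path-ancestry test, standing in for a proper fspath check:
   is PATH equal to PARENT, or a path below it? */
static svn_boolean_t
same_or_child(const char *parent, const char *path)
{
  size_t len = strlen(parent);
  return strncmp(path, parent, len) == 0
         && (path[len] == '\0' || path[len] == '/');
}

/* Parse PROPVAL, but keep (in RESULT_POOL) only the mergeinfo lines
   whose merge source lies on the merge's source or target branch.
   The full parse lives only in SCRATCH_POOL and is discarded. */
static svn_error_t *
parse_relevant_mergeinfo(svn_mergeinfo_t *filtered,
                         const char *propval,
                         const char *source_fspath,  /* e.g. "/A" */
                         const char *target_fspath,
                         apr_pool_t *result_pool,
                         apr_pool_t *scratch_pool)
{
  svn_mergeinfo_t full;
  apr_hash_index_t *hi;

  SVN_ERR(svn_mergeinfo_parse(&full, propval, scratch_pool));

  *filtered = apr_hash_make(result_pool);
  for (hi = apr_hash_first(scratch_pool, full); hi; hi = apr_hash_next(hi))
    {
      const void *key;
      void *val;
      const char *path;

      apr_hash_this(hi, &key, NULL, &val);
      path = key;
      if (same_or_child(source_fspath, path)
          || same_or_child(target_fspath, path))
        apr_hash_set(*filtered, apr_pstrdup(result_pool, path),
                     APR_HASH_KEY_STRING,
                     svn_rangelist_dup(val, result_pool));
    }

  return SVN_NO_ERROR;
}

With this shape, the peak cost is one property's full parse at a time, and the long-lived cost is only the lines the merge actually needs.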

Another possible approach could be to store subtree mergeinfo in a "delta" form relative to a parent path's mergeinfo.
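
As a sketch of that idea using the existing public mergeinfo operations (the struct and helpers are invented; I'm assuming the svn_mergeinfo_diff2/remove2/merge2 semantics from svn_mergeinfo.h):

#include "svn_error.h"
#include "svn_mergeinfo.h"

/* A subtree's mergeinfo stored as a delta against its parent's. */
typedef struct mergeinfo_delta_t
{
  svn_mergeinfo_t deleted;  /* present on the parent, absent here */
  svn_mergeinfo_t added;    /* present here, absent on the parent */
} mergeinfo_delta_t;

/* Store CHILD as a delta against PARENT. */
static svn_error_t *
delta_from_parent(mergeinfo_delta_t *delta,
                  svn_mergeinfo_t parent,
                  svn_mergeinfo_t child,
                  apr_pool_t *result_pool,
                  apr_pool_t *scratch_pool)
{
  return svn_mergeinfo_diff2(&delta->deleted, &delta->added,
                             parent, child, TRUE,
                             result_pool, scratch_pool);
}

/* Reconstruct a child's full mergeinfo: (PARENT - deleted) + added. */
static svn_error_t *
child_from_delta(svn_mergeinfo_t *child,
                 svn_mergeinfo_t parent,
                 const mergeinfo_delta_t *delta,
                 apr_pool_t *result_pool,
                 apr_pool_t *scratch_pool)
{
  SVN_ERR(svn_mergeinfo_remove2(child, delta->deleted, parent, TRUE,
                                result_pool, scratch_pool));
  return svn_mergeinfo_merge2(*child, delta->added,
                              result_pool, scratch_pool);
}

How much this would save depends on how similar a subtree's mergeinfo typically is to its parent's; if most subtree mergeinfo differs from its parent by only a few lines, the saving could be large.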

- Julian
