Re: #4667, Merge uses large amount of memory

Julian Foad Wed, 04 Jan 2017 07:02:34 -0800

Stefan Fuhrmann wrote:

Julian Foad wrote:

https://issues.apache.org/jira/browse/SVN-4667

[...]


The branches involved have subtree mergeinfo on over 3500 files, each referring
to about 350 branches on average, and just over 1 revision range on average per
mergeinfo line. Average path length is under 100 bytes.


What is the result of 'svn pg "svn:mergeinfo" -R | wc -c'?


120 MB.
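As a rough consistency check, assume each mergeinfo line costs about 100 bytes. The path dominates, and the single revision range and newline add only a few bytes. On that assumption, the statistics quoted above predict roughly the measured total:

```latex
\[
3500\ \text{files} \times 350\ \tfrac{\text{lines}}{\text{file}} \times {\sim}100\ \tfrac{\text{bytes}}{\text{line}}
  \approx 1.2 \times 10^{8}\ \text{bytes} \approx 120\ \text{MB}
\]
```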

> [...]
> tools "svn-mergeinfo-normalizer" and "svn-clean-mergeinfo.pl" both also fail to
> execute in the available RAM.

You may run svn-mergeinfo-normalizer on arbitrary sub-trees.

Yes, and I may explore this further. I will note that we're already dealing with a subtree: the attempted merges and the mergeinfo reported above all refer to a subtree of the entire branch, because a whole-branch merge became impossible some time ago.

A lot of memory will be used to hold that part of the repository
history that is relevant to the branches mentioned in the m/i.
This may easily grow to several GB if there have been tens of
millions of changes.


The number of revisions in the repository is about 1 million.

If the tool manages to read the mergeinfo, it will print m/i
stats before fetching the log.  Does it get to this stage?


I'll see if I can find out.

[...]

I would like to try a different approach. We read, parse and store all the
mergeinfo, whereas I believe our merge algorithm is only interested in the
mergeinfo that refers to one of exactly two branches ('source' and 'target') in
a typical merge. The algorithm never searches the 'graph' of merge ancestry
beyond those two branches. We should be able to read, parse and store only the
mergeinfo we need.


That seems to be the path to take.  I would have assumed that we only
need the m/i for the source branch as the target m/i is implied as
being all of the target history.
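The filtering proposed above can be sketched as follows. This is a minimal Python sketch that keeps only mergeinfo for the branches a merge cares about. It assumes the svn:mergeinfo text format of one `source-path:revision-ranges` entry per line. The function name `parse_mergeinfo_filtered` is hypothetical, and this is not Subversion's own C parser. It only illustrates discarding entries for unrelated branches before they are stored. Passing just the source branch corresponds to the suggestion that the target's mergeinfo is implied.

```python
from typing import Dict, Iterable


def parse_mergeinfo_filtered(prop_value: str, branches: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper: parse an svn:mergeinfo value, keeping only
    entries whose source path is one of `branches` or lies below one.

    Lines look like "/branches/feature/f.c:1000-1200,1305". Irrelevant
    lines are skipped before anything is stored, so memory grows with the
    number of relevant entries rather than with the whole property.
    The revision-range list is kept as an unparsed string in this sketch.
    """
    wanted = tuple(b.rstrip("/") for b in branches)
    result: Dict[str, str] = {}
    for line in prop_value.splitlines():
        line = line.strip()
        if not line:
            continue
        # Revision ranges never contain ':', so split on the last colon.
        path, sep, ranges = line.rpartition(":")
        if not sep:
            continue  # malformed line; real code would report an error
        if any(path == b or path.startswith(b + "/") for b in wanted):
            result[path] = ranges
    return result
```

For example, filtering a file's mergeinfo down to the source branch alone drops every entry that refers to the other ~350 branches.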

> Another possible approach could be to store subtree mergeinfo in a "delta" form
> relative to a parent path's mergeinfo.

I can see two problems here.  First, you can only use the new scheme
after all "relevant", i.e. merging, clients have been upgraded.

No, I meant just converting it to delta form when reading it into memory. I wasn't proposing a format change to the stored svn:mergeinfo property.

More importantly, the in-memory data model would need to be something
delta-like.  That sounds like a lot of code-churn.


Sure, not trivial!
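The in-memory delta idea discussed above might look roughly like this minimal Python sketch. It assumes mergeinfo is held as a map from source path to an already-canonical revision-range string, so string equality suffices. It also ignores non-inheritable (`*`) ranges. The helpers `inherited_from_parent`, `to_delta` and `from_delta` are hypothetical. A child at relative path `relpath` would inherit each parent entry with `relpath` appended to its source path. The delta keeps only the entries that differ from that inherited mergeinfo, with `None` marking an inherited entry the child lacks.

```python
from typing import Dict, Optional

Mergeinfo = Dict[str, str]           # source path -> canonical range string
Delta = Dict[str, Optional[str]]     # None = inherited entry absent in child


def inherited_from_parent(parent: Mergeinfo, relpath: str) -> Mergeinfo:
    """Mergeinfo a child at `relpath` would inherit from its parent."""
    return {src.rstrip("/") + "/" + relpath: ranges for src, ranges in parent.items()}


def to_delta(parent: Mergeinfo, child: Mergeinfo, relpath: str) -> Delta:
    """Store only what differs from the parent's inherited mergeinfo."""
    inherited = inherited_from_parent(parent, relpath)
    delta: Delta = {}
    for src, ranges in child.items():
        if inherited.get(src) != ranges:
            delta[src] = ranges          # entry added or changed in the child
    for src in inherited:
        if src not in child:
            delta[src] = None            # inherited entry the child lacks
    return delta


def from_delta(parent: Mergeinfo, delta: Delta, relpath: str) -> Mergeinfo:
    """Rebuild the child's full mergeinfo from the parent plus the delta."""
    full = inherited_from_parent(parent, relpath)
    for src, ranges in delta.items():
        if ranges is None:
            full.pop(src, None)
        else:
            full[src] = ranges
    return full
```

Round-tripping through `from_delta` recovers the child's full mergeinfo. Any saving depends on how often subtree mergeinfo merely repeats its parent's, which the figures in this thread do not establish.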

Thanks for the interest.

- Julian
