Re: [PATCH 3 of 3] rebase: use matcher to optimize manifestmerge

Durham Goode Mon, 20 Mar 2017 17:15:04 -0700

On 3/20/17 1:14 AM, Yuya Nishihara wrote:

On Sun, 19 Mar 2017 12:00:58 -0700, Durham Goode wrote:

# HG changeset patch
# User Durham Goode <dur...@fb.com>
# Date 1489949694 25200
#      Sun Mar 19 11:54:54 2017 -0700
# Node ID 800c452bf1a44f9f817174c69443121f4ed4c3b8
# Parent  d598e42fa629195ecf43f438b71603df9fb66d6d
rebase: use matcher to optimize manifestmerge


The old merge code would call manifestmerge and calculate the complete diff
between the source and the destination. In many cases, like rebase, the vast
majority of differences between the source and destination are irrelevant
because they are differences between the destination and the common ancestor
only, and therefore don't affect the merge. Since most actions are 'keep', all
the effort to compute them is wasted.

Instead, let's compute the difference between the source and the common ancestor
and only perform the diff of those files against the merge destination. When
using treemanifest, this lets us avoid loading almost the entire tree when
rebasing from a very old ancestor. This speeds up rebase of an old stack of 27
commits by 20x.
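
To make the idea above concrete, here is a minimal sketch using plain dicts as stand-in manifests (path -> node id). It is not Mercurial's manifest or matcher API; the `diff` helper and its `only` argument are hypothetical names used for illustration. It shows that limiting the m1-vs-m2 comparison to the paths that differ between ma and m2 skips the files that changed only on the destination side.

```python
# Toy model: a manifest is a dict mapping file path -> node id.
# This is NOT Mercurial's manifest API, only an illustration.

def diff(a, b, only=None):
    """Return {path: (a_node, b_node)} for paths that differ between a and b.

    If `only` is given, restrict the comparison to those paths
    (a stand-in for passing a matcher).
    """
    paths = set(a) | set(b)
    if only is not None:
        paths &= set(only)
    return {p: (a.get(p), b.get(p)) for p in paths if a.get(p) != b.get(p)}

# Common ancestor (ma), merge destination (m1), and rebase source (m2).
ma = {"README": "r0", "lib/a.py": "a0", "lib/b.py": "b0", "lib/c.py": "c0"}
m1 = {"README": "r1", "lib/a.py": "a1", "lib/b.py": "b1", "lib/c.py": "c0"}
m2 = {"README": "r0", "lib/a.py": "a0", "lib/b.py": "b0", "lib/c.py": "c2"}

# Old behaviour: full diff between destination and source.
full = diff(m1, m2)

# Proposed behaviour: find what the source changed relative to the
# ancestor, then diff only those paths against the destination.
relevant = set(diff(ma, m2))
narrowed = diff(m1, m2, only=relevant)
```

In this toy case the full diff reports all four files, while the narrowed diff reports only lib/c.py, the one file the source actually touched. The other three differ between m1 and m2 only because the destination moved on.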


Looks generally good to me, but this needs more eyes.

@@ -819,6 +819,27 @@ def manifestmerge(repo, wctx, p2, pa, br
         if any(wctx.sub(s).dirty() for s in wctx.substate):
             m1['.hgsubstate'] = modifiednodeid

+    # Don't use m2-vs-ma optimization if:
+    # - ma is the same as m1 or m2, which we're just going to diff again later
+    # - The matcher is set already, so we can't override it
+    # - The caller specifically asks for a full diff, which is useful during bid
+    #   merge.
+    if (pa not in ([wctx, p2] + wctx.parents()) and
+        matcher is None and not forcefulldiff):


Is this optimization better for normal merge where m2 might be far from m1?

I'm not sure what you mean by 'normal merge'. You mean like an 'hg merge'? Or like an hg update? Any merge where you are merging in a small branch will benefit from this (like hg up @ && hg merge myfeaturebranch). Linear hg updates (where m1/m2 are ancestors/descendants) won't benefit because the m2-vs-ma optimization is the same m2-vs-m1 diff we're doing normally.
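
A small sketch of the linear-update case, again with plain dicts standing in for manifests (the `changed_paths` helper is a hypothetical name, not a Mercurial function). When the ancestor is the same as m1, the set of files from ma-vs-m2 is exactly the set from m1-vs-m2, so pre-filtering cannot narrow anything:

```python
def changed_paths(a, b):
    """Paths whose node ids differ between two toy manifests (plain dicts)."""
    return {p for p in set(a) | set(b) if a.get(p) != b.get(p)}

m1 = {"a": "1", "b": "1", "c": "1"}
ma = m1  # linear update: the common ancestor is the working parent itself
m2 = {"a": "2", "b": "1", "c": "1", "d": "1"}

relevant = changed_paths(ma, m2)

# The "relevant" set is the whole m1-vs-m2 diff, so nothing is saved.
same_as_full = relevant == changed_paths(m1, m2)
```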

+        # Identify which files are relevant to the merge, so we can limit the
+        # total m1-vs-m2 diff to just those files. This has significant
+        # performance benefits in large repositories.
+        relevantfiles = set(ma.diff(m2).keys())
+
+        # For copied and moved files, we need to add the source file too.
+        for copykey, copyvalue in copy.iteritems():
+            if copyvalue in relevantfiles:
+                relevantfiles.add(copykey)
+        for movedirkey in movewithdir.iterkeys():
+            relevantfiles.add(movedirkey)
+        matcher = matchmod.match(repo.root, '',
+                                 ('path:%s' % p for p in relevantfiles))


Perhaps we can use scmutil.matchfiles(). patterns shouldn't be a generator
since it may be evaluated as a boolean.
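
For reference, the boolean concern can be shown in plain Python, independent of how the matcher code consumes its patterns (which this sketch does not model). A generator object is always truthy, even when it would yield nothing, whereas an empty list is falsy:

```python
relevantfiles = set()  # pretend nothing turned out to be relevant

# Generator expression, as in the patch: truthy even though it is empty.
patterns_gen = ('path:%s' % p for p in relevantfiles)

# Materialized list: falsy when empty, so "if patterns:" behaves as expected.
patterns_list = ['path:%s' % p for p in relevantfiles]

gen_truthy = bool(patterns_gen)
list_truthy = bool(patterns_list)
```

Here gen_truthy is True and list_truthy is False, so any code that tests the patterns argument for emptiness would treat the two differently.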

That sounds like a good idea. Will put that in V2 after everyone has had a say on V1.

_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
