Re: Two-phase merge

Stefan Fuhrmann Sun, 29 Apr 2012 11:23:06 -0700

Julian Foad wrote:

Stefan Fuhrmann wrote:

This one is basically the technical post promised in my
"Merge Policies" post. The proposal I make here is not
an alternative to e.g. "Symmetric Merge" but rather a
refined merge strategy that would also benefit the current
merge.

Today, the merge algorithm will do the following for a
merge from A ->  B:

* read / evaluate the merge info and change history
   to find the list of revisions to merge
* fragment the list of revisions where they are non-
   contiguous; fragment = a revision range to merge
* iteratively merge all fragments from A to B,
   optionally resolve conflicts after each fragment

In end, we will always merge individual file or property
changes but we plan the merge on tree-level. For many
file nodes, this will unnecessary intermediate states
where the merge got fragmented due to a change to
some *other* file.

Yes.  That is likely to be a significant cause of extra conflict in cases where the user 
has sub-tree mergeinfo or has done some cherry-pick merges and is now doing a "merge 
everything else" merge.


It also happens in the "back and forth" case:
if the changes in the source are interspersed
with merges from the target, i.e. the merges
from A to B will be excluded from the changes
to be merged from B to A.

This increases the probability of
merge conflicts and their likelihood to show up again
after each conflict resolution.

(That is: after each phase of the N revision-range phases that the merge got 
broken into.)

Moreover, it masks cases
where Symmetric Merge might select an optimized
merge plan (see wiki for various examples).

I understand, but I don't know of any examples on the Wiki yet.


E.g. in a merge from A to B, we can simply take the merge
result of the last B to A merge if there were no later changes
on either side and the B -> A merge did not cherry-pick
(with respect to the node in question).

I propose to use the following two-phase merge workflow:

(1) Planning phase
- read / evaluate the merge info and change history
   to find the list of revisions to merge
- break that down for each node changed on the source side
- optionally handle renaming etc.
- mark tree conflicts (not sure when to signal them to the
   user and when to resolve them; high-level fragmentation
   might become necessary)

(2) Execution phase
- merge changes on a per-node basis, i.e. all changes of
   that node get merged before the next node
- fragmentation may still be necessary for *that file*
- given the "merge plan" for a node, alternative merge
   strategies may be chosen

OK, the significant change here is not so much the two-phase (although that has 
merit in itself) but rather turning the processing order inside-out from 
(conceptually)

   REV_RANGES = [contiguous-revision-ranges needed for TREE]
   for REV_RANGE in REV_RANGES:
     3-way merge REV_RANGE into the whole TREE

to (conceptually)

   for each NODE in the TREE:
     MERGES = [find the best sequence of 3-way merges to merge all needed 
changes into NODE]
     for MERGE in MERGES:
       3-way-merge MERGE into NODE


Correctly. With the tiny addition that all rename
and tree conflict handling is done in the planning
phase. That makes it repeatable and potentially
reduces the number of conflicts (at most one per
node instead of one per node and merge step).

The second phase is also reduced to purely text-
based processing because all the "structure"
issues have already been resolved. But that might
be more of a mental aid instead of a true reduction
in complexity (we still might need to handle missing
targets on disk etc.).

This is a good idea in itself.  It certainly brings advantages as mentioned 
above.  I'm not sure what, if any, disadvantages it might have.


Memory consumption could be proportional to
the number of changes and renames or such.
Should not be a real-world problem as long as
you don't merge more than a few 100k changes.

A more severe issue might be latency because
the planning phase needs to gather / evaluate
data from large sections of history. But that one
can be improved ;)

I don't know that the scenarios this helps with are our main concern at this 
point.  It would be good to keep this approach in mind when writing or 
designing any new merge code.  In terms of development effort I think it's more 
valuable to concentrate first on the advantages to to-and-fro merging (most 
commonly without subtree merges and without cherry-picks) that the 'symmetric 
merge' brings.


I think it can become helpful as soon as we
track renames because we need to remap
on a per-node basis.

-- Stefan^2.

Re: Two-phase merge

Reply via email to