Merge-relevant information that is hard to come by

Stefan Fuhrmann Tue, 24 Apr 2012 13:21:41 -0700

Hi all,

This post is meant to document two pieces of information
that SVN does not track explicitly and how that affects
SVN's merge logic. I'm not an expert on the merge code
as we use it in SVN right now, so feel free to correct me
if some of the following is already being addressed.


Also, I'm not advocating a specific change but want to see
whether people agree with my points and how they assess
their severity / importance.

(1) Renamed / moved nodes.
Everyone is probably aware that we don't record these
nodes other than as disappearing in one place and re-
appearing at some other place.

The thing is that even *with* move support in the back-end,
we could not rely on the mv command being used accurately
and consistently by our users. False positives might be less
common but there will be plenty of false negatives. Speaking
from my experience, from time to time I do add files that later
turn out to be replacements for other files that will be removed
soon after.

Other users may simply not care / know about the implications
of copy-with-history and mv but naturally move files on disk
and let the client UI DTRT. And there is a lot of gray area where
the semantics don't match well, e.g. file X supersedes file Y but
the two can't and should not be merged. Still, the user wants to
record their semantic relationship ...

I'm all for recording moves as such but we will always have
to have a way to identify moves after the respective changes
have already been committed. The good news is that >90%
of these cases can be identified programmatically; it is just
somewhat expensive to do.
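
To make the cost argument concrete, here is a minimal sketch (in
Python, not SVN code) of one way such after-the-fact detection could
work: pair every deleted path in a commit with the most similar added
path. The function name guess_moves, the dict-of-file-texts input and
the 0.9 similarity threshold are all assumptions made up for this
illustration.

```python
from difflib import SequenceMatcher


def guess_moves(deleted, added, threshold=0.9):
    """Pair each deleted path with its most similar added path.

    `deleted` and `added` map path -> file text. This data layout and
    the default threshold are assumptions made for this illustration.
    """
    moves = {}
    for old_path, old_text in deleted.items():
        best_path, best_ratio = None, threshold
        for new_path, new_text in added.items():
            # Every deleted file is compared against every added file,
            # which is what makes this approach expensive.
            ratio = SequenceMatcher(None, old_text, new_text).ratio()
            if ratio >= best_ratio:
                best_path, best_ratio = new_path, ratio
        if best_path is not None:
            moves[old_path] = best_path
    return moves
```

Calling guess_moves({"a/foo.c": text}, {"b/foo.c": text}) with the
same text on both sides reports a/foo.c -> b/foo.c. A real
implementation would also have to handle two deleted files matching
the same added file, which this sketch ignores.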

(2) Modified merges.
In case of textual conflicts, users will usually resolve them
before committing the merge result. Depending on policies,
a user may even need to modify textually successful merges
to e.g. fix a broken build before the merge may be committed.

The problem is that SVN will not record the difference between
the merge result and the combined change that eventually
gets committed. This information is not lost because the system
could redo the merge and diff the result against the committed
change. But it is expensive to get that information.
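
As a rough sketch of the second half of that recovery (Python, not
SVN code): once the automatic merge has been redone, the user's
manual intervention is simply the diff from that result to the
committed text. The function name manual_changes and the
"auto-merge" / "committed" labels are made up for this illustration,
and redoing the merge itself, which is the expensive part, is assumed
to have happened already.

```python
from difflib import unified_diff


def manual_changes(auto_merged, committed):
    """Return the unified diff from the automatic merge result to the
    committed text, i.e. what the user changed by hand.

    Both arguments are file contents as strings. Producing
    `auto_merged` requires redoing the merge, which is not shown here.
    """
    return list(unified_diff(
        auto_merged.splitlines(keepends=True),
        committed.splitlines(keepends=True),
        fromfile="auto-merge",
        tofile="committed",
    ))
```

An empty list means the user committed the automatic merge result
unchanged.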

To me, this has been even more annoying than tree-conflicts
due to its consequences. SVN pretends that the commit contained
just the result of the automatic merge and the user's input is being
ignored. If you try to merge the result back to the initial merge's
source, you are likely to get the same conflicts that have
already been resolved because that very resolution will not be
included in the merge.

From discussions with Julian, it seems that Symmetric Merge
could already do better while still "ignoring" the user's change.

I realize that a proper fix to this situation is difficult because
a single merge request may be broken down into multiple
ranges being merged sequentially. After each of them, there
may be a conflict being resolved manually etc. OTOH, there
is great potential in improving our merge performance
because differentiating between merge result and manual
intervention increases the amount of information available
to automatic conflict resolution / prevention.

-- Stefan^2.
