[Subversion Wiki] Update of "MergeLimits" by StefanFuhrmann

Apache subversion Wiki Mon, 09 Apr 2012 13:04:59 -0700



The "MergeLimits" page has been changed by StefanFuhrmann:
http://wiki.apache.org/subversion/MergeLimits

Comment:
Initial page content

New page:
== Preface ==

This page is not meant as a fundamental critique
of Subversion's merge implementation but rather
tries to approach the problem from a radically
different angle. Many of the algorithms used today
may turn out to be quite reasonable, albeit derived
from a simplified model.

Moreover, the current content of this page is more
of an unsorted brain dump that will need further
structuring. It may attempt to address individual
issues but will not present a coherent, formal model.
More than anything, it aims at creating a deeper
insight into the nature of the problem.


== Core Issue: Impedance Mismatch ==

Content-based vs. container-based operations.


=== User's view on merge ===

The user changes some text and wants to merge that
change to some other development line.

To be precise, the user ''changes'' the ''content'' of
some ''document'' and wants the tool to make an
''equivalent'' change to some ''similar or related''
document.
[
Footnote: Subversion's model of "a branch is just
a copy" is fully consistent with that model. If there
was a reason to modify some text section, the same
reasoning will likely apply to any copy of that
section. If Subversion supported additional
semantics on copies (split, join, tag, one-way, ...),
copies could be a more powerful concept than just
branches.
]
The problem here is that most tools have a very
limited understanding of each of the 5 highlighted
points.

'''Change'''. More than just a diff, a change has a ''scope''
and an ''intent''. That intent then results in a
diff. Most tools don't understand intents like
"replace all occurrences of X with Y" or "move block
B to position P". In many cases, it should be
possible to deduce the intent and to represent the
change as a set of operations of pre-defined types.
This is also linked to the next item:

'''Content'''. As opposed to ''structure'' (e.g. textual order,
file mapping) and ''formatting'' (e.g. white spaces),
only this subset of a document's byte stream is
relevant to a document's formal validity. The other
two aspects are usually of lesser importance.
A generic tool may or may not be able to differentiate
between these three aspects. When it can, several
classes of conflicts can be resolved automatically.
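
A minimal sketch of how a tool that can tell content from formatting might resolve one such class of conflicts. It assumes that "formatting" means whitespace only; `normalize`, `is_formatting_only` and `merge_line` are hypothetical helpers, not part of Subversion. Here `ours` is the target side and `theirs` the incoming change.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace runs so that only the content remains."""
    return re.sub(r"\s+", " ", text).strip()


def is_formatting_only(before: str, after: str) -> bool:
    """True if a change touches whitespace but leaves the content intact."""
    return before != after and normalize(before) == normalize(after)


def merge_line(base: str, ours: str, theirs: str) -> str:
    """Three-way merge of a single line that tolerates formatting-only edits."""
    if ours == theirs or theirs == base:
        return ours
    if ours == base:
        return theirs
    # Both sides changed the line: a pure reformatting yields to a content change.
    if is_formatting_only(base, ours):
        return theirs
    if is_formatting_only(base, theirs):
        return ours
    if normalize(ours) == normalize(theirs):
        return ours  # same content, different formatting: keep the target's
    raise ValueError("content conflict")


# The target re-indented the line, the source fixed the operator.
print(repr(merge_line("    return a+b", "        return a+b", "    return a-b")))
# prints '    return a-b'
```

The target's re-indentation is dropped in this case; keeping it as well is what the change hierarchy proposed further down aims for.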

'''Document'''. A conceptual unit defined by the user's
application. It is often not identical to a file
or set of files. Files and folders are containers
that documents get mapped to. For instance, the
source code of a C library may be a document. Its
mapping to .c and .h files is mainly convention and
may change over time but that does not affect the
document's content. A merge tool must operate on
documents, not individual files.

'''Equivalent'''. Having the same effect on the target
document as on the source. Some changes may not
be necessary (e.g. the affected code section no
longer exists). Others may need to be extended (e.g.
replacing a member identifier within a function
that got many lines added on the target side).
In any case, the changes must be translated to
the target.
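
The scope/intent distinction can be illustrated with a hypothetical `RenameIdentifier` operation; the class, the C snippet and all identifiers are made up for illustration. Because the operation records what was meant rather than which lines changed, it can be replayed on a target whose function has grown since the branch point, and it degrades to a no-op where the identifier no longer exists.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RenameIdentifier:
    """Hypothetical intent-level change: rename every whole-word occurrence."""
    old: str
    new: str

    def apply(self, text: str) -> str:
        # \b limits matches to whole identifiers, so "count" leaves "counter" alone.
        return re.sub(rf"\b{re.escape(self.old)}\b", self.new, text)


change = RenameIdentifier("count", "size")

# Target side: the same function, but it got extra lines since the branch
# point, so line positions recorded on the source no longer match.
target = """int total(const struct list *l) {
    int counter = 0;
    log_debug("computing total");
    int n = l->count;
    return n + counter;
}"""

print(change.apply(target))
# Where the identifier no longer exists, the intent is simply a no-op.
print(change.apply("int x = 0;"))
```

A real tool would need to understand the language's syntax so that strings and comments are not affected; the regular expression is only a stand-in.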

'''Similar or Related'''. Merging between related documents
is simpler because the full change history of both
sides is known. An ideal tool that understood the
intent of a change should be able to apply it just
as well to any sufficiently similar document without
requiring any relationship with the source.


=== Issues with Subversion's approach ===

Subversion is an excellent version control system
and a solid basis for configuration management.
However, it only manages the versions of data containers
(files and folders) with virtually no understanding
of their content. That limits its merge capabilities
to container-level operations. This is where the
impedance mismatch lies: using container-level
operations to implement content-level use-cases.

Moreover, merge tracking information is stored with
the (resulting) data, i.e. in the svn:mergeinfo property
of the merge target. Instead, it should be an attribute
of the change, i.e. it should be stored with the revision
as detail information on the individual changes in that
revision.
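
A hypothetical data model for that idea, in which every change inside a revision records the source revisions it carries over. None of these classes exist in Subversion; they only illustrate where the information would live.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    """One change inside a revision (hypothetical model)."""
    path: str
    merged_from: set[int] = field(default_factory=set)  # source revisions


@dataclass
class Revision:
    number: int
    branch: str
    changes: list[Change] = field(default_factory=list)


def is_merged(history: list[Revision], source_rev: int, target_branch: str) -> bool:
    """Has any change on `target_branch` been merged from `source_rev`?"""
    return any(
        source_rev in change.merged_from
        for rev in history
        if rev.branch == target_branch
        for change in rev.changes
    )


history = [
    Revision(7, "trunk", [Change("trunk/a.c"), Change("trunk/b.c")]),
    Revision(12, "branches/1.x", [Change("branches/1.x/a.c", merged_from={7})]),
]
print(is_merged(history, 7, "branches/1.x"))  # True
print(is_merged(history, 7, "trunk"))         # False
```

Because the record belongs to a single change, the same model can express that r12 merged only the a.c part of r7 without any extra per-path bookkeeping.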


=== Other tools ===

[based on hearsay, details may be wrong]

Git accidentally got the document vs. file part less
wrong by loosely identifying files via content rather
than name, i.e. the actual file change history is of
lesser importance.

ClearCase (and to some degree Git) will merge complete
branch histories at branch level, i.e. it always
merges the "whole document" and can resolve structural
changes more easily, with a lower risk of creating conflicts
in the future. E.g. moves cannot be merged partially.


== Practical conclusions ==

Even without an in-depth analysis and an attempt at modeling
a perfect merge scheme, a few practical conclusions can be drawn.


=== A use-case ===

Subversion should allow for large-scale refactorings to
be performed on some branch and then be merged successfully
to other branches and the main development line. None of
that should unduly disrupt anybody's development.


=== Annotated copies and deletions (document structure) ===

Introduce the concept of ''split'' and ''join'' for files and
folders. In the first case, changes must be promoted to
exactly one of the copy targets. Similarly, changes to
any of the sources of a join will be applied to its
target.

These are typical operations when refactoring a data model
(classes, modules etc.) and generalize the concept of
rename tracking.

It may also be useful to combine ''split'' and ''join'' in a
single operation, e.g. 3 -> 2 files.
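
A minimal sketch of how a split/join annotation could route incoming changes. It assumes that each change can be attributed to a named entity (class, function, ...) and that the annotation records where each entity ended up; `Restructure`, `route` and the file names are hypothetical. A split then sends every change to exactly one target, a join sends changes from all sources to the same target, and the 3 -> 2 case is just a larger table.

```python
from dataclasses import dataclass


@dataclass
class Restructure:
    """Hypothetical annotation recorded with a split, join or m -> n copy.

    `placement` maps (old file, entity) to the file that entity lives in now.
    """
    placement: dict[tuple[str, str], str]

    def route(self, old_file: str, entity: str) -> str:
        """Return the single target that a change to `entity` must go to."""
        try:
            return self.placement[(old_file, entity)]
        except KeyError:
            raise LookupError(f"no target recorded for {entity!r} in {old_file!r}") from None


# 3 -> 2 files: model.c is split, util.c and io.c are joined into the parts.
refactoring = Restructure({
    ("model.c", "parse"): "parser.c",
    ("model.c", "evaluate"): "eval.c",
    ("util.c", "tokenize"): "parser.c",
    ("io.c", "print_result"): "eval.c",
})

# An incoming change to model.c::evaluate lands in exactly one target file.
print(refactoring.route("model.c", "evaluate"))  # eval.c
```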

By extension, text blocks moved from one file to another
should be detected.
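
Detecting such moves could start from the lines a commit removed from one file and added to another. The sketch below uses Python's `difflib.SequenceMatcher` as a stand-in for a real move detector; `find_moved_blocks` and the `min_lines` threshold are assumptions.

```python
from difflib import SequenceMatcher


def find_moved_blocks(removed: list[str], added: list[str], min_lines: int = 3):
    """Find runs of identical lines deleted in one file and added in another.

    Returns (start in `removed`, start in `added`, length) triples.
    """
    matcher = SequenceMatcher(None, removed, added, autojunk=False)
    return [
        (m.a, m.b, m.size)
        for m in matcher.get_matching_blocks()
        if m.size >= min_lines  # also drops the final zero-size sentinel
    ]


# Lines removed from a.c and lines added to b.c in the same commit.
removed = ["static int twice(int x)", "{", "    return x * 2;", "}", "/* obsolete */"]
added = ['#include "b.h"', "static int twice(int x)", "{", "    return x * 2;", "}"]
print(find_moved_blocks(removed, added))  # [(0, 1, 4)]
```

`get_matching_blocks` only reports matches that appear in the same relative order on both sides, so blocks that were also reordered would need further passes.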


=== Branch and merge directions ===

Copies should have two boolean flags: ''merge-from-source''
and ''merge-to-source'', both being set by default.
They specify the default change
flow. For instance, a stabilization / release branch would
only have "merge-from-source" set while tags would set
neither flag.

With that, users can "pull in" any outstanding changes
from all branches (or push changes to branches). That is
an interesting feature for GUI clients.

Attempts to merge without the respective flag being set
will require a "--force" parameter.
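
The flags and the "--force" check might look like this. `CopyFlags`, `check_merge` and `pull_candidates` are hypothetical names; the defaults follow the proposal above (both flags set, release branches only accept changes from their source, tags accept neither).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyFlags:
    """Hypothetical per-copy flags that define the default change flow."""
    merge_from_source: bool = True  # e.g. trunk -> branch
    merge_to_source: bool = True    # e.g. branch -> trunk


FEATURE_BRANCH = CopyFlags()
RELEASE_BRANCH = CopyFlags(merge_to_source=False)
TAG = CopyFlags(merge_from_source=False, merge_to_source=False)


class MergeDirectionError(Exception):
    pass


def check_merge(copy: CopyFlags, into_copy: bool, force: bool = False) -> None:
    """Reject merges against the copy's default flow unless forced."""
    allowed = copy.merge_from_source if into_copy else copy.merge_to_source
    if not allowed and not force:
        direction = "into the copy" if into_copy else "back to the source"
        raise MergeDirectionError(f"merging {direction} requires --force")


def pull_candidates(copies: dict[str, CopyFlags]) -> list[str]:
    """Copies whose changes may flow back to their source by default."""
    return [name for name, flags in copies.items() if flags.merge_to_source]


check_merge(RELEASE_BRANCH, into_copy=True)               # ok: fixes flow into the release branch
check_merge(RELEASE_BRANCH, into_copy=False, force=True)  # against the flow, but forced
print(pull_candidates({"feature": FEATURE_BRANCH, "1.x": RELEASE_BRANCH, "tag": TAG}))
```

`pull_candidates` is the kind of query a GUI client would need for the "pull in outstanding changes" feature.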


=== Change hierarchy ===

Separate the change information into

 * structural changes (text moves, splits, joins etc.)
 * textual changes
 * whitespace changes (indentation, line breaks)

Use the first to translate change positions, then apply
textual changes and finally whitespace changes.

Conflicts will be resolved in the same order, with each
subsequent step being adjusted to the output of the
previous one. E.g. if the indentation of 4 lines was changed
on the target side but the incoming text change
replaces them with 3 lines, the result will apply the new
indentation to those 3 lines without creating a conflict.
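
The indentation example can be sketched as a two-layer merge. This is a hypothetical helper, not an existing Subversion API: it leaves out the structural layer, treats only leading spaces as whitespace and assumes the target-side change is a uniform indentation shift.

```python
def split_indent(line: str) -> tuple[int, str]:
    """Separate leading spaces (whitespace layer) from the text layer."""
    text = line.lstrip(" ")
    return len(line) - len(text), text


def merge_block(base: list[str], ours: list[str], theirs: list[str]) -> list[str]:
    """Merge a block the target re-indented while the source changed its text.

    Assumes the target-side change is a uniform indentation shift.
    """
    if [split_indent(line)[1] for line in ours] != [split_indent(line)[1] for line in base]:
        raise ValueError("target changed text as well: real conflict")
    if not base:
        return list(theirs)
    # Textual layer: take the incoming lines as they are.
    # Whitespace layer: replay the target's indentation shift on top of them.
    shift = split_indent(ours[0])[0] - split_indent(base[0])[0]
    merged = []
    for line in theirs:
        indent, text = split_indent(line)
        merged.append(" " * max(indent + shift, 0) + text)
    return merged


base = ["    a = 1", "    b = 2", "    c = 3", "    d = 4"]
ours = ["        " + line.strip() for line in base]  # target: re-indented 4 lines
theirs = ["    x = 1", "    y = 2", "    z = 3"]       # source: 4 lines -> 3 lines
print(merge_block(base, ours, theirs))
```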


== On the non-importance of perfect automatic merges ==

=== Status quo ===

Contrary to what its advertising may suggest, SVN in its
default configuration does not guarantee the consistency
of a set of files after a commit. Disjoint subsets of
files may be modified and committed concurrently without
any consistency check on the whole file set.

This is a reasonable trade-off between workflow restrictions
imposed by the tool itself and those defined by the development
team / process. Most organizations will use automatic builds
and tests to verify that the repository content is consistent.
As long as build breakage is transparent and infrequent,
the overall productivity is much higher than with an enforced,
fully serialized update - build & test - commit cycle.


=== Potential for improved usability ===

Subversion should be able to reconcile more changes on the 
server, i.e. without forcing the user to update.

Directory property changes, for instance, should be accepted
without a full tree update as long as there was no other change
to those props. That will improve the merge tracking user
experience in larger projects.
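
A sketch of the difference between a whole-directory out-of-date check and a per-property one, assuming the server can tell in which revision each property was last changed. `DirNode` and both functions are hypothetical and say nothing about how Subversion's repository layer actually works.

```python
from dataclasses import dataclass, field


@dataclass
class DirNode:
    """Hypothetical server-side record of a directory."""
    last_changed: int  # last revision that touched the directory or anything in it
    prop_changed: dict[str, int] = field(default_factory=dict)  # per property


def whole_dir_up_to_date(node: DirNode, base_rev: int) -> bool:
    """Coarse check: any change since the client's base forces an update."""
    return node.last_changed <= base_rev


def accept_prop_change(node: DirNode, base_rev: int, prop: str) -> bool:
    """Fine-grained check: only a concurrent change to `prop` itself blocks."""
    return node.prop_changed.get(prop, 0) <= base_rev


root = DirNode(last_changed=120, prop_changed={"svn:mergeinfo": 90, "svn:ignore": 118})
print(whole_dir_up_to_date(root, 100))                 # False: update required
print(accept_prop_change(root, 100, "svn:mergeinfo"))  # True: nobody else touched it
```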


=== Impact on merge ===

Automatic merge may be more aggressive in resolving conflicts
as long as "questionable" decisions are documented, e.g. by
warnings. Most failed merges / merge artifacts will
manifest themselves in either build or test failures.

An undetected merge-induced problem will be hard to
distinguish from similar problems caused by
concurrent changes on the same development line. So, even
aggressive merge conflict resolution strategies don't
create the need for extra QA, because the same QA is already
needed anyway to integrate concurrent development.


== Semantic ambiguities  ==

TBD. Keywords:

 * Merge order
 * re-applying changes
 * effect on future merges from other sources
 * Possible solution: Content = sum of its changes.
