Stefan Fuhrmann wrote:
> On Thu, Jul 12, 2012 at 3:43 PM, Julian Foad wrote:
>> Stefan Fuhrmann wrote:
>>> From a process perspective, it seems much more prudent to
>>> do "qualified" merges like "merge from /trunk up to the last
>>> fully tested nightly build revision" and "merge from branch up
>>> to the point that I think is safe".
>>
>> Yes, that makes sense.
[...]
>> A = /trunk
>> Tested versions * *
>> A o---o---o---o---o---o---o---o---o---o---o---o---o
>> \ \____ /
>> \ ____\ __/
>> \ / \
>> B o---o---o---o---o---o---o---o---o---o---o
>> Safe versions * *
>> 1 2 3
>>
>> Sequence of events:
>> 1. Nightly testing.
>> 2. Catch up branch from latest stable trunk.
>> 3. Reintegrate branch to trunk, from latest tested version of branch.
>>
>> That sort of scenario?
>
> Exactly. The thing is that at this point the graph does
> not even depend on whether the changes to a given
> node actually produce a conflict. If the history processing
> code had any problems with this kind of merge graph,
> that problem would surface more often than actual conflicts.
Let's first clarify for other readers that we're not concerned with how the two
criss-crossing merges themselves work: each of those is a straightforward merge
and the fact that they cross over is not relevant to either of them. We might
want to build some sort of system for detecting this criss-crossing as it
happens, with the aim of alerting the user or preventing the second merge from
being committed, but that is not what we are talking about here. Rather, we
assume those two criss-crossing merges have already happened, and we are
talking now about how a third merge (afterwards) will behave.
Currently, the "symmetric" merge code treats a criss-cross merge by choosing
the later of the two possible common ancestors. In fact, this is just a
fall-out from choosing the youngest common ancestor: it doesn't currently fetch
enough information to notice that the latest merges in each direction cross
over.
The result of choosing one of the common ancestors will be a sub-optimal
merge. Typically, all needed changes are merged, but some unneeded changes are
merged too (changes that had already been made in the target branch, either
originally or through an earlier merge). Therefore the merge will, in general,
have "spurious" conflicts due to making the same change again. In simple cases
the duplicate changes may be auto-resolved: in particular, Subversion's
internal text merge auto-resolves duplicate hunks of text added to a file, and
Subversion auto-resolves duplicate added properties, but that is only going to
happen when such additions were clean additions.
There may be other ways in which the result is sub-optimal, such as failing to
report a conflict and just choosing one of the alternatives, where the user
really should have been given a conflict to resolve. Note also that
"sub-optimal" doesn't necessarily mean "just a bit worse that ideal", it can
mean "unusably bad" in some cases.
It so happens that in trying to make automated test scenarios for all possible
merges [1], I am running in to the issue of criss-cross merges. In preparation
for attempting a merge from A to B after a scenario such as this one:
A1 A2
---o---x---o------ mergeinfo = B:1
/ / \
\ / \
---o-------o---x-- mergeinfo = A:1-2
B1 B2
my latest attempt to fake the scenario does this:
A1 A2 AF where AF makes changes B1
---o-------o---o-- and sets mergeinfo = B:1
/
\
---o-------o---o-- where BF makes change A1,A2
B1 B2 BF and sets mergeinfo = A:1-2
but I have just realized (after seeing unexpected conflicts in the following
merge) that that actually represents a criss-cross merge:
A1 A2
---o-------o---x-- mergeinfo = B:1-2
/ ______\ _/
\ / \
---o-------o---x-- mergeinfo = A:1
B1 B2
The end points (the tips of the two branches) are in the same state as what I
was trying to achieve (the first diagram), but the difference is that the state
labaled as 'A2' does not have change 'B1' in it in this second case. The merge
code will use that state as the 'base' of the three-way merge, and thus the
result it produces will be different from the result produced by a merge after
the scenario set up in the first diagram. The state of A2 is the only relevant
difference between the non-criss-cross (first diagram) and criss-cross (third
diagram) scenarios; the mergeinfo on A2 will also differ and the states and
mergeinfo of some other nodes may also differ, but those other differences are
not noticed by the current algorithms ("symmetric" and 1.7 non-reintegrate,
that is; 1.7 reintegrate would look at different nodes).
As far as I have been able to discover, there is no generally agreed solution
to the problem. Most of the references I can find are, like [3], a commentary
on one or more of the attempts that have been made to find a better solution.
I found one summary [2] of how different merge algorithms attempt to handle the
criss-cross scenario. It is a bit too light on detail, but it does mention
some interesting solutions and their limitations.
At this point, I think the issue boils down to:
* Yes, the criss-cross situation can be produced in plausible real life usage.
* The current handling is sub-optimal, though not worse than we've had before.
* We might want to look at more sophisticated ways to handle it.
* We might want to look at ways to help users avoid the situation.
References:
[1] See the thread "Subtree mergeinfo -- what I learnt..." on this list.
[2] "CrissCrossMerge" on revctrl.org:
<http://revctrl.org/CrissCrossMerge?action=recall&rev=34>
(Sadly, the revctrl.org Wiki is full of spam; you have to look at old versions
of the pages to find the relevant content.)
[3] "Does Criss-Cross Merge Matter?" on vestasys.org:
<http://wiki.vestasys.org/MergingFuture/Food4Thought/TrickyCases#head-5ef046f558b5a8924506ec82d315e2ca7b9782a8>
- Julian
--
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download