[Subversion Wiki] Update of "MoveDev/MoveDev" by JulianFoad

Apache subversion Wiki Tue, 03 Sep 2013 09:32:50 -0700

The "MoveDev/MoveDev" page has been changed by JulianFoad:
https://wiki.apache.org/subversion/MoveDev/MoveDev?action=diff&rev1=9&rev2=10

Comment:
Update, especially the Move Semantics section

  = How to Add Moves to Svn =
+ <<TableOfContents(3)>>
+ 
  == Summary ==
  Subversion needs to handle moves and renames better than in version 1.8.  
This paper presents the rationale and a plan for doing so.
  
@@ -8, +10 @@

  
  We will preserve backward compatibility between move-aware clients and 
move-unaware repositories, and between move-aware repositories and move-unaware 
clients.  In all cases, simple compatibility will be available by falling back 
to copy-and-delete.  In some cases, heuristic detection of moves may be offered 
as an option.
  
- == Introduction ==
- ...
+ === Move vs. Rename ===
+ We say “move” or “rename” interchangeably for most purposes.  Their essential 
similarities include the concept of a preserved node identity.  It can be 
useful sometimes to draw a distinction.  When merging a rename-only (A/foo → 
A/bar) with a move-only (A/foo → B/foo) we can suggest that the most likely 
merge resolution would be to apply both the move and the rename (→ B/bar).
  
  == Why Not Just Copy and Delete? ==
  I believe we need explicit move semantics, although it is difficult to 
succinctly explain why.
@@ -26, +28 @@

  
  Some previous ideas about how much we need to track moves explicitly:
  
-  * We need to track moves in the server so we   can do forward history 
tracing.  But does this really help unless       the clients are aware and able 
to communicate these moves to and        from the server?
+  * We need to track moves in the server so we can do forward history 
tracing.  But does this really help unless the clients are aware and able 
to communicate these moves to and from the server?
-  * Track moves in the WC only – as implemented  in Subversion 1.8.  This 
helps with certain situations: it can apply    incoming edits into a locally 
moved node, and it can prevent the        accidental committing of just one 
half of a move.
+  * Track moves in the WC only – as implemented in Subversion 1.8.  This 
helps with certain situations: it can apply incoming edits into a 
locally moved node, and it can prevent the accidental committing of just 
one half of a move.
-  * We don't need to track moves explicitly, as  we can do everything we 
really need by recognizing copy-and-delete      as a move, and that has the 
advantage of not changing the network       protocols and so on.
+  * We don't need to track moves explicitly, as we can do everything we 
really need by recognizing copy-and-delete as a move, and that has the 
advantage of not changing the network protocols and so on.
-  * We don't need to track moves explicitly, as  we can do  everything we 
really need by always treating a copy in       the same way as the copy half of 
a move.  So, when merging, if there    is a copy of the node being merged, then 
all changes destined for       the copy-source node should go to the copy-dest 
node as well – or       instead, if the copy-source node is deleted.  In this 
way, the  semantics of copy and move are unified.
+  * We don't need to track moves explicitly, as we can do everything we 
really need by always treating a copy in the same way as the copy half of a 
move.  So, when merging, if there is a copy of the node being merged, then 
all changes destined for the copy-source node should go to the copy-dest 
node as well – or instead, if the copy-source node is deleted.  In this 
way, the semantics of copy and move are unified.
  
  Arguments against treating any copy in the same way as the copy half of a 
move:
  
-  * Why should we treat a single copy (cp A A2;  rm A) differently from the 
same situation plus an additional copy       (cp A A2; cp A A3; rm A)?  And if 
we decided to merge into all the      copies alike, then why should we only do 
so when there is a delete?
+  * Why should we treat a single copy (cp A A2; rm A) differently from 
the same situation plus an additional copy (cp A A2; cp A A3; rm A)?  And 
if we decided to merge into all the copies alike, then why should we only 
do so when there is a delete?
  
   * ...
  
  === Combining Changes ===
  The problems with copy-and-delete boil down to various kinds of ambiguity, 
inconsistency or non-determinism.  Many of these are related to the problem of 
representing a sequence of changes as a single change.  It is fundamental in a 
version control system to be able to update, merge or diff between two widely 
separated revisions without having to step through all the intermediate 
revisions in sequence, and so it is necessary to have an unambiguous way of 
combining successive changes.  If we attempt to interpret copy-and-delete as a 
move, that leads to ambiguous or context-dependent results when combining 
changes.
  
-  * Spatial ambiguity.  When looking at a subtree        that contains only 
one half of the move, we would see a copy or a       delete, but if we then 
look at a wider subtree we would see a move.     Look wider again, and we may 
see a second copy from the same source,    which means there is no move because 
there is no unique copy.
+  * Spatial ambiguity.  When looking at a subtree that contains only one half 
of the move, we would see a copy or a delete, but if we then look at a 
wider subtree we would see a move.  Look wider again, and we may see a 
second copy from the same source, which means there is no move because 
there is no unique copy.
-  * Ambiguity when the copy-from is not the      revision immediately before 
the copy-and-delete.  If the deleted        node was modified between the 
copy-from revision and the delete,        then is it still a move?  No, because 
it has a forked history.  If      we treat it as a move only if the delete side 
was not modified since    the copy-from revision, then there is a race on 
commit because the      change that gets committed will be seen as a move if 
nobody else        modifies it in the meantime, or as a non-move if somebody 
gets in       first.  If we want the semantics of a move, we have to tell the   
      server it is a move so it can avoid this.
+  * Ambiguity when the copy-from is not the revision immediately 
before the copy-and-delete.  If the deleted node was modified between the 
copy-from revision and the delete, then is it still a move?  No, because 
it has a forked history.  If we treat it as a move only if the delete 
side was not modified since the copy-from revision, then there is a race 
on commit because the change that gets committed will be seen as a move 
if nobody else modifies it in the meantime, or as a non-move if somebody 
gets in first.  If we want the semantics of a move, we have to tell the 
server it is a move so it can avoid this.
-  * Ambiguity when the delete is not in the same         revision as the copy.
+  * Ambiguity when the delete is not in the same revision as the copy.
-  * Temporal ambiguity.  Difficulty in composing         a series of changes 
(revisions) together.  If we start with (cp A       A2; rm A), that looks like 
a move, but if we then commit (cp    A@orig-rev A3) and look at the overall 
combined change, we now see a    multiple-copies scenario.  Conversely, if we 
start with (cp A A2; cp    A A3; rm A) and then commit (rm A2), we change a 
non-move into a        move.
+  * Temporal ambiguity.  Difficulty in composing a series of changes 
(revisions) together.  If we start with (cp A A2; rm A), that looks like 
a move, but if we then commit (cp A@orig-rev A3) and look at the overall 
combined change, we now see a multiple-copies scenario.  Conversely, if 
we start with (cp A A2; cp A A3; rm A) and then commit (rm A2), we change a 
non-move into a move.
  
  In one context, a certain copy and delete can be paired uniquely and thus 
interpreted as a move, while in another context the same copy and delete are 
not unique or are not both visible.
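
The spatial ambiguity described above can be illustrated with a small sketch.  It is not Subversion code: the `Copy` and `Delete` records and the `classify` function are hypothetical names invented for this illustration, and the pairing rule (exactly one visible copy plus a visible delete means "move") is only one plausible heuristic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    src: str  # copy-from path
    dst: str  # copy-to path


@dataclass(frozen=True)
class Delete:
    path: str


def in_scope(path, scope):
    """True if path is the scope directory itself or lies beneath it."""
    return scope == "" or path == scope or path.startswith(scope + "/")


def classify(changes, scope, source):
    """Heuristically classify what happened to `source`, seeing only `scope`.

    Hypothetical rule: a delete paired with exactly one visible copy is a move.
    """
    copies = [c for c in changes
              if isinstance(c, Copy) and c.src == source
              and in_scope(c.dst, scope)]
    deleted = any(isinstance(c, Delete) and c.path == source
                  and in_scope(c.path, scope) for c in changes)
    if deleted and len(copies) == 1:
        return "move"
    if deleted and not copies:
        return "delete"
    if copies and not deleted:
        return "copy"
    if deleted:
        return "ambiguous"  # more than one visible copy of the deleted source
    return "unchanged"


# One and the same change, viewed through four different subtrees.
changes = [
    Copy("proj/A/foo", "proj/B/foo"),
    Delete("proj/A/foo"),
    Copy("proj/A/foo", "other/foo"),
]
for scope in ["proj/A", "proj/B", "proj", ""]:
    print(repr(scope), classify(changes, scope, "proj/A/foo"))
```

The same three operations read as a delete, a copy, a move, or an ambiguous multiple-copy depending only on how wide a subtree is examined, which is the context-dependence that explicit move semantics is meant to remove.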
  
+ == Move Semantics ==
+ This section specifies the logical semantics of the versioned move operation 
that is the basis of move tracking, independent of any implementation.
+ 
+ === Node-Line-Id ===
+ Assume that each versioned path in a revision has an identifier that we will 
call its ''node-line-id''.  The node-line-id need not physically exist: it is a 
concept used in the ''definition'' of moves but not necessarily in the 
''implementation''.
+ 
+ The node-line-id is preserved when the content of a path is modified.  A new 
node-line-id is assigned to every new node that is created by addition or by 
copying, including when it replaces a previous node at the same path.  This new 
node-line-id is unique within the whole repository.  Within any given revision, 
each node-line-id is unique among all the paths.
+ 
+ The node-line-id is similar to the (node-id, copy-id) tuple in the existing 
Subversion filesystems, except that the lazy-copy mechanism does not assign a 
new copy-id to a child of a copy until that child (or one of its descendants) 
is modified.  Therefore an unmodified child of a copy has the same (node-id, 
copy-id) as the corresponding child path of the copy source, whereas (by 
definition) it has a new node-line-id.
+ 
+ Let the term ''node-line'' refer to the set of PATH@REV locations that have a 
given node-line-id.
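
As a rough illustration of the assignment rules above, the following toy model keeps one path-to-id map per revision.  `ToyRepo` and its operation names are hypothetical and exist only for this sketch; the model ignores content, node kinds and moves, and unlike the lazy-copy mechanism it eagerly gives every child of a copy a fresh id, which is what the definition requires.

```python
import itertools


class ToyRepo:
    """Toy model: each revision maps path -> node-line-id (hypothetical)."""

    def __init__(self):
        self._ids = itertools.count(1)  # ids are never reused in the repository
        self.revisions = [{}]           # r0 is the empty tree

    @staticmethod
    def _under(tree, path):
        """Paths in tree that are `path` itself or one of its descendants."""
        return [p for p in tree if p == path or p.startswith(path + "/")]

    def commit(self, ops):
        """Apply ops to a copy of the youngest tree; return the new revision number."""
        tree = dict(self.revisions[-1])
        for op, *args in ops:
            if op == "add":
                # A new node always gets a new id, even when replacing a path.
                tree[args[0]] = next(self._ids)
            elif op == "modify":
                # Content changes preserve the node-line-id.
                if args[0] not in tree:
                    raise KeyError(args[0])
            elif op == "delete":
                for p in self._under(tree, args[0]):
                    del tree[p]
            elif op == "copy":
                src, dst = args
                # Every node in the copied subtree gets a new id.
                for p in self._under(tree, src):
                    tree[dst + p[len(src):]] = next(self._ids)
            else:
                raise ValueError(op)
        self.revisions.append(tree)
        return len(self.revisions) - 1


repo = ToyRepo()
r1 = repo.commit([("add", "A"), ("add", "A/foo")])
r2 = repo.commit([("modify", "A/foo")])   # same id as in r1
r3 = repo.commit([("copy", "A", "A2")])   # A2 and A2/foo get new ids
r4 = repo.commit([("delete", "A/foo"), ("add", "A/foo")])  # replacement: new id
```

In this model the node-line of a given id is simply the set of (path, revision) pairs at which that id appears.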
+ 
+ === Definition of Move ===
+ Given a node-line N and two revisions rX and rY (X < Y), the definition of N 
being moved in rY with respect to rX is:
+ 
+  * '''Same node-line.'''  A node-line with node-line-id N exists 
at path P,,N,X,, in rX and at path P,,N,Y,, in rY.  It is the same 
node-line, and so has the same node kind.  Its content may differ.
+  * '''Move and/or rename.'''  The node-line N has either or both of
+   * a different name (base name) in rX than in rY; and/or
+   * a different parent (parent directory node-line-id) in rX than 
in rY.
+   * Thus, the paths P,,N,X,, and P,,N,Y,, typically differ but may 
be the same.
+  * '''No gap.'''  There cannot be a gap in the range of revisions: 
node-line-id N exists in every revision rX, rX+1, …, rY-1, rY.
+   * Contrast with copy-and-delete, where there can be a gap 
between the delete and the copy.
+   * The possible “resurrection” extension to the move semantics 
would permit a gap.
+  * '''Children follow.'''  If N is a directory, each child (recursively) 
of N in rX remains a child of N in rY, with the same name, unless it is 
separately moved or deleted.  Any or all of the children can be separately 
moved within or outside the subtree at N, at the same time as N is moved.
+ 
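The definition can be sketched as a predicate over a toy history.  This is an illustration under stated assumptions, not an implementation: each revision is modeled as a map from node-line-id to (parent node-line-id, base name), the names `is_moved` and `path_of` are hypothetical, and node kind and content are left out of the model.

```python
def path_of(rev, n):
    """Build the path of node-line n by walking parent links to the root."""
    parts = []
    while n is not None:
        parent, name = rev[n]
        parts.append(name)
        n = parent
    return "/".join(reversed(parts))


def is_moved(revs, n, x, y):
    """True if node-line n is moved in rY with respect to rX (X < Y).

    revs[r] maps node-line-id -> (parent node-line-id or None, base name).
    """
    if not x < y:
        raise ValueError("expected X < Y")
    # No gap: n must exist in every revision rX .. rY.
    if any(n not in revs[r] for r in range(x, y + 1)):
        return False
    # Move and/or rename: the parent or the base name differs.  If neither
    # differs, this is a normal succession of history (no null move).
    return revs[x][n] != revs[y][n]


# r0: A, A/foo and B exist.  r1: directory A is moved and renamed to B/A2.
revs = [
    {1: (None, "A"), 2: (1, "foo"), 3: (None, "B")},
    {1: (3, "A2"), 2: (1, "foo"), 3: (None, "B")},
]
print(is_moved(revs, 1, 0, 1))   # the directory itself
print(is_moved(revs, 2, 0, 1))   # its child
print(path_of(revs[1], 2))       # where the child now lives
```

Because a child refers to its parent by node-line-id rather than by path, "children follow" falls out of the model: the child's path changes while the child itself is not recorded as moved.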
+ === Properties of Move ===
+ Some properties of the move relationship are:
+ 
+  * '''Unique.'''  move(A@X,B@Y) and move(A@X,C@Y) cannot both be true.
+  * '''Transitive.'''  move(A@X,B@Y) followed by move(B@Y,C@Z) collapses to 
move(A@X,C@Z).
+  * '''Reversible.'''  move(A@X,B@Y) followed by move(B@Y,A@Z) collapses to 
no-move.
+  * '''Time-symmetric.'''  move(A@X,B@Y) is symmetric with the 
time-reversed relationship move(B@Y,A@X).
+  * '''No null move.'''  An attempted move which does not change the 
node's name or its parent node, with or without a modification, is not 
distinguished from a normal succession of history.
+ 
+ In the notation A@X, A represents a parent directory node-line-id and child 
name (rather than a full path), and X represents a revision number.  Distinct 
letters denote distinct values: A != B, X != Y, and so on.
+ 
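The transitive, reversible and time-symmetric properties can be sketched as operations on move records.  This is a minimal illustration, assuming a move is represented as a pair of (location, revision) endpoints where a location is a (parent node-line-id, name) tuple; `compose` and `reverse` are hypothetical names.

```python
def compose(first, second):
    """Collapse move(A@X, B@Y) followed by move(B@Y, C@Z).

    Returns the combined move, or None when the result is no-move.
    """
    (a, x), (b, y) = first
    (b2, y2), (c, z) = second
    if (b, y) != (b2, y2):
        raise ValueError("second move does not start where the first ended")
    if a == c:
        return None          # reversible: the node is back where it started
    return ((a, x), (c, z))  # transitive: one move from A@X to C@Z


def reverse(move):
    """Time-symmetric: move(A@X, B@Y) read backwards is move(B@Y, A@X)."""
    src, dst = move
    return (dst, src)


# A location is (parent node-line-id, child name); the ids are made up.
foo_in_p1 = ("p1", "foo")
foo_in_p2 = ("p2", "foo")
bar_in_p2 = ("p2", "bar")

m1 = ((foo_in_p1, 1), (foo_in_p2, 2))   # moved to another parent
m2 = ((foo_in_p2, 2), (bar_in_p2, 3))   # then renamed
back = ((foo_in_p2, 2), (foo_in_p1, 3)) # or moved back instead

print(compose(m1, m2))
print(compose(m1, back))
```

Uniqueness is not checked here; one way to enforce it in such a model would be to key the moves of a revision range by source endpoint so that each source has at most one destination.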
- === Move vs. Rename ===
+ === Move versus Rename ===
- We say “move” or “rename” interchangeably for most purposes.  Their essential 
similarities include the concept of a preserved node identity.  It can be 
useful sometimes to draw a distinction.  When merging a rename-only (A/foo → 
A/bar) with a move-only (A/foo → B/foo) we can suggest that the most likely 
merge resolution would be to apply both the move and the rename (→ B/bar).
+ In the versioned data model semantics, “move” refers to a change of parent 
node and/or a change of name.
  
+ At a higher level of semantics, for example when resolving conflicts during 
merge, it can be useful to distinguish between renaming and moving to a 
different parent node.
+ 
+ === Can't Move a Child of a Copy ===
+ Moving a child of a copy, within the same revision, is not tracked: it is an 
''unversioned'' operation.
+ 
+ A versioned "move" takes a node that existed in the previous revision and 
places it in a new location.  A copy, however, always creates ''new'' nodes, 
conceptually, even if the internal representation is a "lazy copy" pointer to 
the old node.  Moving a child node therefore is a rearrangement of the new 
content.  It is semantically the same as deleting the child node and creating a 
copy of it somewhere else.  Compare with copying a node and then moving that 
copy somewhere else.
+ 
+ If we perform a copy and then move a child of it, either in a WC or in a 
repository, this should create a copy with a deleted child, and then another 
copy somewhere else which is the "moved" child in its new location.  We can 
describe the relationship between the initial and final states perfectly well 
without saying "move", in the form "the subtree at path P@X is copied to path 
P1@Y, except for its child C which is copied to path P2/C@Y instead".  Thus 
there is no loss of semantic information despite the absence of “move” in the 
result.
+ 
+ === Resurrecting a Deleted Node ===
+ A possible extension to the move semantics would be to allow a previously 
deleted node to be resurrected at the same location or at a different location.
+ 
+ The mental model is this.  When a node is deleted, it is unlinked from the 
versioned tree, but its content continues to exist in the repository.  When the 
node is resurrected, a link to that last version of its content is put into the 
versioned tree, like undoing the delete.  The new link can be made anywhere in 
the versioned tree, like undoing the delete and moving the node at the same 
time.
+ 
+ This is not merely a blue-sky curiosity: it may be necessary in order to ensure 
logical completeness.  For example, let's say the node initially at path A/foo 
is moved to B/foo and then back to A/foo.  If we create a branch C from A, and 
are continuously merging all changes under A into C, then C/foo will be deleted 
and will later be recreated.  Or if 'svnsync' is replicating only subtree 'A' 
of repository R1 into repository R2, then in repository R2 we will see A/foo 
disappear and then reappear.  In cases such as these, the 'reappearance' should 
be modeled as a resurrection; if it is modeled as a plain add or as a copy, it 
will not have the correct semantics of being the 'same' node that it was before.
+ 
+ ----
  == System Overview ==
  Move support can be added in phases.  The “core components”, outlined in 
yellow in the following diagrams, must be upgraded to get a basic level of 
support in which commits and updates support moves.  The other components, 
including merge, can be supported later.
  
@@ -194, +249 @@

  === Repo Format Bump ===
  It is essential that the repository filesystem 'knows' whether move semantics 
are enabled, because copy-and-delete must then no longer be interpreted 
heuristically as a move.  This could be indicated by bumping the FS format 
number, if it applies to the whole repository, or potentially we could mark 
that all revisions after a certain point have move semantics enabled whereas 
prior revisions don't.
  
- = Move Semantics =
- This section specifies the logical semantics of the versioned move operation 
that is the basis of move tracking, independent of any implementation.
- 
- A versioned move of the node with node-copy-id “N”, with respect to two 
revisions rX and rY (X < Y), shall mean:
- 
-  * '''Same node-copy-id.'''  A node with        node-copy-id N exists in rX 
and in rY.  It is “the same node”.          It therefore has the same node 
kind.  It may have content       modifications.
-  * '''No gap.'''  There cannot be a gap in the  range of revisions: 
node-copy-id N exists in every revision rX,         rX+1, …, rY-1, rY.
-   * (Contrast with copy-and-delete, where there                 can be a gap 
between the delete and the copy.)
-   * (The possible “resurrection” extension              to these semantics 
would permit a gap.)
-  * '''Move and/or rename.'''  Node N has either         or both of
-   * a different name (base name) in rX than in          rY; and/or
-   * a different parent (parent directory                node-copy-id) in rX 
than in rY.
-  * '''Children follow.'''  If N is a directory,         each child 
(recursively) of N in rX remains a child of N in rY, with    the same name, 
unless it is separately moved or deleted.  Any or all    of the children can be 
separately moved within or outside the   subtree at N, at the same time as N is 
moved.
-  * '''No null move.'''  An attempted move which         does not change the 
node's name or its parent node, with or without     a modification, is not 
distinguished from a normal succession of        history.
- 
- === Properties ===
- In the notation A@X, A represents a parent directory node-copy-id and child 
name, while X represents a revision number.
- 
- Properties of the move relationship:
- 
-  * '''Transitive.'''     move(A@X,B@Y) followed by move(B@Y,C@Z) collapses to 
move(A@X,C@Z).
-  * '''Reversible.'''     move(A@X,B@Y) followed by move(B@Y,A@Z) collapses to 
no-move.
-  * '''Time-symmetric.'''         move(A@X,B@Y) is symmetric with the 
time-reversed relationship         move(B@Y,A@X).
- 
- === Move versus Rename ===
- In the versioned data model semantics, “move” refers to a change of parent 
node and/or a change of name.
- 
- At a higher level of semantics, for example when resolving conflicts during 
merge, it can be useful to distinguish between renaming and moving to a 
different parent node.
- 
- === Can't Move a Child of a Copy ===
- Moving a child of a copy, within the same revision, is not tracked: it is an 
''unversioned'' operation.
- 
- A versioned "move" takes a node that existed in the previous revision and 
places it in a new location.  A copy, however, always creates ''new'' nodes, 
conceptually, even if the internal representation is a "lazy copy" pointer to 
the old node.  Moving a child node therefore is a rearrangement of the new 
content.  It is semantically the same as deleting the child node and creating a 
copy of it somewhere else.  Compare with copying a node and then moving that 
copy somewhere else.
- 
- If we perform a copy and then move a child of it, either in a WC or in a 
repository, this should create a copy with a deleted child, and then another 
copy somewhere else which is the "moved" child in its new location.  We can 
describe the relationship between the initial and final states perfectly well 
without saying "move", in the form "the subtree at path P@X is copied to path 
P1@Y, except for its child C which is copied to path P2/C@Y instead".  Thus 
there is no loss of semantic information despite the absence of “move” in the 
result.
- 
- === Extension: Resurrecting a Deleted Node ===
- A possible extension to the move semantics would be to allow a previously 
deleted node to be resurrected at the same location or at a different location.
-
