[Subversion Wiki] Update of "MoveDev/MoveDev" by JulianFoad

Apache subversion Wiki Thu, 20 Jun 2013 13:33:37 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Subversion Wiki" for 
change notification.


The "MoveDev/MoveDev" page has been changed by JulianFoad:
https://wiki.apache.org/subversion/MoveDev/MoveDev?action=diff&rev1=3&rev2=4

+ 
+ 
  = How to Add Moves to Svn =
- (A System Overview)
  
- Move support can be added in phases.  The “core components” must be upgraded 
to get a basic level of support in which commits and updates support moves, and 
the infrastructure required.  The “optional components” can be supported later, 
and include merge.
+ == Summary ==
+ Subversion needs to handle moves and renames better than in version 1.8.  
This paper presents the rationale and a plan for doing so.
  
- Core components:
+ To achieve the behaviour that users expect, it is necessary to distinguish a 
move from a copy-and-delete, and thus necessary to track moves explicitly.  We 
therefore introduce moves into Subversion as a new semantic operation which 
preserves node identity and is distinct from copy-and-delete.
  
+ We will preserve backward compatibility between move-aware clients and 
move-unaware repositories, and between move-aware repositories and move-unaware 
clients.  In all cases, simple compatibility will be available by falling back 
to copy-and-delete.  In some cases, heuristic detection of moves may be offered 
as an option.
-  * WC up/sw editor
-  * Client/WC commit edit-driver
-  * RA diff/up/sw/st edit-driver
-  * RA commit editor
-  * RA-serf protocol; RA-svn protocol; RA-local  interface
-  * Repos API
-  * FS API
-  * FSFS
  
- Optional components:
+ == Introduction ==
+ ...
  
-  * Client-lib diff (plain text; git format;     summary)
-  * Client-lib merge
-  * Client-lib status
-  * Client-lib: repos-repos diff
-  * WC-lib: repos-wc diff
+ == Why Not Just Copy and Delete? ==
+ I believe we need explicit move semantics, although it is difficult to 
succinctly explain why.
+ 
+ In many simple cases, a copy and a delete are adequate to represent and 
convey a move.  Clearly, Subversion has worked well enough in this way for many 
users.  However, for many others the current level of support for moves simply 
does not work.
+ 
+ The big question is: do we definitely need to represent moves as a distinct 
operation, or should we continue to represent moves as copy and delete but make 
the interpretation of these cleverer?  Is the deficiency in the semantics, or 
is the deficiency something we can rectify by making the merge algorithm and 
the conflict resolution smarter?
+ 
+ When is copy-and-delete not adequate?  It mainly shows up in merging.  We 
want to merge all the changes destined for node 'foo' in the source branch into 
the corresponding node in the target branch, even though the corresponding node 
has been renamed to 'bar'.  We need to be able to calculate a tree difference 
between two or three trees, in which each node in one tree is matched against 
the corresponding node in another tree, and the correspondence needs to follow 
renames.
+ 
+ How should the semantics of “move” differ from those of copy and delete?  
After all, it would not matter whether we distinguish a move from a copy and 
delete internally if we end up applying the same behaviour for both.
+ 
+ …?
+ 
+ Some previous ideas about how much we need to track moves explicitly:
+ 
+  * We need to track moves in the server so we can do forward history 
tracing.  But does this really help unless the clients are aware and able to 
communicate these moves to and from the server?
+  * Track moves in the WC only – as implemented in Subversion 1.8.  This 
helps with certain situations: it can apply incoming edits into a locally 
moved node, and it can prevent the accidental committing of just one half of a 
move.
+  * We don't need to track moves explicitly, as we can do everything we 
really need by recognizing copy-and-delete as a move, and that has the 
advantage of not changing the network protocols and so on.
+  * We don't need to track moves explicitly, as we can do everything we 
really need by always treating a copy in the same way as the copy half of a 
move.  So, when merging, if there is a copy of the node being merged, then all 
changes destined for the copy-source node should go to the copy-dest node as 
well – or instead, if the copy-source node is deleted.  In this way, the 
semantics of copy and move are unified.
+ 
+ Arguments against treating any copy in the same way as the copy half of a 
move:
+ 
+  * Why should we treat a single copy (cp A A2; rm A) differently from the 
same situation plus an additional copy (cp A A2; cp A A3; rm A)?  And if we 
decided to merge into all the copies alike, then why should we only do so when 
there is a delete?
+ 
+  * ...
+ 
+ === Combining Changes ===
+ The problems with copy-and-delete boil down to various kinds of ambiguity, 
inconsistency or non-determinism.  Many of these are related to the problem of 
representing a sequence of changes as a single change.  It is fundamental in a 
version control system to be able to update, merge or diff between two widely 
separated revisions without having to step through all the intermediate 
revisions in sequence, and so it is necessary to have an unambiguous way of 
combining successive changes.  If we attempt to interpret copy-and-delete as a 
move, that leads to ambiguous or context-dependent results when combining 
changes.
+ 
+  * Spatial ambiguity.  When looking at a subtree that contains only one half 
of the move, we would see a copy or a delete, but if we then look at a wider 
subtree we would see a move.  Look wider again, and we may see a second copy 
from the same source, which means there is no move because there is no unique 
copy.
+  * Ambiguity when the copy-from is not the revision immediately before the 
copy-and-delete.  If the deleted node was modified between the copy-from 
revision and the delete, then is it still a move?  No, because it has a forked 
history.  If we treat it as a move only if the delete side was not modified 
since the copy-from revision, then there is a race on commit because the 
change that gets committed will be seen as a move if nobody else modifies it 
in the meantime, or as a non-move if somebody gets in first.  If we want the 
semantics of a move, we have to tell the server it is a move so it can avoid 
this.
+  * Ambiguity when the delete is not in the same revision as the copy.
+  * Temporal ambiguity.  Difficulty in composing a series of changes 
(revisions) together.  If we start with (cp A A2; rm A), that looks like a 
move, but if we then commit (cp A@orig-rev A3) and look at the overall 
combined change, we now see a multiple-copies scenario.  Conversely, if we 
start with (cp A A2; cp A A3; rm A) and then commit (rm A2), we change a 
non-move into a move.
+ 
+ In one context, a certain copy and delete can be paired uniquely and thus 
interpreted as a move, while in another context the same copy and delete are 
not unique or are not both visible.
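
The temporal ambiguity can be made concrete with a small sketch.  This is not Subversion code: the tuple encoding of operations and the function name `infer_moves` are assumptions made purely for illustration.  It pairs a delete with a copy only when that copy is the unique surviving copy of the deleted source, and shows that the answer changes as more revisions are folded into the combined change.

```python
from collections import defaultdict

def infer_moves(ops):
    """Heuristically pair deletes with copies in a combined change.

    ops is a list of ("cp", src, dst) and ("rm", path) tuples
    (a hypothetical encoding).  A deleted source counts as "moved"
    only if exactly one copy of it survives in the combined change.
    Returns a dict mapping src -> dst.
    """
    copies = defaultdict(list)
    deleted = set()
    for op in ops:
        if op[0] == "cp":
            copies[op[1]].append(op[2])
        elif op[0] == "rm":
            deleted.add(op[1])

    moves = {}
    for src in deleted:
        # Ignore copies that were themselves deleted later on.
        live = [dst for dst in copies.get(src, []) if dst not in deleted]
        if len(live) == 1:
            moves[src] = live[0]
    return moves

r1 = [("cp", "A", "A2"), ("rm", "A")]
print(infer_moves(r1))                          # looks like a move
print(infer_moves(r1 + [("cp", "A", "A3")]))    # no longer a move

r2 = [("cp", "A", "A2"), ("cp", "A", "A3"), ("rm", "A")]
print(infer_moves(r2))                          # not a move
print(infer_moves(r2 + [("rm", "A2")]))         # now it is a move
```

The same pair of operations is classified differently depending on what else is visible, which is the context dependence described above.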
+ 
+ === Move vs. Rename ===
+ We say “move” or “rename” interchangeably for most purposes.  Their essential 
similarities include the concept of a preserved node identity.  It can be 
useful sometimes to draw a distinction.  When merging a rename-only (A/foo → 
A/bar) with a move-only (A/foo → B/foo) we can suggest that the most likely 
merge resolution would be to apply both the move and the rename (→ B/bar).
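
As an illustration only, the suggested resolution can be sketched by treating the parent directory and the base name as two independent aspects of a path.  The function name `combine_move_and_rename` is hypothetical, and a real merge would have to consider far more than the path string.

```python
import posixpath

def combine_move_and_rename(original, ours, theirs):
    """Combine a change of parent directory with a change of name.

    Returns the merged path, or None if both sides changed the same
    aspect (directory or name) in different ways.
    """
    o_dir, o_name = posixpath.split(original)
    a_dir, a_name = posixpath.split(ours)
    b_dir, b_name = posixpath.split(theirs)

    # Collect the directories and names that differ from the original.
    new_dirs = {d for d in (a_dir, b_dir) if d != o_dir}
    new_names = {n for n in (a_name, b_name) if n != o_name}
    if len(new_dirs) > 1 or len(new_names) > 1:
        return None  # conflicting moves or conflicting renames

    new_dir = new_dirs.pop() if new_dirs else o_dir
    new_name = new_names.pop() if new_names else o_name
    return posixpath.join(new_dir, new_name)

print(combine_move_and_rename("A/foo", "A/bar", "B/foo"))  # B/bar
print(combine_move_and_rename("A/foo", "A/bar", "A/baz"))  # None
```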
+ 
+ == System Overview ==
+ Move support can be added in phases.  The “core components”, outlined in 
yellow in the following diagrams, must be upgraded to get a basic level of 
support in which commits and updates support moves.  The other components, 
including merge, can be supported later.
+ 
+ Client side:
+ 
+ 
+ 
+ Server side:
+ 
+ 
  
  == Core Components ==
  === Client ↔ RA ↔ Repos ===
  The only Client → Repos op affected is Commit.  (There are also simple commit 
actions, of which 'move URL URL' is probably the only relevant one.)  Commit 
will use a move-aware delta-editor.  We already have local moves in the WC, so 
we just need to describe those to the editor as moves.
  
- '''gstein sez''': RA_local's Ev2 interface (see `svn_ra__get_commit_ev2`) 
will transfer a move from the RA editor all the way to the FS API. See 
`libsvn_fs/editor.c:move_cb()`. When the FS grows an API, then RA_local will 
drive it.
+ '''gstein sez''': RA_local's Ev2 interface (see {{{svn_ra__get_commit_ev2}}}) 
will transfer a move from the RA editor all the way to the FS API. See 
{{{libsvn_fs/editor.c:move_cb()}}}. When the FS grows an API, then RA_local 
will drive it.
  
  The Repos → Client ops affected are Diff, Update, Switch, Status.  These all 
work in a very similar way.  Each one sends a Report and receives a delta-edit. 
 They will use a move-aware delta-editor.
  
@@ -37, +78 @@

  
  When not using a move-aware editor, WC describes each move from the WC DB as 
copy & delete in the old way.
  
- '''gstein sez''': why would you ever have a non-move-aware editor? See the 
`ev2-export` branch for a commit process that drives an Ev2 editor (meaning: a 
move-aware editor)
+ '''gstein sez''': why would you ever have a non-move-aware editor? See the 
{{{ev2-export}}} branch for a commit process that drives an Ev2 editor 
(meaning: a move-aware editor)
  
  ==== Update/Switch/Status/Diff ====
  WC receives each move from the move-aware editor.
@@ -48, +89 @@

  
  When not being driven by a move-aware editor:
  
-  * Insert the heuristic move-detector, if       desired.
+  * Insert the heuristic move-detector, if     desired.
   * Otherwise, only copies and deletes are seen.
-  * Lose any move heuristics currently built in  to copy & delete. I think 
this only affects the conflict        resolution.
+  * Lose any move heuristics currently built in        to copy & delete. I 
think this only affects the conflict        resolution.
  
  === Repos ↔ FS ↔ FSFS ===
  TODO...
  
- '''gstein sez''': Repos has an Ev2 commit editor, which `ra_local` can use. 
This drives the FS Ev2 commit editor. Currently, the moves are transformed into 
copy/delete, but that can be fixed "trivially" by adding an FS API (which does 
copy/delete under the covers) and converting `fs/editor.c:move_cb()` over to 
using it. Then the move issue is completely within FS.
+ '''gstein sez''': Repos has an Ev2 commit editor, which {{{ra_local}}} can 
use. This drives the FS Ev2 commit editor. Currently, the moves are transformed 
into copy/delete, but that can be fixed "trivially" by adding an FS API (which 
does copy/delete under the covers) and converting {{{fs/editor.c:move_cb()}}} 
over to using it. Then the move issue is completely within FS.
  
  === Within FSFS ===
  Introduce 'move' as a new, distinct operation.  Add move-tracking APIs.
@@ -63, +104 @@

  ==== In existing FSFS (format 6) ====
  Alter the node id and copy-id assignment rules.
  
-  * A moved node gets the same copy-id as its    copy-from node.
+  * A moved node gets the same copy-id as its  copy-from node.
-  * All children of a moved node get the same new        copy-id as their 
parent.
+  * All children of a moved node get the same new      copy-id as their parent.
  
  Adjust implementation of existing APIs to see those moves as copies (for 
back-compat).
  
@@ -76, +117 @@

  ==== New FS APIs ====
  Provide new APIs that see moves as moves:
  
-  * Find “the same” node in another revision.    This query can be shaped in 
various ways, such as:
+  * Find “the same” node in another revision.  This query can be shaped in 
various ways, such as:
+ 
-   * For a given set of nodes in revision X, find                where the 
“same” nodes exist in revision Y.
+   * For a given set of nodes in revision X, find              where the 
“same” nodes exist in revision Y.
-   * Compare directories PATH1@REV2 and          PATH2@REV2, and return a list 
of matching name-pairs between them.
+   * Compare directories PATH1@REV2 and                PATH2@REV2, and return 
a list of matching name-pairs between them.
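
One possible shape for the first query is sketched below.  It assumes the filesystem can supply explicit move records per revision; the `moves_by_rev` dictionary and the function names are hypothetical and do not correspond to any existing FS API.  Deletions and replacements are deliberately ignored to keep the sketch short.

```python
def trace_forward(path, start_rev, end_rev, moves_by_rev):
    """Follow explicit move records to find where 'path' lives later.

    moves_by_rev maps a revision number to a list of (src, dst) moves
    recorded in that revision (a hypothetical data source).
    """
    for rev in range(start_rev + 1, end_rev + 1):
        for src, dst in moves_by_rev.get(rev, []):
            if path == src:
                path = dst
                break
            if path.startswith(src + "/"):
                # The node is inside a moved directory.
                path = dst + path[len(src):]
                break
    return path

def same_nodes(paths, start_rev, end_rev, moves_by_rev):
    """Map each path in start_rev to the 'same' node's path in end_rev."""
    return {p: trace_forward(p, start_rev, end_rev, moves_by_rev)
            for p in paths}

moves = {
    3: [("trunk/foo", "trunk/bar")],
    5: [("trunk", "branches/b")],
}
print(same_nodes(["trunk/foo/x.c", "tags/t"], 2, 5, moves))
```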
  
  TODO...
  
@@ -86, +128 @@

  Needs no changes.  (The reporter does not report changes, it just reports a 
base state.)
  
  === Delta-Editor ===
- Moves will be transmitted over the old svn_delta_editor_t.  A move-aware 
producer will drive the existing editor interface in a way that is (more or 
less) backward compatible with existing consumers.
+ Options:
  
-  * What about Ev2?  Supposed to have support for        moves.  Untested and 
unknown.  Seems better to start by adding  support to a well known editor 
first, if that is possible (which it     seems to be).  Then see if there are 
any functionality or efficiency    issues that could be improved by use of Ev2 
(or something like it).
+  * Transmit moves transparently over the old  svn_delta_editor_t.  A 
move-aware producer will drive the existing      editor interface in a way that 
is (more or less) backward compatible    with existing consumers.
  
+  * Use Ev2.  Supposed to have support for moves.       Untested and unknown.
+ 
+ It may be better to start by adding support to a well known editor first, if 
that is possible (which it seems to be).  Then see if there are any 
functionality or efficiency issues that could be improved by use of Ev2 (or 
something like it).
+ 
- '''gstein sez''': It '''does''' have support for moves. Not "supposed to". It 
is better tested/known then any hack you may want to add into 
`svn_delta_editor_t`. Note that the delta editor structure is bare to the 
world. You cannot really extend that vtable. The Ev2 editor solves that 
problem. To put it blunkly/frankly, you're talking about reinventing the wheel 
that has already been done in Ev2. It is nothing short of ridiculous to start 
over again. Not to mention a ton of completed work on converting pieces of the 
codebase over to Ev2. I would turn it around: show why Ev2 isn't sufficient, 
rather than starting over again from the delta editor. We also have working 
shims for converting between delta_editor and Ev2, to help with the transition. 
This pseudo-move-delta has zero support, zero testing, zero review. The Ev2 
design was crafted with the understanding of problems with the delta_editor 
interface. It solves them, and moves the codebase away from that crap. Piling 
more onto delta_editor makes it worse.
+ '''gstein sez''': It '''does''' have support for moves. Not "supposed to". It 
is better tested/known than any hack you may want to add into 
{{{svn_delta_editor_t}}}. Note that the delta editor structure is bare to the 
world. You cannot really extend that vtable. The Ev2 editor solves that 
problem. To put it bluntly/frankly, you're talking about reinventing the wheel 
that has already been done in Ev2. It is nothing short of ridiculous to start 
over again. Not to mention a ton of completed work on converting pieces of the 
codebase over to Ev2. I would turn it around: show why Ev2 isn't sufficient, 
rather than starting over again from the delta editor. We also have working 
shims for converting between delta_editor and Ev2, to help with the transition. 
This pseudo-move-delta has zero support, zero testing, zero review. The Ev2 
design was crafted with the understanding of problems with the delta_editor 
interface. It solves them, and moves the codebase away from that crap. Piling 
more onto delta_editor makes it worse.
  
  '''brane notes''': We're kind of aware of all that. The problem right now is 
that the only two people who actually know anything about EV2 aren't very 
active on the project. I'm hoping this will change (hint hint!), in which case 
I don't think we'd be considering hacking the delta editor at all.
  
- ==== Adding a 'move' operation (and negotiate the use of it) ====
+ ==== Add a 'move' operation and negotiate the use of it ====
  Introduce a 'move' method in the vtable.
  
- Feature negotiation (when driving or receiving an edit over the network to a 
potentially older client or server), for
+ Use feature negotiation (when driving or receiving an edit over the network 
to or from a potentially older client or server) to decide whether the 'move' 
operation is allowed.  When driving an edit for which 'move' is not allowed, 
we must send copy & delete instead of move.  When receiving an edit, we cannot 
always assume copy & delete means move because, when a move-aware edit driver 
sends or receives copy & delete, it does __not__ mean move.
  
-  * whether the 'move' operation is allowed
- 
- (We cannot just assume copy & delete means move because, when a move-aware 
subsytem sends or receives copy & delete, it does __not__ mean move.)
- 
- ==== Entry-props method (backward-compatible) ====
+ ==== Entry-props method for Ev1 (backward-compatible) ====
  Augment the existing 'delete' and 'copy' ops with move info that is ignored 
by old consumers.  Src path and Dst path of a move are transmitted in 
entry-props before (or during) the Del and Add.
  
  A move-aware consumer will process a pair of 'add' and 'delete' ops as a move 
if the additional move info is present, or in the old way if not.  Thus the 
scheme is backward compatible in both directions.
@@ -111, +153 @@

  The entry-props can be attached to any convenient path.  A convenient choice 
is the parent dirs of the two operative paths.
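
A rough sketch of how a consumer might treat such a drive is shown below.  The event tuples, the property name `hypothetical:move-dst` and the function `consume` are all invented for illustration, and for simplicity the hint is keyed by the source path rather than attached to a parent directory.  An old consumer is modelled by simply ignoring the hint.

```python
MOVE_HINT = "hypothetical:move-dst"  # invented entry-prop name

def consume(events, move_aware):
    """Interpret an edit drive, pairing delete+add into a move if hinted.

    events is a list of tuples (hypothetical encoding):
      ("entry-prop", src_path, prop_name, value)
      ("delete", path)
      ("add", path, copy_from_path)
    """
    hints = {}    # src -> dst announced through the entry-prop
    deletes = []  # deleted paths, in drive order
    adds = {}     # added path -> copy-from path
    for ev in events:
        if ev[0] == "entry-prop":
            _, src, name, value = ev
            if move_aware and name == MOVE_HINT:
                hints[src] = value
        elif ev[0] == "delete":
            deletes.append(ev[1])
        elif ev[0] == "add":
            adds[ev[1]] = ev[2]

    result = []
    moved_dsts = set()
    for src in deletes:
        dst = hints.get(src)
        if dst is not None and adds.get(dst) == src:
            result.append(("move", src, dst))
            moved_dsts.add(dst)
        else:
            result.append(("delete", src))
    for dst, src in adds.items():
        if dst not in moved_dsts:
            result.append(("copy", src, dst))
    return result

drive = [
    ("entry-prop", "A/foo", MOVE_HINT, "A/bar"),
    ("delete", "A/foo"),
    ("add", "A/bar", "A/foo"),
]
print(consume(drive, move_aware=True))   # one move
print(consume(drive, move_aware=False))  # delete plus copy, as before
```

The same drive is usable by both kinds of consumer, which is the sense in which the scheme is backward compatible in both directions.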
  
  ==== Copy-from-rev = -1 method (not compatible) ====
- We can define a particular consumer will perform a move when it receives
+ For certain interfaces, including certain instances of Ev1, we can define 
that a particular consumer will perform a move when it receives
  
  Add(copy-from-path = X, copy-from-rev = -1)
  
  which means “move from X to here”.
  
- The consumer has to handle the Del in such a way that it can later be 
converted to a move if and when a move arrives.
+ The consumer has to handle each Delete in such a way that it can later be 
converted to a move if and when a move arrives.
  
  This is not backward-compatible with existing editor transports nor with 
existing consumers.  Consumers obviously would fail, and editor transports 
often assert that the copy-from-rev is valid when copy-from-path is valid.
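
The bookkeeping this demands of the consumer might look roughly like the following.  The operation tuples and the function `apply_drive` are hypothetical; the sketch assumes the Delete of the source always arrives before the matching Add, and it only matches exact paths, not children of a deleted subtree.

```python
MOVE_REV = -1  # copy-from-rev value meaning "move from copy-from-path"

def apply_drive(ops):
    """Process deletes so that a later Add(..., rev=-1) can claim them.

    ops is a list of tuples (hypothetical encoding):
      ("delete", path)
      ("add", path, copy_from_path, copy_from_rev)
    """
    pending = {}  # deleted path -> index of its entry in 'result'
    result = []
    for op in ops:
        if op[0] == "delete":
            pending[op[1]] = len(result)
            result.append(("delete", op[1]))
        elif op[0] == "add":
            _, path, from_path, from_rev = op
            if from_rev == MOVE_REV:
                if from_path not in pending:
                    raise ValueError("move source was not deleted: %s" % from_path)
                # Convert the earlier delete into a move, in place.
                result[pending.pop(from_path)] = ("move", from_path, path)
            elif from_path is not None:
                result.append(("copy", from_path, from_rev, path))
            else:
                result.append(("add", path))
    return result

ops = [
    ("delete", "X"),
    ("delete", "Y"),
    ("add", "Z", "X", MOVE_REV),
    ("add", "W", "Q", 7),
    ("add", "N", None, None),
]
print(apply_drive(ops))
```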
  
@@ -128, +170 @@

  
  In a sane editor drive [1], once a path has been added it is not subsequently 
moved, so any Add path corresponds directly to a path in the final state of the 
tree.  However, after a Del (mv-away), any of the implicitly deleted children 
of that subtree may subsequently be the source of a move.
  
+ == Other Components ==
+ === Merge ===
+ Merge is too big a topic to discuss here.
+ 
+ === Diff ===
+ Enhance the diff output formats to show moves:
+ 
+  * plain diff
+  * diff --git
+  * diff --summarize
+ 
+ == Backward Compatibility ==
+ We can and will preserve backward compatibility between move-aware clients 
and move-unaware repositories, and between move-aware repositories and 
move-unaware clients.  There are two complementary parts to this:
+ 
+  * when sending a move to an old client or    server, we shall convert to 
copy + delete;
+  * when receiving copy + delete from an old   client or server, we could 
heuristically convert some cases to a        move.
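
The first of these two parts is mechanical, as the sketch below suggests.  The change tuples, the `peer_supports_move` flag (standing in for the outcome of feature negotiation) and the function name are assumptions for illustration only.

```python
def encode_for_peer(changes, peer_supports_move):
    """Downgrade moves to copy + delete for a move-unaware peer.

    changes is a list of tuples such as ("move", src, dst),
    ("copy", src, dst) or ("delete", path) (hypothetical encoding).
    """
    out = []
    for change in changes:
        if change[0] == "move" and not peer_supports_move:
            _, src, dst = change
            out.append(("copy", src, dst))
            out.append(("delete", src))
        else:
            out.append(change)
    return out

changes = [("move", "A/foo", "A/bar"), ("delete", "A/old")]
print(encode_for_peer(changes, peer_supports_move=True))
print(encode_for_peer(changes, peer_supports_move=False))
```

The reverse direction is not symmetric: once a move has been flattened to copy + delete, the receiver can only recover it heuristically.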
+ 
+ === Heuristic Detection of Moves ===
+ The server could perform heuristic detection of moves when an old client is 
committing.
+ 
+ The client could perform heuristic detection of moves when an old server is 
sending an update (or diff or merge etc.).
+ 
+ We could offer to perform heuristic move detection when upgrading an old 
repository, almost certainly as an off-line operation.  We could potentially 
implement this in any of: svndumpfilter, svnsync, svnadmin load, svnadmin 
upgrade, etc.
+ 
+ === Repo Format Bump ===
+ It is essential that the repository filesystem 'knows' whether move semantics 
are enabled, because copy-and-delete must then no longer be interpreted 
heuristically as a move.  This could be indicated by bumping the FS format 
number, if it applies to the whole repository, or potentially we could mark 
that all revisions after a certain point have move semantics enabled whereas 
prior revisions don't.
+
