Re: allowing multiple conflicts in storage

Julian Foad Wed, 31 Mar 2010 11:35:17 -0700

On Wed, 2010-03-31, Stefan Sperling wrote:
> On Wed, Mar 31, 2010 at 01:44:23PM +0100, Julian Foad wrote:
> > The Source of the Incoming Change
> > =================================
> > 
> > I suggest something like (? means an optional field):
> > 
> >   * A diff between nodes that have a location in a repository.
> > 
> >       left=(repo, rev, relpath, sha1?)
> >       right=(repo, rev, relpath, sha1?)
> > 
> >       # This source description is normally from merge/up/sw.
> >       # Full texts are (at least in principle) available.
> > 
> >   * A Patch File
> > 
> >       patch-file=(path, (info to identify which part of the
> >                          patch file applies to this node - e.g. byte
> >                          offsets or other info)?)
> >       reported-left=(repo?, rev?, relpath?)
> >       reported-right=(repo?, rev?, relpath?)
> > 
> >       # This source description is normally from operation 'patch'.
> >       # The reported-{left,right} are taken from info in the patch
> >       #   file if available: e.g. from "+++ foo (r1000)" we can fill
> >       #   in at least 'rev' and maybe 'relpath'.  These cannot be
> >       #   considered trustworthy even when given, as patch files can
> >       #   be constructed and edited at will.
> > 
> >   * A diff between nodes that don't have a repository.
> >       # e.g. a merge from a diff between two locally-modified WCs.
> >       # We have no need of this yet (not supported) but we should
> >       # expect to support this kind of source of change some time.
> 
> This sounds very interesting.
> 
> Can you try to adjust notes/wc-ng/conflict-storage accordingly?
> The notation you used is different from what is used there,
> so I'm not sure I really understand whether the above is meant
> to supplement the current design or not.
> I'm making some suggestions below but maybe you have better ideas.


OK I've attached a patch with ideas and suggestions.  (The above was
meant to be a concept that should be embodied in the design, not a
design in itself.)

> > A Change Affects the Whole Node
> > ===============================
> > 
> > It is important to record the whole-node source of the change, as far as
> > possible, even when only one property conflicts, because I want to be
> > able to see what the other properties and text were on {left, right,
> > mine} to help me resolve the conflict.  So we must not think of "a
> > property conflict" as being a fundamental thing: the fundamental thing
> > is "a change to this node conflicts with another change to this node".
> > The assumption that all properties are independent from each other and
> > independent from the text is a high-level approximation which the client
> > makes, to be more convenient in simple cases, but not a fundamental
> > truth, and we must not tie the conflict storage design to only
> > supporting this assumption.
> 
> I think the whole-node information you want is stored by the OPERATION
> skel I added today (we can still augment the OPERATION skel if you want
> more information in it).

Yup, that looks more like it.
 
> The OPERATION skel contains information as to what caused the conflict
> (the source), and the TYPE skel records details about the nature of the
> conflict. See notes/wc-ng/conflict-storage.
> 
> > In the same way, when we record a tree conflict, we want to record the
> > {left, right, mine} versions of the whole node.  We don't have the
> > infrastructure to do this yet - we don't have a pristine store for sets
> > of properties, let alone for directory trees - but we could do in
> > principle and I hope we will do in practice one day.
> 
> Recording left,right,mine trees would be nice, yes.
> 
> > The Part(s) of the Node Affected By The Change
> > ==============================================
> > 
> > Briefly: the WC probably needs to know only "this node is in conflict"
> > or perhaps tree-conflict versus content-conflict, and no more detail
> > than that.  By "know" I mean what the WC layer itself needs to
> > understand and use.
> 
> In the current model answering "is this node in conflict" means comparing
> the conflict_data column to NULL. To find out what kinds of conflicts
> there are, libsvn_wc can iterate the list of TYPE skels and look at the
> very first atom in each skel.

Great.

> > It needs to be able to *supply* much more detail,
> > of course, to the client layer, but that can (and should) be in a blob
> > whose content is opaque to libsvn_wc.  Layering is Good (unless the
> > layers are too many, too thin; but the WC layer can hardly be accused of
> > that). So I think it IS important to have a common part of the info that
> > libsvn_wc understands itself, and a detailed part that libsvn_wc simply
> > reports to libsvn_client and which libsvn_client composes and
> > decomposes.
> 
> Then maybe we should split the TYPE-specific and OPERATION skels
> into a separate list, such as:
> 
>   ((KIND (OPERATION KIND-SPECIFIC)) ((KIND (OPERATION KIND-SPECIFIC) ...)
> 
> The WC layer would only ever look at KIND, and not parse the skels any
> further than that.
> The client layer would supply and interpret data in (OPERATION KIND-SPECIFIC),
> setting the "blob" via libsvn_wc functions.

(As I said in my follow-up, I was confused between libsvn_client with
upper layer of libsvn_wc.  In my dream world, the logic of "update" is
handled in the client layer as a sibling of "merge".)

> Would this sufficiently fulfill the layering requirement?
> Probably, but there's a slight problem with this scheme.
> 
> E.g. I'd like the client layer to be able to ask about a conflict
> on a specific property. Or about the reject conflict caused by a
> specific hunk. This means that the wc layer will need to record some
> sort of identifier (e.g. a property's name) to tell several recorded
> property conflicts apart.

It's all just a question of where you draw the line above/below that API
and all the other ones you need.  I don't see how this particular
calling out of property names is special.

> So, we'd have something like this instead:
> 
>   ((KIND [ID] (OPERATION KIND-SPECIFIC))
>    (KIND [ID] (OPERATION KIND-SPECIFIC)) ...)
> 
> For types which can only be recorded once, the ID can be ommitted.
> 
> In such as scheme the client layer would use a low-level function
> interface like:
> 
> svn_wc__add_text_conflict(char *conflict_data)
> svn_wc__add_prop_conflict(char *property_name, char *conflict_data)
> svn_wc__add_tree_conflict(char *conflict_data)
> svn_wc__add_reject_conflict(char *hunk_header, char* conflict_data)
> 
> svn_wc__list_prop_conflicts(apr_array_header_t **property_names)
> svn_wc__list_reject_conflicts(apr_array_header_t **hunk_headers)
> 
> svn_wc__get_text_conflict(char *conflict_data)
> svn_wc__get_prop_conflict(char *property_name, char *conflict_data)
> svn_wc__get_tree_conflict(char *conflict_data)
> svn_wc__get_reject_conflict(char *hunk_name, char *conflict_data)
> 
> svn_wc__del_text_conflict(char *conflict_data)
> svn_wc__del_prop_conflict(char *property_name, char *conflict_data)
> svn_wc__del_tree_conflict(char *conflict_data)
> svn_wc__del_reject_conflict(char *hunk_header, char* conflict_data)
> 
> (parameter types above are just for illustration, I know much of them
> could be const, and we'll also need pools etc.)
> 
> Higher-level conflict handling functions would then live entirely
> within libsvn_client.

No comment on API ideas.

> Note though that the current libsvn_client/libsvn_wc layering is
> not really clean (e.g. update editor in libsvn_wc, merge editor
> in libsvn_client), which may impose restrictions upon how dumb
> libsvn_wc can really be about conflicts.

Yup.

> > A reject is one way to record that some text change could not be applied
> > because it did not match the text found in the target file.  That is
> > logically a text conflict, but recorded in a different way, and with
> > less information about which lines of text go where.  It is certainly a
> > conflict of text rather than of properties or of tree structure.
> > 
> > So, we could have said:
> > 
> >   There are five types of conflict *description* (text conflict with
> > full texts, text conflict from a patch, property conflict, tree
> > conflict, and obstruction).
> > 
> > But property and tree conflicts can (later) come from a patch, so:
> > 
> >   There are seven types of conflict *description* (text conflict with
> > full texts, text conflict from a patch, property conflict with full
> > sources, property conflict from a patch, tree conflict with full
> > sources, tree conflict from a patch, and obstruction).
> > 
> > But in my opinion that not good.  As I said above, it's better to
> > distinguish the source of change from the kind of change as two separate
> > dimensions.
> 
> I believe we already distinguish the source of the change from the kind.
> I regard the reject as a separate kind of conflict at the storage layer
> because the information we store about rejects is different than what we
> store about text conflicts.

I think we're just arguing terminology and categorization.  We both now
understand that both the "reject" and "text" kinds of conflict
description describe a conflict in the text of the file.

- Julian

Index: notes/wc-ng/conflict-storage
===================================================================
--- notes/wc-ng/conflict-storage	(revision 929653)
+++ notes/wc-ng/conflict-storage	(working copy)
@@ -5,18 +5,46 @@
 
 Conflict meta data is stored in the ACTUAL_NODE table, within the
 'conflict_data' column. The data in this column is a skel containing
-conflict information, or NULL (meaning no conflict is present).
+conflict information (meaning the node is in conflict, and the details
+are inside), or NULL (meaning no conflict is present).
 
-There are five types of conflicts (text conflicts, property conflicts,
-tree conflicts, reject conflicts, and obstructions). The conflict skel
-has the form:
+### JAF: What if the node is neither in WORKING nor in BASE? Currently,
+###   an entry in ACTUAL implies an entry in WORKING or BASE.
+### Bert said: "We can just allow that case in WC-NG, by enumerating
+###   over the nodes in ACTUAL in svn_wc__db_read_conflict_victims(),
+###   while svn_wc__db_read_children() doesn't look at ACTUAL."
+### JAF: Is it a good idea to special-case like that?  I don't like the
+###   sound of the words "just allow that case" as it sounds like losing
+###   sight of the concepts, and would prefer to talk in terms like
+###   "adjust the meaning of 'child' to include ...".
+
+The conflict skel has the form:
 
   ((KIND OPERATION KIND-SPECIFIC) (KIND OPERATION KIND-SPECIFIC) ...)
 
-where KIND is one of "text", "prop", "tree", "reject", or "obstructed".
+KIND indicates the kind of conflict description that follows and is one
+of:
+
+  "text" - meaning a "normal" text conflict of the whole node (which must be a
+    file), with left/right/mine full texts saved, and (unless it's
+    "binary") conflict markers in the working text;
+  "prop" - meaning a "normal" property conflict, with left/right/mine full
+    values saved;
+  "tree" - meaning a tree conflict;
+  "reject" - meaning a text conflict for a single hunk of text, with the source
+    being a patch file (rather than left/right full texts), and with a
+    "reject" file being saved (?);
+  "obstructed" - meaning ### TODO
+
 OPERATION indicates the operation which caused the conflict and is
-detailed below. KIND-SPECIFIC is specific to each KIND, and is detailed
-below.
+detailed below.
+
+KIND-SPECIFIC is specific to each KIND, and is detailed below.
+
+There are restrictions on what mixture of conflicts can meaningfully be
+recorded - e.g. there must not be two "text" nor one "text" and one
+"reject".  These restrictions are implied but not spelled out here.
+
 
 ### stsp: need conflict data format version info inside skel, too?
 ###   or do we bump the entire wc.db format number if we need to tweak
@@ -42,11 +70,21 @@
 Operation skel
 --------------
 
+Meaning:  The Operation skel indicates what kind of operation was being
+performed that resulted in a conflict, including the format and content
+(or reference to content) of the diff that was being applied to this
+node.
+
 The OPERATION skel has the following form:
 
   (NAME OPERATION-SPECIFIC)
 
-NAME is "update", "switch", "merge", or "patch".
+NAME is one of:
+
+  "update" - meaning a 3-way merge as in "svn update";
+  "switch" - meaning a 3-way merge as in "svn switch";
+  "merge" - meaning a 3(4?)-way merge as in "svn merge";
+  "patch" - meaning application of a context-diff, as in "svn patch".
 
 OPERATION-SPECIFIC is as follows:
 
@@ -125,6 +163,10 @@
 
   ("text" OPERATION LEFT_SHA1 RIGHT_SHA1 MINE_SHA1)
 
+### JAF: this skel is meant to be the one designated as KIND-SPECIFIC in
+### the master skel description above, so the "text" atom at the
+### beginning is redundant, isn't it?  Not sure if that's intended.
+
 {LEFT,RIGHT,MINE}_SHA1 are sha1 checksums of the full texts of
 the {left,right,mine} version of the file. File version's content
 can be obtained from the pristine store.
@@ -161,6 +203,9 @@
 --------------
 
 Tree conflicts exist on files or directories.
+
+### JAF: And symlinks, I presume - or, if not, why not?
+
 The following information is stored if there is a tree conflict on the node:
 
   ("tree" OPERATION

Re: allowing multiple conflicts in storage

Reply via email to