Re: [Etoile-dev] ObjectMerging: Commit Tracks

Christopher Armstrong Sat, 22 Oct 2011 05:18:17 -0700

Hi Eric

You've raised a lot here, so I'll try to address everything together.


On 21/10/2011, at 07:01 AM, Eric Wasylishen wrote:

>>> I'll give some more details on nested versioning.
>>> 
>>>>> From what I have understood, the main interest is to support undoing 
>>>>> arbitrary branch operations such as deletion, switch or merge cleanly and 
>>>>> in a transparent way for the user.
>>> 
>>> That is one motivation, but my main motivation is to better handle 
>>> composite documents (at least, that's what I hope it does).
>>> 
>>> My motivating example is a composite document called "outer" containing 
>>> another document, "inner". The pair could be a paper and a figure in the 
>>> paper, or a photo library and a photo, etc.
>>> 
>>> With the ObjectMerging model you have to make a choice between
>>> 
>>> a) make the inner document a normal object inside a persistent root. This 
>>> means you forfeit native versioning support for the inner document. You 
>>> can't easily present the user with undo/redo controls for the inner 
>>> document, without resorting to using selective undo. You could still view a 
>>> history graph of the inner document by taking the persistent root's history 
>>> graph and filtering out nodes that don't affect the inner document, and 
>>> then selectively undo changes affecting the inner document. You can't 
>>> really maintain multiple branches of the inner document. On the positive 
>>> side, when you make a change like reverting, tagging, or branching the 
>>> outer document, the inner document is also affected.
>>> 
>>> or
>>> 
>>> b) make both the inner document and the outer document persistent roots. 
>>> This gives full versioning/branching/undo/redo support to both the outer 
>>> and inner document, but the problem is, the link to the inner document from 
>>> the outer document is now a weak reference by UUID, so the outer document 
>>> no longer owns the inner one, and tagging/branching/reverting the outer 
>>> document has no effect on the inner document. If the user wants to record 
>>> the state of the outer document including the inner one, he has to write 
>>> down the version number of both the outer and inner document. 
>>> 
>>> IMHO, neither of these options is very good, although a) is closer.
>>> 
>>> With nested versioning, the inner document can be a persistent root with 
>>> full, native versioning support (branch/tag/undo/redo), but it can be 
>>> nested inside the outer document, so branch/tag/undo/redo on the outer 
>>> document also affect the inner document.
>>> 
>>> As far as I'm aware, no one has done this in an organized way before, 
>>> although you could always do it ad-hoc, like using the versioning feature 
>>> in an OpenOffice.org Writer document and storing the file inside git or 
>>> another version control system.
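For option a), the workaround of deriving the inner document's history graph by filtering the persistent root's history could look roughly like the sketch below. It is written in Python for brevity rather than Objective-C, and it assumes a hypothetical commit record listing the UUIDs of the objects each commit changed; none of these names are ObjectMerging API. Parent edges are reconnected to the nearest kept ancestors so the result is still a graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    # Hypothetical commit record: id, parent ids, and UUIDs of the objects it changed.
    id: str
    parents: tuple
    changed: frozenset


def inner_history(commits, inner_objects):
    """Return (commit id, filtered parent ids) for every commit that touches the
    inner document, in the same order as `commits`."""
    inner = set(inner_objects)
    by_id = {c.id: c for c in commits}
    kept = {c.id for c in commits if c.changed & inner}

    def kept_ancestors(cid, seen):
        # Walk up through dropped commits until we reach kept ones.
        found = set()
        for p in by_id[cid].parents:
            if p in seen:
                continue
            seen.add(p)
            if p in kept:
                found.add(p)
            else:
                found |= kept_ancestors(p, seen)
        return found

    return [(c.id, sorted(kept_ancestors(c.id, set())))
            for c in commits if c.id in kept]


history = [
    Commit("c1", (), frozenset({"outer-root"})),
    Commit("c2", ("c1",), frozenset({"figure-1", "outer-root"})),
    Commit("c3", ("c2",), frozenset({"caption"})),
    Commit("c4", ("c3",), frozenset({"figure-1"})),
]
print(inner_history(history, {"figure-1"}))  # [('c2', []), ('c4', ['c2'])]
```

Selective undo of an inner-document change would then operate on one of the kept commits.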
>> 
>> b) seems good enough to begin with. I'm not entirely convinced that 
>> branching the outer document should branch the inner document automatically, 
>> although there are use cases where that's the best choice.
> 
>> For example, for a page layout, I wouldn't want the inner documents to be 
>> branched; I just expect them to be in their most recent states. For a 
>> photomontage, getting the inner pictures/photos branched would be 
>> convenient, to ensure the overall look of the composition won't change in 
>> unexpected ways, no matter what I try on the inner elements (and I can 
>> manually bring them to their most recent states if I want to).
> 
> Thanks, those are really interesting examples. I agree with how you would 
> want them to work.
> 
> 
> The biggest concern I have with nested versioning is that a user can do this:
> 
> (assuming a persistent root, "inner document", is nested inside another 
> persistent root, "outer document")
> 
> - switch outer document to branch "A"
> - create a branch on the inner document called "1" and commit some work to it
> - switch outer document to branch "B"
> - create a branch on the inner document called "2" and commit some work to it
> 
> at this point, there are two separate inner documents, one exists in A and 
> has branch 1, one exists in B and has branch 2.
> 
> The problem is, the structure you have created here is really confusing and 
> complex, and it creates the possibility of wanting to do actions like, "move 
> branch 2 from the copy of the inner document in branch B to the copy of the 
> inner document in branch A"! This situation is so confusing that we may want 
> to forbid it from happening, which casts some doubt on the whole idea of 
> nested versioning.
> 
> 
> Thinking about this a bit more, the problem is that there are many 
> definitions of "copy".
> 
> For an embedded object, I can think of two:
> - return an independent copy of the embedded object, with no relabelling
> - return an independent copy of the embedded object, relabelling every 
> object with a new uuid and updating any references inside the copy that 
> point to old uuids so they point to the new uuids.
> 
> For a persistent root, I can think of a lot more:
> - "copy history graph": return an independent copy of the entire history 
> graph (like doing "cp -r some_git_repository new_copy"). This is what I was 
> thinking of doing in nested versioning: upon every commit, all nested 
> persistent roots inside an outer persistent root would be copied in this way.
> - "create branch": create a new branch off of the current state of the 
> persistent root, and return a link which tracks the latest version of the 
> persistent root on the new branch (interestingly, this is the only one that 
> performs a _mutable_ change to the persistent root)
> - "non-versioned copy": return a non-versioned (embedded object) copy of the 
> current version of the persistent root, without any relabelling. 
> - "non-versioned relabelled copy": return a non-versioned (embedded object) 
> copy of the current version of the persistent root, relabelling every 
> object with a new uuid and updating any references inside the copy that 
> point to old uuids so they point to the new uuids.
> - "link": (not really copying) return a link to the persistent root which 
> will continue to track further changes (i.e., just the persistent root's 
> uuid)
> - "link to version": (not really copying) return a link to the current 
> version of the persistent root which will not track further changes (i.e., a 
> string "uuid:version")
> - "new persistent root": create a new persistent root with an empty history 
> graph, and insert as its contents the "non-versioned relabelled copy"
> 
> Can you think of any others?
> 
> So, in your examples, you would use the "link" rule for putting photos in the 
> page layout document, and the "create branch" rule for the photo montage. 
> 
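The relabelling copy described above (give every copied object a new UUID and point references inside the copy at the new UUIDs) might be sketched like this. It assumes a hypothetical store where objects are dicts of key/value pairs and references are stored as UUID strings; a real store would know which keys hold references, whereas this sketch rewrites any string that matches a copied UUID. References to objects outside the copy are left alone.

```python
import uuid


def relabelled_copy(objects):
    """objects: dict mapping UUID string -> dict of properties.
    Returns (copy, mapping), where mapping maps each old UUID to its new one."""
    mapping = {old: str(uuid.uuid4()) for old in objects}

    def rewrite(value):
        # Assumption: any string equal to a copied UUID is a reference.
        if isinstance(value, str):
            return mapping.get(value, value)
        if isinstance(value, list):
            return [rewrite(v) for v in value]
        return value

    copy = {mapping[old]: {key: rewrite(val) for key, val in props.items()}
            for old, props in objects.items()}
    return copy, mapping


doc = {
    "uuid-fig": {"name": "figure", "children": ["uuid-cap"]},
    "uuid-cap": {"name": "caption", "parent": "uuid-fig", "see_also": "external-uuid"},
}
new_doc, mapping = relabelled_copy(doc)
```

The "non-versioned copy" variant is the same thing with `mapping` as the identity.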
>> For ObjectMerging, we could use a special ref marker rather than a plain 
>> UUID to represent references between root objects. 
> 
>> A ref marker could include an optional branch UUID. Since a branch has a 
>> head, the ref marker would implicitly point to a specific root object 
>> version. If there is no optional branch UUID, the (inner document) current 
>> branch is picked. With this model, we could also automatically branch inner 
>> documents in a way similar to what you suggest above… 
>> For example, on branching the outer document, all the inner documents are 
>> branched. These "nested" branches would be private, in the sense they 
>> wouldn't be listed at the UI level among the branches of each root object 
>> (well, unless you request it). We could eventually use the outer document 
>> UUID as the inner document branch UUID, not sure it's a good idea though.
> 

I agree with this analysis. When I was originally developing the theory for 
commit tracks earlier this year, I came to the same conclusions about the need 
for a special object reference+branch type, and about the different types of 
"copying" that you outline.

I think commit tracks can work :-). 

We need the concept of a "current branch" for each root object, created along 
with the root object. 

I don't envision that most users will want branches, but most users will want 
to embed root objects inside each other. For those who want to branch their 
inner document, we should make the fact that it is branched clear in the UI. I 
don't think this is a huge mental burden for users who want to use this 
feature. 

A user should be restricted to embedding an object in one of these ways:
1. Linked to a branch ("link to branch").
2. Linked to a newly created branch ("create branch" and "link to branch").
3. Fixed to a particular object revision, which is not updated.
4. Copied into the document as a new embedded object and severed from its 
original root object ("non-versioned relabelled copy").

Option 1 is essentially a non-branching variant. Changes to the object outside 
of the document will change the object inside the document, but this should be 
fairly obvious. Option 2 is for more advanced users who want to branch the 
object inside this document (and possibly others). Cloning a document with 
option 1 or 2 will just copy the link to the branch, and changes to the branch 
will be reflected automatically inside both documents.

Option 3 sounds boring, but could be made richer by storing the name of the 
original branch the revision was taken from, and allowing the user to 
manually "update" the revision to the latest one on that branch. This 
is where visualising branch histories could be really helpful. It is closest to 
OLE and other existing object embedding technologies.

Option 4 I don't like, as it can't be undone. It might still be useful to 
provide though.

I think that Option 3 is the best default with the "branch remembering" 
behaviour. Option 1 is what David describes as a "live link", in that the 
external object's changes are automatically propagated into the document. 
Option 2 is advanced, and Option 4 is super-advanced really and probably 
shouldn't be offered.

Thinking about it more, Option 3 could be an "extra option" over Option 1 and 
2, in that you can just link to a branch, or link to a new branch, but 
optionally set whether or not the version in the document is automatically 
updated to the latest version in that branch.
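A rough model of options 1 to 3, treating "hold the revision" as a flag over a link to an existing or new branch, could look like this. `Branch`, `EmbeddedLink` and `link_to_new_branch` are hypothetical names for illustration only, and the new branch in option 2 starts from the source's head revision rather than copying its whole history.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Branch:
    # Hypothetical branch of a root object: revision ids, newest last.
    name: str
    revisions: List[str] = field(default_factory=list)

    @property
    def head(self) -> str:
        return self.revisions[-1]

    def commit(self, revision: str) -> None:
        self.revisions.append(revision)


@dataclass
class EmbeddedLink:
    branch: Branch
    follow: bool = True            # options 1/2: track the branch head
    pinned: Optional[str] = None   # option 3: held revision when not following

    def __post_init__(self):
        if not self.follow and self.pinned is None:
            self.pinned = self.branch.head  # hold the revision current at embed time

    def revision(self) -> str:
        return self.branch.head if self.follow else self.pinned

    def update(self) -> None:
        # Manual "update to latest" for a held link.
        self.pinned = self.branch.head


def link_to_new_branch(source: Branch, name: str, follow: bool = True) -> EmbeddedLink:
    # Option 2: branch off the source's current head, then link to the new branch.
    return EmbeddedLink(Branch(name, [source.head]), follow)


photo = Branch("main", ["r1"])
live = EmbeddedLink(photo)                       # option 1 ("live link")
held = EmbeddedLink(photo, follow=False)         # option 3 with branch remembering
montage = link_to_new_branch(photo, "montage")   # option 2
photo.commit("r2")
```

After the commit, `live` shows r2, while `held` and `montage` still show r1 until the user updates them.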

Cloning a document is interesting. We should basically do whatever the original 
did. If the default is that the user must manually update the embedded object 
revision, cloning is not problematic, as the user can update it in one document 
and not the other after cloning.

The concept becomes complicated with branch following, because the cloned 
document will follow the branch automatically if the source did. This needs 
to be explained somehow, but any other behaviour would be inconsistent. This 
allows the scenario you describe above, but if an embedded object appears with 
its branch name in the document in the UI, I think it's less of an issue. We 
should probably develop UI concepts for branching that we can use (I don't think 
they exist in normal UIs, so we'll have to think of something ourselves :-)).  

> Hm, interesting. This would basically simulate the nested object being 
> "inside" the outer object, except that the inner object would share its 
> history graph with other users of the inner object.
> 
> If we go down this route we need to refine the concept of "current branch" 
> and "current version". 
> 
> In the ObjectMerging model every persistent root has a "current version" 
> which also defines the current branch. It's the main mutable state of a 
> persistent root, and is valid globally. 
> 
> In nested versioning it is the same, except for a nested persistent root, the 
> current version is only valid inside the parent persistent root, so you can 
> make a commit that changes the current version of a child nested persistent 
> root, for example.
> 
> I need to think about this a bit more...

Ok. I'm definitely interested in more of your analysis.
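As I understand Eric's description of nested versioning, the current version of a nested persistent root is part of the parent's state rather than a global pointer. A minimal sketch of that idea, assuming a hypothetical `PersistentRoot` with a linear list of versions (no branches) and an outer state key `inner_current` invented for illustration:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PersistentRoot:
    # Hypothetical persistent root with a linear list of immutable versions.
    versions: List[dict] = field(default_factory=list)
    current: int = -1

    def commit(self, state: dict) -> int:
        self.versions.append(dict(state))
        self.current = len(self.versions) - 1
        return self.current

    def revert(self, index: int) -> None:
        self.current = index

    def state(self) -> dict:
        return self.versions[self.current]


def inner_state_seen_by(outer: PersistentRoot, inner: PersistentRoot) -> dict:
    # The child's current version is read from the parent's state;
    # inner.current (a global pointer) is deliberately not consulted.
    return inner.versions[outer.state()["inner_current"]]


inner = PersistentRoot()
outer = PersistentRoot()
v0 = inner.commit({"caption": "draft"})
o0 = outer.commit({"title": "Paper", "inner_current": v0})
v1 = inner.commit({"caption": "final"})
o1 = outer.commit({"title": "Paper", "inner_current": v1})  # commit that moves the child

outer.revert(o0)  # reverting the outer document also "reverts" the inner one
print(inner_state_seen_by(outer, inner))  # {'caption': 'draft'}
```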

>> Rather than serializing a custom reference object, we could still use a 
>> plain UUID by generating a new UUID that represents a core object UUID + 
>> branch UUID pair, and a database table to store the mapping. I'm not sure 
>> it's a good idea either, because it would make debugging harder and core 
>> objects exported as plists much harder to interpret.
> 
> Yeah, that sounds like it could be problematic, I'd rather use a string like:
> 
> object-uuid/version-uuid  -- to refer to a specific version
> object-uuid:branch-uuid  -- to refer to the latest version on a branch

Ok. To support option 3 above, I think we should have something more like:

object-uuid[/version-uuid]:branch-uuid

where version-uuid is optional but branch-uuid is not. Not specifying the 
version means to follow the branch, while specifying a version means to hold it.
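A sketch of parsing and resolving the proposed `object-uuid[/version-uuid]:branch-uuid` form, assuming canonical hyphenated UUID strings; `Ref`, `parse_ref`, `resolve` and the `branch_heads` lookup table are hypothetical stand-ins for whatever the store actually provides:

```python
import re
import uuid
from dataclasses import dataclass
from typing import Optional

_UUID = r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
_REF = re.compile(rf"^(?P<root>{_UUID})(?:/(?P<version>{_UUID}))?:(?P<branch>{_UUID})$")


@dataclass(frozen=True)
class Ref:
    root: uuid.UUID
    branch: uuid.UUID
    version: Optional[uuid.UUID] = None  # None means "follow the branch head"

    def __str__(self) -> str:
        held = f"/{self.version}" if self.version is not None else ""
        return f"{self.root}{held}:{self.branch}"


def parse_ref(text: str) -> Ref:
    m = _REF.match(text)
    if m is None:
        raise ValueError(f"not a reference: {text!r}")
    version = m.group("version")
    return Ref(uuid.UUID(m.group("root")), uuid.UUID(m.group("branch")),
               uuid.UUID(version) if version else None)


def resolve(ref: Ref, branch_heads: dict) -> uuid.UUID:
    # A held reference names its version; a following one takes the branch head.
    return ref.version if ref.version is not None else branch_heads[ref.branch]
```

Because the branch is always present, a held reference still remembers which branch to "update" from, which is what option 3 needs.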

>>>>> Is it possible to expand the commit track model to support undo/redo on 
>>>>> arbitrary branch operations? It seems to me that if the commit track 
>>>>> model was expanded it would be pretty much equivalent to Eric's 
>>>>> NestedVersioning.
>>>> 
>>>> It easily supports undo/redo on changing the revision that a branch points 
>>>> to. However, it doesn't really account for undo/redo on branch 
>>>> creation/deletion, and as far as I can see, that's not easily added. I 
>>>> think it's less important, though; I would implement branch deletion as a 
>>>> toggle switch, and allow the user to purge the metadata of a commit track 
>>>> if they really wanted.
>>> 
>>> I agree with Chris; I think if you want to version the history graph 
>>> (meta-versioning?) it's best to design for that from the ground up (be able 
>>> to treat history graph as data to be versioned).
>> 
>> ok
> 
> I still think this is an important feature, so I'll try to include it even if 
> we don't go with nested versioning.

Ok.
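The commit track behaviour described above (undo/redo move the revision a branch points to; deletion is a reversible toggle, with an explicit purge) might be modelled as below. This is a hypothetical sketch, not the COCommitTrack API, and it discards the redo tail on a new commit like a conventional undo stack, which the real commit track may not do.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CommitTrack:
    # Revisions the branch has pointed to, plus a cursor into that list.
    revisions: List[str] = field(default_factory=list)
    cursor: int = -1
    deleted: bool = False  # deletion as a reversible toggle, not removal

    @property
    def current(self) -> str:
        return self.revisions[self.cursor]

    def commit(self, revision: str) -> None:
        del self.revisions[self.cursor + 1:]  # assumption: drop the redo tail
        self.revisions.append(revision)
        self.cursor += 1

    def undo(self) -> None:
        if self.cursor > 0:
            self.cursor -= 1

    def redo(self) -> None:
        if self.cursor < len(self.revisions) - 1:
            self.cursor += 1

    def purge(self) -> None:
        # Permanently drop the track's metadata, only after it was "deleted".
        if not self.deleted:
            raise RuntimeError("only deleted tracks can be purged")
        self.revisions.clear()
        self.cursor = -1


track = CommitTrack()
track.commit("r1")
track.commit("r2")
track.undo()  # the branch now points at r1 again
```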

>>>> Nested versioning (I believe) could be quite complicated to implement, but 
>>>> I'm happy to be proven wrong.
>>> 
>>> I'm still playing with it, so we'll see. :-)
>>> 
>>> My idea so far is:
>>> 
>>> - it's built on top of a simple store that doesn't provide any versioning, 
>>> unlike the store API in ObjectMerging
>>> - the data model of the store is more or less the same as ObjectMerging: 
>>> objects with key : value pairs, and the objects can be organized in a 
>>> hierarchical structure (still not sure whether to do this like the way 
>>> ObjectMerging does, by marking certain keys as being 'composition' 
>>> relationships, or to have a special "children" key)
>>> - versioning is achieved by copying an object and its children in the 
>>> hierarchy. The store is optimized so that the copy will be "cheap".
>>> - to be useful, it needs to have a data structure to keep track of the 
>>> copied versions. This is the diagram Quentin mentioned earlier: 
>>> https://github.com/ericwa/NestedVersioning/blob/master/Docs/NestedVersioningDiagram.pdf?raw=true
>>> - what's important is that the data structure for keeping track of versions 
>>> of an object is itself constructed of regular objects, so the data 
>>> structure for keeping track of the versions of an object could be part of a 
>>> larger tree which is itself being versioned - that's where the nesting 
>>> comes from.
>>> 
>>> There is more brainstorming in the Docs directory on my github, but it may 
>>> not be very coherent.
>> 
>> Thanks for the extra info. If we can keep roughly the same 
>> COEditingContext, COObject and COTrack API between ObjectMerging and 
>> NestedVersioning, migrating to a new model should be easy.
> 
> I agree. I'm not totally confident about the COEditingContext API, but I 
> haven't looked closely at it in a while.

I think that one is less important, as developers should not need to interact 
with it in most cases.
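Eric's outline of NestedVersioning (versioning by copying an object and its children, with cheap copies, and a version-tracking structure built from regular objects) can be illustrated with immutable trees and structural sharing. This is only one assumed way to make copies cheap, not necessarily what his store does; `Node`, `set_prop` and `track_versions` are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Node:
    # Immutable object: key/value properties plus child objects.
    name: str
    props: Tuple[Tuple[str, object], ...] = ()
    children: Tuple["Node", ...] = ()


def set_prop(node: Node, path: Tuple[int, ...], key: str, value) -> Node:
    """Return a new tree with `key` set on the node at `path` (child indices).
    Only nodes along the path are copied; all other subtrees are shared."""
    if not path:
        props = tuple((k, v) for k, v in node.props if k != key) + ((key, value),)
        return replace(node, props=props)
    kids = list(node.children)
    kids[path[0]] = set_prop(kids[path[0]], path[1:], key, value)
    return replace(node, children=tuple(kids))


def track_versions(name: str, versions: Tuple[Node, ...]) -> Node:
    # The version-tracking structure is itself an ordinary Node, so it can sit
    # inside a larger tree that is versioned in turn (the nesting).
    return Node(name, (("kind", "version-track"),), versions)


figure = Node("figure", (("caption", "draft"),))
paper = Node("paper", (), (Node("text"), figure))
paper2 = set_prop(paper, (1,), "caption", "final")  # a "copy" costing one path
library = Node("library", (), (track_versions("paper-history", (paper, paper2)),))
```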

>> This brings me to the fact that I plan to extract a common superclass COTrack 
>> for COHistoryTrack and COCommitTrack and do the same with the node classes. 
>> We could then present and manipulate either commit tracks or history tracks 
>> in the simple history browser I have been working on (not yet committed). 
> 
> Sure.
> 
>> If there is no objection, I'd like to rename COCollection to COGroup and 
>> merge the related code from CoreObject. In addition, I'd like to bring the 
>> CoreObject protocol, once tweaked a bit, to ObjectMerging, COFile and 
>> CODirectory.
>> 
>> Once all this is done, I plan to move CoreObject to Deprecated and rename 
>> ObjectMerging to CoreObject.
>> 
>> Any comments about all that?
> 
> Personally, I want to sort out composite document support first - whether it 
> is extending ObjectMerging a bit, or building something new - because I'm not 
> really confident the current ObjectMerging model is sufficient.
> 
> I don't want to stop you from working on ObjectMerging though, since we can 
> always merge code in the future.

I have no objections to API changes.

>> As a last question, Eric, what needs to be done to get the remote 
>> collaboration support working with ObjectMerging? Writing COSynchronizer or 
>> more? Bringing some classes from the early ObjectMerging prototypes?
> 
> I haven't looked at that in quite a while, so I'm not sure.
> 
> I stopped working on it because I was running into other problems, like:
> - I wasn't modelling parent/child relationships properly, like we are now
> - bugs in my early COEditingContext and COObject implementations
> - The object graph diff functionality needed to be able to export & import 
> objects, and I hadn't solved all the issues with that.
> 
> I think it should wait until the rest of our CoreObject is more stable, but 
> it may be a good idea to have a look at the code from the early ObjectMerging 
> prototypes again.

Ok. 

Is the object graph diff functionality able to perform merges, so that we could 
extend it to save a merge as a new commit and allow constant merging between 
branches?

Regards
Chris

--------
Christopher Armstrong
carmstr...@fastmail.com.au






_______________________________________________
Etoile-dev mailing list
Etoile-dev@gna.org
https://mail.gna.org/listinfo/etoile-dev
