Re: [Etoile-dev] Object graph Diff & Merging

Eric Wasylishen Thu, 22 Jul 2010 12:56:35 -0700

Hey Quentin,

On Jul 22, 2010, at 8:39 AM, Quentin Mathé wrote:


> Hey Eric,
> 
> Le 20 juil. 2010 à 22:32, Eric Wasylishen a écrit :
> 
>> I finally committed some code I've been working on for a while at 
>> http://svn.gna.org/svn/etoile/branches/ericwa/ObjectMerging/
>> 
>> The README gives a little introduction. It's basically my rethinking  
>> of how to implement CoreObject, along with some research I did on  
>> how to do object graph diff / merge with a CoreObject style object  
>> graph. Because our objects are labelled with UUID's, it turns out to  
>> be really easy (compared with a general XML diff/merge, which  
>> involves a lot of guesswork). I'm not doing anything novel, but I  
>> think the results will be very nice (with a simple implementation,  
>> too.)
>> 
>> I'd be glad to hear what you guys think.
> 
> I took a quick look at the code yesterday and I started to read the  
> papers mentioned in the README. I'm not sure this reimplementation is  
> what we want but I really like the result.

I'm glad to hear you like it! :-)

> Using a delta compression  
> based on model edit operations to record the history is really neat  
> btw :-)

I'll just clarify that my code currently doesn't do any delta compression;
it's like unpacked git (when you commit, it saves a snapshot of all modified
objects). I will add delta compression of some kind, but I'm not yet sure
the best way to do it. 

Doing a object graph diff and serializing that
could be error prone - you have to make sure the object graph diff
serialization/unserialization and creating/applying is 100% robust.
It also increases the coupling between the low-level storage and
higher level parts of CO.

Another approach is to be like git and just do a separate binary diff
on the serialized snapshot data. I like this because it makes storage
conceptually simpler - it's just storing whole snapshots of objects
but happens to delta compress related ones as an implementation detail.

> Once I have put some thoughts into the merging support, I'll post a  
> real reply. Probably in the coming days.
> 
Cool :-)

> At first sight, the existing CoreObject and the reimplementation might  
> be closer than I thought initially. I mean operations are implemented  
> as classes in the reimplementation and as methods in the existing  
> CoreObject in a kinda dual way. If CoreObject as its stand currently  
> was restricted to COObject/COGroup strictly, they would be almost  
> identical since everything would revolve around operations such insert/ 
> remove/update (-addObject:, -removeObject: and - 
> setValue:forProperty:). COObject in the reimplementation could even  
> support transparent persistency by wrapping it with proxy that  
> automatically calls -commit for the methods mentioned just before.

Yeah. I'm not sure if doing a commit after every change would lead to
too many history graph nodes or not - my idea was to keep the
granularity of -commits to about the same as the granularity of
undo in normal Mac applications. Ideally, the commits each have
some meaning to the user. (I wrote some ideas in COHistoryGraphNode.h
about attaching metadata to a commit to mark it as a "major checkpoint",
i.e. what happens when you click our replacement for the save button)

So for typing a document, doing a commit after every key press would 
probably be excessive, but after 5 words or something (IMHO the 
undo/redo granularity you get in OS X when typing is just right.)

Another example, moving a group of objects from one place to
another should probably be one commit. I imagine this will be
really common; it will happen whenever things are picked/dropped, etc.

> A problem I would worry about in the reimplementation is its scability  
> with complex compound documents (e.g. a map edited with a vector  
> drawing app). For technical graphics documents, a tree structure that  
> contains 10 000 nodes and several objects as properties per node  
> doesn't seem unexpected. Computing the entire object graph diff in a  
> less than 100 ms sounds impossible. How would you handle such a case?

I think the scalability should be pretty good. It should scale roughly the same
as a git/mercurial repository with 10000 files. I haven't done any tests
yet, though.

I'll just quickly explain what the code currently does:

- when you set a property/value of an object, the object's context
records the object's UUID in a set, marking it as 'changed'. (just like
git/svn; when you change files in a working copy, the system has a 
fast way of checking which files were modified; I think they check
modification time of all tracked files.)

- when you commit, the object context just serializes the objects which
are marked as modified. (So there is never a diff of all 10000 objects
in a normal cycle of edit - commit).


If the user wants to see a diff of two versions of the document,
the code would currently do a diff of all 10000 objects, which would
likely be quite slow. However it can easily be improved to only
diff the objects that were changed between the two revisions, since
COHistoryGraphNodes records a list of UUIDS which were modified
in that commit.

Of course, there are still cases where we might have to diff
all 10000 objects. I'll need to do some benchmarking.

> You also say selective undo support is planned but you don't explain  
> how… ?

I think you can implement it pretty easily as a merge. Here's my current
idea:

Suppose these are nodes in a history graph, and the current revision is E.

A---B---C---D---E

The user wants to undo the changes made in revision B. 

What we could do is create a branch of B in which all edits made in B are 
undone; i.e. it's the same state as in history graph node A, so call it A'.
    
A---B---C---D---E
        \_A'

Then just merge E and A' - I think this will be the same as a selective undo.
You'll get merge conflicts if there were any changes in C, D, or E to which
overlap with the changes being undone in B, but this is exactly what you 
want.  I haven't tried it yet though - could be that I'm missing some detail
and this is nonsense :-)

btw, the best resources I found on selective undo are this article,
http://www.python.org/workshops/1997-10/proceedings/zukowski.html
,and the paper it references, "Undoing Actions in Collaborative Work: 
Framework and Experience'' by Prakash and Knister.

Regards,
Eric

_______________________________________________
Etoile-dev mailing list
Etoile-dev@gna.org
https://mail.gna.org/listinfo/etoile-dev

Re: [Etoile-dev] Object graph Diff & Merging

Reply via email to