Re: [Etoile-dev] Object graph Diff & Merging

Eric Wasylishen Mon, 02 Aug 2010 15:58:09 -0700

Hey Quentin,

>> Doing a object graph diff and serializing that
>> could be error prone - you have to make sure the object graph diff
>> serialization/_ and creating/applying is 100% robust.
> 
> I see. I thought that was an implicit property of the diff algorithm/model 
> you use.
> There are probably some border cases to be examined and tested carefully, but 
> that seems achievable.
> 
You're right, it's not really a problem. On second thought, the git approach 
which I mentioned later might be silly because it involves throwing away the 
already computed high-level diff and recomputing the diff at a lower level.


>> It also increases the coupling between the low-level storage and
>> higher level parts of CO.
> 
> Doesn't sound much worse than the one we have currently between 
> EtoileSerialize and CoreObject.
> 
>> Another approach is to be like git and just do a separate binary diff
>> on the serialized snapshot data. I like this because it makes storage
>> conceptually simpler - it's just storing whole snapshots of objects
>> but happens to delta compress related ones as an implementation detail.
> How would that work with media documents?
> Suppose you work on a image that weights 300 MB and several commits per 
> minute have to be done. User changes to record might even happen at very 
> short interval (one second or two).

In my opinion, regardless of how CO is implemented, for photo/video editing 
what we have to keep persistent and versioned is the tree of drawing 
operations/filters that we discussed a bit already, rather than keeping the 
resulting bitmap data persistent and versioned.

Two reasons:
- you need the tree structure to do merge/selective undo, 
- and saving every snapshot would obviously eat disk space too fast (even with 
multi-terrabyte drives) :-)

It might make sense to cache the bitmap data, but since it can be regenerated 
given the tree of drawing operations/filters, it probably doesn't make sense to 
keep old versions of the bitmap data given their potential size.

>> Yeah. I'm not sure if doing a commit after every change would lead to
>> too many history graph nodes or not - my idea was to keep the
>> granularity of -commits to about the same as the granularity of
>> undo in normal Mac applications. Ideally, the commits each have
>> some meaning to the user. (I wrote some ideas in COHistoryGraphNode.h
>> about attaching metadata to a commit to mark it as a "major checkpoint",
>> i.e. what happens when you click our replacement for the save button)
> 
> Yes, that's better to have meaningful commits.
> 
> Here is my take on that…
> For simple model objects (photo, music etc.), setter or mutation methods 
> usually correspond to a user-visible change. So CoreObject as it is work well 
> here.
> For compound document and content editing (image data, text), the mapping is 
> more complex or inexistent. So user-visible change must be recorded at a 
> higher-level.
> Action handlers in EtoileUI provides an API that express these user-visible 
> changes. In fact, that was roughly the Taligent solution to record the user 
> actions in a persistent way.
> Just to give another example… For CodeMonkey, the IDE class provides methods 
> that corresponds to user actions, and is wrapped with a COProxy to record the 
> changes.
> 
Right.

> That's not truly related, but ideally I'd like to have several "undo tracks". 
> For example, multiple tracks would be:
> - the document or object I'm working on
> - the app-level or work context
> - the library the object belongs to
> - the overall UI (would record almost all other UI actions)
> The last track would let me undo a window close or move. I'm not sure this 
> last track is a realistic idea… Undoing a shutdown is hard ;-) Well various 
> cases would be hard to undo or even record I think.
> 
> Presently this track notion is only partially related to object contexts in 
> CoreObject, that's why I'm planning to rework COObjectContext into something 
> closer to that.

I agree; we'll really need the undo tracks feature.

One way I could see implementing this in my ObjectMerging project is by 
attaching metadata to the COHistoryGraphNode for each commit, like this:
{document-uuid: XXX
 app-uuid: YYY
 library-uuid: ZZZ,
  .... (maybe other tracks) .... }

Then, supposing you want to do an undo/redo action for a particular document, 
you first filtering the overall history graph to get only the nodes with the 
correct document-uuid tag. The filtered history graph is then used to figure 
out which changes to undo at each step.

Since the nodes in the filtered history graph likely won't be adjacent in the 
overall history graph, undoing them will involve selective undo, which means 
merge conflicts could occur- but I think this is okay and probably unavoidable 
when you have multiple undo tracks.
 
btw, what do you think of my idea of modeling the history graph using the 
COHistoryGraphNode class?

>> So for typing a document, doing a commit after every key press would
>> probably be excessive, but after 5 words or something (IMHO the
>> undo/redo granularity you get in OS X when typing is just right.)
> 
> I don't know. CO consumes disk space very quickly in my tests, so recording 
> every key press sounds like a bad idea. A compression scheme could solve the 
> problem or not.
> If the recording granularity is bigger, we should still try to lose no key 
> press :-)

Good point - ideally you won't lose even one key press.

> For this kind of use case, we could have something like a temporary track/log 
> (at a global level) or a temporary branch (at the persistent root level) to 
> save every key press (or similar serialized actions that quickly accumulate). 
> The CoreObject garbage collector would then collect this temporary data when 
> it has been superceded a recording at a higher granularity.
> 
Something like that sounds good. We will probably want the ability to 
permanently delete parts of the history - e.g. delete all history before a 
certain date, or delete history between checkpoints before a certain date.

> 
>>> You also say selective undo support is planned but you don't explain
>>> how… ?
>> 
>> I think you can implement it pretty easily as a merge. Here's my current
>> idea:
>> 
>> Suppose these are nodes in a history graph, and the current revision is E.
>> 
>> A---B---C---D---E
>> 
>> The user wants to undo the changes made in revision B.
>> 
>> What we could do is create a branch of B in which all edits made in B are
>> undone; i.e. it's the same state as in history graph node A, so call it A'.
>> 
>> A---B---C---D---E
>>      \_A'
>> 
>> Then just merge E and A' - I think this will be the same as a selective undo.
>> You'll get merge conflicts if there were any changes in C, D, or E to which
>> overlap with the changes being undone in B, but this is exactly what you
>> want.  I haven't tried it yet though - could be that I'm missing some detail
>> and this is nonsense :-)
> 
> Sounds like an interesting approach.
> If C, D or E don't rely on the B state in any special way, this should work.
> I mean, if B involves a state change expected by C, D and E… This state 
> change must result into an overlap conflict, otherwise things could break.

Right, the merge algorithm should correctly flag that as a conflict.

> btw have you taken a look at GINA which is mentioned in the Flexible Object 
> Merging Framework paper? From what I read, it uses a command log very similar 
> to CoreObject, and seems to support merging several message/command 
> histories, this sounded very similar to what CoreObject intends to do.

I had a look at the GINA paper (T Berlage, A Genau. "A framework for shared 
applications with a replicated architecture").  They are using the Command 
Pattern, so you have to write a class for each operation which can modify 
document state. The command classes have methods like selectiveUndo, 
selectiveRedo, canSelectiveUndo, canSelectiveRedo. To merge two lists of 
commands, they don't do any  transformations on them; they just concatenate the 
lists of commands.

I also re-read the selective undo paper I mentioned ("A Framework for Undoing 
Actions in Collaborative Systems'' by Prakash and Knister - link: 
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.51.4793&rep=rep1&type=pdf
 ) Everyone interested in selective undo/merging should check this out, I think 
it's a really good paper :-).

They describe a general theory of how you can selectively undo/redo/merge 
operations. You need to define three functions:

inverse(op) -> op'   -- such that op' does the opposite of op.
transpose(op1, op2) -> (op2', op1') --- such that applying op1 followed by op2 
has the same effect as applying op2' followed by op1'. This is the central 
opertaion in merging.
conflicts?(op1, op2) -> bool   --- true if op1 and op2 conflict. (as defined in 
the paper.)

Looking at GINA from this viewpoint, the programmer writing the command classes 
has to define inverse() and conflicts(), but since you can't specify a 
transpose function in GINA, your command objects have to be able to be 
reordered without modifying them.  This makes it tricky or impossible to write 
commands which modify arrays, because you can't store array indices in your 
command objects since they will become invalid if the array is changed before 
the command is executed. So I'm not sure if GINA really offers any interesting 
solutions.



To return to CoreObject's implementation, I'm concerned that merge/selective 
undo isn't  possible with pure message-based persistency.  In order to support 
merging/selective undo, we need to be able to tell if operations conflict, get 
their inverse, and transpose groups of them. It's easy to define these for a 
small set of basic operations like setValue:forProperty:, 
insertObject:atIndex:ofProperty:, removeObject:atIndex:ofProperty:, etc., which 
is more or less what I did in ObjectMerging. 

But If the message log contains high level messages like "-refactorMethod:to:" 
or "-indentParagraph:" without any other information, you can't really do 
selective undo/merge. The only practical way I see of defining the 
inverse/transpose/conflicts functions on these high level operations is to 
record the high-level operations  as a bunch of the primitive ones for which we 
already have inverse/transpose/conflicts defined. 

Then your message log looks something like this:

{object: UUID1 recordMessage: -[setValue: 'abc' forProperty: 'bar']},
{object: UUID1 recordMessage: -[setValue: 'def' forProperty: 'bar']},
{object: UUID2 recordHighLevelMessage: -[refactorMethod: 'foo' to: 'bar']  
definition: (
    {object: UUID2 message: -[setValue: 'bar' forProperty: 'name']},
    {object: UUID3 message: -[setValue: 'bar' forProperty: 'name']}
)}

Now, this is really close to what I'm doing in ObjectMerging, except the 
groupings of low-level changes to record are indicated by doing a 'commit'. But 
it also no longer really looks like message-based persistency, because you're 
recording the state change..

What do you think?

Anyway, I hope this didn't get too long and rambling. :-) Maybe we should have 
a skype meeting sometime to discuss this?

Cheers,
Eric

_______________________________________________
Etoile-dev mailing list
Etoile-dev@gna.org
https://mail.gna.org/listinfo/etoile-dev

Re: [Etoile-dev] Object graph Diff & Merging

Reply via email to