Re: [Etoile-dev] Object graph Diff & Merging

Quentin Mathé Wed, 04 Aug 2010 06:38:56 -0700

On 3 Aug 2010, at 00:56, Eric Wasylishen wrote:

>>> Another approach is to be like git and just do a separate binary diff
>>> on the serialized snapshot data. I like this because it makes storage
>>> conceptually simpler - it's just storing whole snapshots of objects
>>> but happens to delta compress related ones as an implementation detail.
>> How would that work with media documents?
>> Suppose you work on an image that weighs 300 MB and several commits
>> per minute have to be done. User changes to record might even
>> happen at very short intervals (one second or two).
>
> In my opinion, regardless of how CO is implemented, for photo/video  
> editing what we have to keep persistent and versioned is the tree of  
> drawing operations/filters that we discussed a bit already, rather  
> than keeping the resulting bitmap data persistent and versioned.
>
> Two reasons:
> - you need the tree structure to do merge/selective undo,
> - and saving every snapshot would obviously eat disk space too fast
> (even with multi-terabyte drives) :-)


Yes. My point was that additional operations are easier to support
with message-based persistency. You just add a new method, whereas in
the CoreObject reimplementation you have to define a new class, I think.

For example… How would you express operations like 'blur an image
area' or 'cut a range in a movie clip' with a state-based CoreObject? I
suppose new operation subclasses would be added to express and save
them in the history graph?

> It might make sense to cache the bitmap data, but since it can be  
> regenerated given the tree of drawing operations/filters, it  
> probably doesn't make sense to keep old versions of the bitmap data  
> given their potential size.

Agreed.

To comment a bit on the CoreObject reimplementation…

This makes me think that the need to be cautious with the behavior and
arguments of messages that trigger persistency in the existing
CoreObject is now shifted to the class that expresses the
operation. The advantage of your approach is that it automatically
reduces similar operations to a canonical operation (-removePerson:,
-addObject, -addWhateverAndPray: are automatically reduced to
-addObject:, -setValue:forProperty:, -removeObject:atIndex: etc.) and
this makes merging much easier and serialization safer.

My main concern is that I'm not sure it really solves some of the
things we want to solve:
- more transparent persistency (no explicit commit or database  
connection management)
- store arbitrary objects (EtoileSerialize) or integrate foreign  
object-model (COProxy)

The most problematic point would be the impossibility of adding
persistency to EtoileUI, because persistent objects must be COObject
instances and store all their data in a dictionary (no ivars).

From your perspective, iirc, supporting the extra things I outline
above introduces too much complexity. In the current state of
EtoileSerialize and CoreObject, I fully agree.
Although message-based persistency is not the panacea it appears to be
at first sight (e.g. it tends to favor big façade objects rather than
fine-grained objects with a clear role, and requires being very
cautious with the behavior and arguments of messages that trigger
persistency), I still think it's a good approach because it's
operation-based rather than state-based and it also gives more
flexibility than a single model class.

>> That's not truly related, but ideally I'd like to have several  
>> "undo tracks". For example, multiple tracks would be:
>> - the document or object I'm working on
>> - the app-level or work context
>> - the library the object belongs to
>> - the overall UI (would record almost all other UI actions)
>> The last track would let me undo a window close or move. I'm not  
>> sure this last track is a realistic idea… Undoing a shutdown is  
>> hard ;-) Well various cases would be hard to undo or even record I  
>> think.
>>
>> Presently this track notion is only partially related to object  
>> contexts in CoreObject, that's why I'm planning to rework  
>> COObjectContext into something closer to that.
>
> I agree; we'll really need the undo tracks feature.
>
> One way I could see implementing this in my ObjectMerging project is  
> by attaching metadata to the COHistoryGraphNode for each commit,  
> like this:
> {document-uuid: XXX
>  app-uuid: YYY
>  library-uuid: ZZZ,
>   .... (maybe other tracks) .... }

Yes, that should work.

> Then, supposing you want to do an undo/redo action for a particular
> document, you first filter the overall history graph to get only
> the nodes with the correct document-uuid tag. The filtered history
> graph is then used to figure out which changes to undo at each step.

Right. In fact, that's what CoreObject does already when an object  
context is restored to a past version. And this can already be  
leveraged at the core object granularity level too.
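
The filtering step discussed above can be sketched roughly as follows. This is only an illustration in Python, not CoreObject or ObjectMerging code: the HistoryNode class, the filter_track function and the sample UUIDs are all assumptions made up for the example; only the metadata keys (document-uuid, app-uuid) come from the thread.

```python
from dataclasses import dataclass, field


@dataclass
class HistoryNode:
    """Hypothetical stand-in for a commit node carrying track metadata."""
    revision: int
    metadata: dict = field(default_factory=dict)


def filter_track(history, key, uuid):
    """Return the nodes tagged with the given track UUID, in history order."""
    return [node for node in history if node.metadata.get(key) == uuid]


# Overall history graph, flattened to a list for simplicity (an assumption).
history = [
    HistoryNode(1, {"document-uuid": "doc-A", "app-uuid": "app-1"}),
    HistoryNode(2, {"document-uuid": "doc-B", "app-uuid": "app-1"}),
    HistoryNode(3, {"document-uuid": "doc-A", "app-uuid": "app-1"}),
    HistoryNode(4, {"document-uuid": "doc-A", "app-uuid": "app-2"}),
]

# The "undo track" for document doc-A: revisions 1, 3 and 4.
doc_a_track = filter_track(history, "document-uuid", "doc-A")
# The track for application app-1: revisions 1, 2 and 3.
app_1_track = filter_track(history, "app-uuid", "app-1")
```

Note that revisions 1 and 3 are adjacent in the doc-A track but not in the overall history, which is the situation the adjacency discussion in this thread is about.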

> Since the nodes in the filtered history graph likely won't be  
> adjacent in the overall history graph,

Right, but what matters to undo/redo in a single core object is
whether they are adjacent in this core object history rather than in
the entire core object graph history (aka overall history graph).
If the track records every message sent to a given core object and
just consists of the combined histories of several core objects, the
nodes would be adjacent at the persistent root granularity (exactly as
is the case with a COObjectContext history currently).
To create non-adjacent nodes, the track would have to select which
messages it logs based on a predicate. It sounds like an interesting
feature, but that's not what I was thinking about.
What I was suggesting is just the possibility to have core objects
that belong to multiple object contexts at the same time rather than
a single one.

For example, when a message that triggers persistency is sent to a core
object, each track to which the object belongs logs the message.
Well, in reality, the track uuids would be attached to the object
revision/message in the metadata db.

> undoing them will involve selective undo, which means merge  
> conflicts could occur-

Yes, if the recorded messages are selected based on a predicate;
otherwise no, I think.

> but I think this is okay and probably unavoidable when you have  
> multiple undo tracks.

In some advanced cases, probably yes.

>  btw, what do you think of my idea of modeling the history graph  
> using the COHistoryGraphNode class?

From an implementation viewpoint, I don't think it's really needed,
we could just store the same data by improving/extending the current
history table in the metadata db. Then it's easy to query the history
in various ways or leverage the history to run other queries related
to the indexed content/properties.

For building a UI that lets you browse the history, a class like that,
which makes the versioning model explicit, is nice. But I would rather
write it as a thin layer around a query result.

>>>> You also say selective undo support is planned but you don't  
>>>> explain
>>>> how… ?
>>>
>>> I think you can implement it pretty easily as a merge. Here's my  
>>> current
>>> idea:
>>>
>>> Suppose these are nodes in a history graph, and the current  
>>> revision is E.
>>>
>>> A---B---C---D---E
>>>
>>> The user wants to undo the changes made in revision B.
>>>
>>> What we could do is create a branch of B in which all edits made  
>>> in B are
>>> undone; i.e. it's the same state as in history graph node A, so  
>>> call it A'.
>>>
>>> A---B---C---D---E
>>>     \_A'
>>>
>>> Then just merge E and A' - I think this will be the same as a  
>>> selective undo.
>>> You'll get merge conflicts if there were any changes in C, D, or E
>>> which overlap with the changes being undone in B, but this is
>>> exactly what you want.  I haven't tried it yet though - could be
>>> that I'm missing some detail and this is nonsense :-)
>>
>> Sounds like an interesting approach.
>> If C, D or E don't rely on the B state in any special way, this  
>> should work.
>> I mean, if B involves a state change expected by C, D and E… This
>> state change must result in an overlap conflict, otherwise things
>> could break.
>
> Right, the merge algorithm should correctly flag that as a conflict.
>
>> btw have you taken a look at GINA which is mentioned in the  
>> Flexible Object Merging Framework paper? From what I read, it uses  
>> a command log very similar to CoreObject, and seems to support  
>> merging several message/command histories, this sounded very  
>> similar to what CoreObject intends to do.
>
> I had a look at the GINA paper (T Berlage, A Genau. "A framework for  
> shared applications with a replicated architecture").  They are  
> using the Command Pattern, so you have to write a class for each  
> operation which can modify document state. The command classes have  
> methods like selectiveUndo, selectiveRedo, canSelectiveUndo,  
> canSelectiveRedo. To merge two lists of commands, they don't do any   
> transformations on them; they just concatenate the lists of commands.
>
> I also re-read the selective undo paper I mentioned ("A Framework
> for Undoing Actions in Collaborative Systems" by Prakash and
> Knister - link:
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.51.4793&rep=rep1&type=pdf
> ). Everyone interested in selective undo/merging should check this
> out, I think it's a really good paper :-).
>
> They describe a general theory of how you can selectively
> undo/redo/merge operations. You need to define three functions:
>
> inverse(op) -> op'   -- such that op' does the opposite of op.
> transpose(op1, op2) -> (op2', op1')   -- such that applying op1
> followed by op2 has the same effect as applying op2' followed by
> op1'. This is the central operation in merging.
> conflicts?(op1, op2) -> bool   -- true if op1 and op2 conflict (as
> defined in the paper).

Sounds interesting. I'll take a look at that.
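
To make the inverse/transpose idea concrete, here is a minimal Python sketch for array operations. It is not taken from the Prakash and Knister paper or from ObjectMerging: the Op class and the apply helper are assumptions for illustration, transpose is only sketched for two insertions, and conflicts? is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical array operation: kind is "insert" or "remove"."""
    kind: str
    index: int
    value: str


def apply(op, items):
    """Return a new list with op applied; the input list is not modified."""
    result = list(items)
    if op.kind == "insert":
        result.insert(op.index, op.value)
    else:
        if result[op.index] != op.value:
            raise ValueError("remove does not match the element at index")
        del result[op.index]
    return result


def inverse(op):
    """inverse(op) -> op': removing what was inserted, and vice versa."""
    return Op("remove" if op.kind == "insert" else "insert", op.index, op.value)


def transpose(op1, op2):
    """Return (op2', op1') so that op1;op2 has the same effect as op2';op1'."""
    if op1.kind != "insert" or op2.kind != "insert":
        raise NotImplementedError("only insert/insert is sketched here")
    if op2.index <= op1.index:
        # op2 lands at or before op1's element, which therefore shifts right.
        return op2, Op("insert", op1.index + 1, op1.value)
    # op2 lands after op1's element; without op1 its index is one lower.
    return Op("insert", op2.index - 1, op2.value), op1


base = ["x", "y"]
op1 = Op("insert", 1, "a")
op2 = Op("insert", 1, "b")
current = apply(op2, apply(op1, base))        # history: op1 then op2

op2_t, op1_t = transpose(op1, op2)
reordered = apply(op1_t, apply(op2_t, base))  # same state, op1 moved last

# Selective undo of op1: once it is last, apply its inverse to the current state.
undone = apply(inverse(op1_t), current)
```

The point of the transposition is that the index stored in op1 is adjusted (1 becomes 2) before the operation is undone, which is exactly what a command log without a transpose function cannot do.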

For the papers you mention in the CoreObject reimplementation, both  
were really good. The merge matrix is an interesting idea, I wouldn't  
present that to the user, but it could be a nice way to represent the  
merge settings at the developer-level. I also liked the possibility to  
specify the merging node granularity (e.g. for a text document: word,  
line, paragraph etc.) and the user priority per node.

> Looking at GINA from this viewpoint, the programmer writing the  
> command classes has to define inverse() and conflicts(), but since  
> you can't specify a transpose function in GINA, your command objects  
> have to be able to be reordered without modifying them.  This makes  
> it tricky or impossible to write commands which modify arrays,  
> because you can't store array indices in your command objects since  
> they will become invalid if the array is changed before the command  
> is executed. So I'm not sure if GINA really offers any interesting  
> solutions.

hm ok. Are you suggesting that adjusting the array index of each
command, based on which commands are skipped while replaying the
history, would solve the problem?

I found this paper (I haven't read it yet) that seems to adjust old  
commands to support selective undo: 
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.31.755&rep=rep1&type=pdf

I had the impression GINA relied on fine-grained command/message which  
can give better merging results and better feedback for conflicts than  
more coarse-grained commits. But that was pure speculation I admit :-)

> To return to CoreObject's implementation, I'm concerned that
> merge/selective undo isn't possible with pure message-based
> persistency. In order to support merging/selective undo, we need to
> be able to tell if operations conflict, get their inverse, and
> transpose groups of them. It's easy to define these for a small set
> of basic operations like setValue:forProperty:,
> insertObject:atIndex:ofProperty:, removeObject:atIndex:ofProperty:,
> etc., which is more or less what I did in ObjectMerging.
>
> But if the message log contains high level messages like
> "-refactorMethod:to:" or "-indentParagraph:" without any other
> information, you can't really do selective undo/merge. The only
> practical way I see of defining the inverse/transpose/conflicts
> functions on these high level operations is to record the high-level
> operations as a bunch of the primitive ones for which we already
> have inverse/transpose/conflicts defined.
>
> Then your message log looks something like this:
>
> {object: UUID1 recordMessage: -[setValue: 'abc' forProperty: 'bar']},
> {object: UUID1 recordMessage: -[setValue: 'def' forProperty: 'bar']},
> {object: UUID2 recordHighLevelMessage: -[refactorMethod: 'foo' to:  
> 'bar']  definition: (
>     {object: UUID2 message: -[setValue: 'bar' forProperty: 'name']},
>     {object: UUID3 message: -[setValue: 'bar' forProperty: 'name']}
> )}

Sounds good.

Except that UUID3 must not be a core object, but just a path inside the
core object UUID2, otherwise you get a side-effect that prevents
deterministic replay.
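
As a rough illustration of why recording the primitive definition helps, the following Python sketch derives the inverse of a high-level entry from the inverses of its primitives. Everything here is hypothetical: the SetValue and HighLevel classes, the store dictionary, the "undo " selector prefix and the "UUID2/method-ref" path (standing in for an object addressed by a path inside core object UUID2) are assumptions, and the primitive is assumed to record the old value so that it can be inverted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetValue:
    """Primitive operation; records the old value so it can be inverted."""
    target: str   # core object UUID, or a path inside a core object
    prop: str
    old: object
    new: object


@dataclass(frozen=True)
class HighLevel:
    """High-level message recorded together with its primitive definition."""
    target: str
    selector: str
    definition: tuple


def apply_entry(entry, store):
    """Replay a log entry against a dict of {target: {property: value}}."""
    primitives = entry.definition if isinstance(entry, HighLevel) else (entry,)
    for op in primitives:
        store.setdefault(op.target, {})[op.prop] = op.new


def inverse(entry):
    """Invert a primitive, or a high-level entry via its primitives."""
    if isinstance(entry, HighLevel):
        undone = tuple(inverse(op) for op in reversed(entry.definition))
        return HighLevel(entry.target, "undo " + entry.selector, undone)
    return SetValue(entry.target, entry.prop, entry.new, entry.old)


store = {"UUID2": {"name": "foo"}, "UUID2/method-ref": {"name": "foo"}}
refactor = HighLevel("UUID2", "refactorMethod:to:", (
    SetValue("UUID2", "name", "foo", "bar"),
    SetValue("UUID2/method-ref", "name", "foo", "bar"),
))

apply_entry(refactor, store)
after_refactor = {target: dict(props) for target, props in store.items()}
apply_entry(inverse(refactor), store)
```

The high-level selector stays in the log for display purposes, while undo and merge only ever look at the primitives in the definition.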

> Now, this is really close to what I'm doing in ObjectMerging, except  
> the groupings of low-level changes to record are indicated by doing  
> a 'commit'. But it also no longer really looks like message-based  
> persistency, because you're recording the state change..

I'm not sure I get your point. For a method or operation, I would say
the arguments encode how the state will change. You don't record the
object state but just some additional metadata/messages which could
even be represented as arguments I think. In the end, it is still
operation/message-based. For example…

objectUUID2 refactorMethod: 'foo' to: 'bar'
{
        // Here -record would ask CoreObject to append the invocations to the
        // serialized -refactorMethod:to: invocation.
        // Alternatively we could handle that in a more implicit way by
        // recording every basic persistency message invoked until the method
        // returns.
        [[objectUUID2 record] setValue: 'bar' forProperty: 'name'];
        // objectUUID3 is not a core object but a uniquely identified object
        // inside the object graph owned by the core object (UUID2).
        [[objectUUID3 record] setValue: 'bar' forProperty: 'name'];
}

can be rewritten as below:

objectUUID2 refactorMethod: 'foo' to: 'bar'
            setValue: 'bar' forProperty: 'name' on: objectUUID2
            setValue: 'bar' forProperty: 'name' on: objectUUID3
{
        [objectUUID2 setValue: 'bar' forProperty: 'name'];
        [objectUUID3 setValue: 'bar' forProperty: 'name'];
}

This new method would be the one that triggers persistency instead of
-refactorMethod:to:, which would just call it. This way you don't have
to record the intermediate -setValue:forProperty: messages, they get
encoded in the recorded message itself.
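
The more implicit variant (capturing every basic persistency message until the high-level method returns) could look roughly like the Python sketch below. It is not CoreObject code: the Node class, its pending/log attributes and the refactor_method function are invented for the example, and an inner object is assumed to always be owned directly by a core object.

```python
class Node:
    """Hypothetical object; a core object owns the log, inner objects use it."""

    def __init__(self, path, owner=None):
        self.path = path
        self.owner = owner if owner is not None else self
        self.properties = {}
        if owner is None:
            self.log = []        # recorded top-level messages
            self.pending = None  # primitives captured during a high-level call

    def set_value(self, value, prop):
        """Primitive persistency message, logged or captured by the owner."""
        self.properties[prop] = value
        core = self.owner
        entry = (self.path, "setValue:forProperty:", value, prop)
        if core.pending is None:
            core.log.append(entry)
        else:
            core.pending.append(entry)


def refactor_method(core, inner, old, new):
    """High-level message: its primitives end up inside one recorded entry."""
    core.pending = []
    try:
        core.set_value(new, "name")
        inner.set_value(new, "name")
    finally:
        captured, core.pending = core.pending, None
    core.log.append((core.path, "refactorMethod:to:", old, new, captured))


core = Node("UUID2")
inner = Node("UUID2/inner", owner=core)  # a path inside UUID2, not a core object

core.set_value("foo", "name")            # logged directly as a primitive
refactor_method(core, inner, "foo", "bar")
```

After this runs, the log holds two entries: the plain -setValue:forProperty: and a single -refactorMethod:to: entry whose last field lists the two captured primitives.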

Does that make sense or am I completely off?

> What do you think?

I have to write some use cases on paper and think about them :-) I
probably need to read some extra papers on the topic too.
It's a really tricky problem, and figuring out how to integrate that
cleanly without too much complexity in the entire stack, from
EtoileSerialize to EtoileUI, hurts my brain ;-)

I found some other papers that could potentially interest us:

- A document mark based on method supporting group undo
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.95.8003
- The Multi-version and Single-display Strategy in Undo Scheme (looks  
like an updated version of the previous one)
http://jmyang.info/papers/cit_2005_undo.pdf
http://jmyang.info/slides/cit_2005_undo.ppt (slides)
- Consistency Maintenance Based on the Mark & Retrace Technique in  
Groupware Systems (yet another more updated version)
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.103.9726
http://jmyang.info/slides/group_2005_markretrace.ppt (slides)

- Undo Any Operation at Any Time in Group Editors
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.32.6266
- Undoing Any Operation in Collaborative Graphics Editing Systems
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.5993

- A Flexible Undo Framework for Collaborative Editing
http://hal.inria.fr/index.php?halsid=f54l7h2e5149op3i00vq8daid5&view_this_doc=inria-00275754&version=2

- A flexible multi-mode undo mechanism for a collaborative modeling  
environment
http://portal.acm.org/citation.cfm?id=1813978
- A Temporal Model for Multi-Level Undo and Redo
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.8107

- A Selective Undo Mechanism for Graphical User Interfaces Based On  
Command Objects (the one I mentioned earlier)
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.755
- Reusable Hierarchical Commands (sounds similar to what you suggest  
to support selective undo with message-based persistency)
http://portal.acm.org/citation.cfm?id=238386.238526&type=series

- Object-based nonlinear undo model
http://www.computer.org/portal/web/csdl/doi/10.1109/CMPSAC.1997.624739

More papers about collaborative editing on tree-structured documents:

- Operation-based versus State-based Merging in Asynchronous Graphical  
Collaborative Editing
http://www.loria.fr/~ignatcla/pmwiki/pub/papers/IgnatCEW04.pdf
- Multi-level Editing of Hierarchical Documents
http://www.springerlink.com/content/472w061011830726/
- Tree-based model algorithm for maintaining consistency in real-time  
collaborative editing systems
http://www.loria.fr/~ignatcla/pmwiki/pub/papers/IgnatCEW02.pdf
- Draw-Together: Graphical Editor for Collaborative Drawing
http://www.loria.fr/~ignatcla/pmwiki/pub/papers/IgnatCSCW06.pdf
- Maintaining Consistency in Collaboration over Hierarchical Documents  
(the thesis that relates to these previous papers)
http://www.loria.fr/~ignatcla/pmwiki/pub/papers/IgnatPhDThesis06.pdf

> Anyway, I hope this didn't get too long and rambling. :-)

So do I :-)

> Maybe we should have a skype meeting sometime to discuss this?

I think so. I will be away on vacation next week, so we could
organize it around August 20/30th.

Cheers,
Quentin.


_______________________________________________
Etoile-dev mailing list
Etoile-dev@gna.org
https://mail.gna.org/listinfo/etoile-dev
