On 3 August 2010 at 00:56, Eric Wasylishen wrote:

>>> Another approach is to be like git and just do a separate binary
>>> diff on the serialized snapshot data. I like this because it makes
>>> storage conceptually simpler - it's just storing whole snapshots of
>>> objects but happens to delta compress related ones as an
>>> implementation detail.
>>
>> How would that work with media documents?
>> Suppose you work on an image that weighs 300 MB and several commits
>> per minute have to be done. User changes to record might even
>> happen at very short intervals (one second or two).
>
> In my opinion, regardless of how CO is implemented, for photo/video
> editing what we have to keep persistent and versioned is the tree of
> drawing operations/filters that we discussed a bit already, rather
> than keeping the resulting bitmap data persistent and versioned.
>
> Two reasons:
> - you need the tree structure to do merge/selective undo,
> - and saving every snapshot would obviously eat disk space too fast
>   (even with multi-terabyte drives) :-)

Yes. My point was that additional operations are easier to support
with message-based persistency: you just add a new method, while in
the CoreObject reimplementation you have to define a new class, I
think.
For example… How would you express operations like 'blur' an image
area or 'cut a range' in a movie clip with a state-based CoreObject?
I suppose new operation subclasses would be added to express and save
them in the history graph?

> It might make sense to cache the bitmap data, but since it can be
> regenerated given the tree of drawing operations/filters, it
> probably doesn't make sense to keep old versions of the bitmap data
> given their potential size.

Agreed.

To comment a bit on the CoreObject reimplementation… This makes me
think that the need to be cautious with the behavior and arguments of
messages that trigger persistency in the existing CoreObject is now
shifted to the class that expresses the operation.
The advantage of your approach is that it automatically reduces
similar operations to a canonical operation (-removePerson:,
-addObject, -addWhateverAndPray: are automatically reduced to
-addObject:, -setValue:forProperty:, -removeObject:atIndex: etc.) and
this makes merging much easier and serialization safer.

My main concern is that I'm not sure it really solves some of the
things we want to solve:
- more transparent persistency (no explicit commit or database
  connection management)
- storing arbitrary objects (EtoileSerialize) or integrating a foreign
  object model (COProxy)

The most problematic point would be the impossibility of adding
persistency to EtoileUI, because persistent objects must be COObject
instances and store all their data in a dictionary (no ivars).

From your perspective, iirc, supporting the extra things I outline
above introduces too much complexity. Given the current state of
EtoileSerialize and CoreObject, I fully agree.

Although message-based persistency is not the panacea it appears to
be at first sight (e.g. it tends to favor big façade objects rather
than fine-grained objects with a clear role, and requires being very
cautious with the behavior and arguments of messages that trigger
persistency), I still think it's a good approach because it's
operation-based rather than state-based and it also gives more
flexibility than a single model class.

>> That's not truly related, but ideally I'd like to have several
>> "undo tracks". For example, multiple tracks would be:
>> - the document or object I'm working on
>> - the app-level or work context
>> - the library the object belongs to
>> - the overall UI (would record almost all other UI actions)
>> The last track would let me undo a window close or move. I'm not
>> sure this last track is a realistic idea… Undoing a shutdown is
>> hard ;-) Well, various cases would be hard to undo or even record I
>> think.
>>
>> Presently this track notion is only partially related to object
>> contexts in CoreObject, that's why I'm planning to rework
>> COObjectContext into something closer to that.
>
> I agree; we'll really need the undo tracks feature.
>
> One way I could see implementing this in my ObjectMerging project is
> by attaching metadata to the COHistoryGraphNode for each commit,
> like this:
> {document-uuid: XXX
>  app-uuid: YYY
>  library-uuid: ZZZ,
>  .... (maybe other tracks) .... }

Yes, that should work.

> Then, supposing you want to do an undo/redo action for a particular
> document, you first filter the overall history graph to get only
> the nodes with the correct document-uuid tag. The filtered history
> graph is then used to figure out which changes to undo at each step.

Right. In fact, that's what CoreObject does already when an object
context is restored to a past version. And this can already be
leveraged at the core object granularity level too.
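
A minimal sketch of the track filtering discussed above, written in Python only to keep it short and runnable. HistoryNode, filter_track and undo_target are hypothetical names made up for this illustration; they are not CoreObject or ObjectMerging API, and the UUID strings are placeholders.

```python
from dataclasses import dataclass


@dataclass
class HistoryNode:
    """Hypothetical stand-in for one commit in the overall history graph."""
    revision: int
    tracks: dict  # track metadata, e.g. {"document-uuid": "doc-1"}


def filter_track(history, key, uuid):
    """Keep only the nodes tagged with the given track UUID, in commit order."""
    return [node for node in history if node.tracks.get(key) == uuid]


def undo_target(history, key, uuid):
    """Latest node on the track, i.e. what a plain undo on that track reverts."""
    filtered = filter_track(history, key, uuid)
    return filtered[-1] if filtered else None


# Two documents edited in the same app, with interleaved commits.
history = [
    HistoryNode(1, {"document-uuid": "doc-1", "app-uuid": "app-1"}),
    HistoryNode(2, {"document-uuid": "doc-2", "app-uuid": "app-1"}),
    HistoryNode(3, {"document-uuid": "doc-1", "app-uuid": "app-1"}),
    HistoryNode(4, {"document-uuid": "doc-2", "app-uuid": "app-1"}),
]

doc1_track = filter_track(history, "document-uuid", "doc-1")
app_track = filter_track(history, "app-uuid", "app-1")
```

Here the doc-1 track holds revisions 1 and 3, which are not adjacent in the overall history, while the app track holds every revision. Undoing on the doc-1 track therefore has to skip revision 4, which is where selective undo comes in.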
> Since the nodes in the filtered history graph likely won't be
> adjacent in the overall history graph,

Right, but what matters for undo/redo in a single core object is
whether they are adjacent in this core object history rather than in
the entire core object graph history (aka overall history graph).
If the track records every message sent to a given core object and
just consists of the combined histories of several core objects, the
nodes would be adjacent at the persistent root granularity (exactly
as is the case with a COObjectContext history currently).
To create non-adjacent nodes, the track would have to select which
messages it logs based on a predicate. It sounds like an interesting
feature, but that's not what I was thinking about.
What I was suggesting is just the possibility to have core objects
that belong to multiple object contexts at the same time rather than
a single one. For example, when a message that triggers persistency
is sent to a core object, each track the object belongs to logs the
message. Well, in reality, the track UUIDs would be attached to the
object revision/message in the metadata db.

> undoing them will involve selective undo, which means merge
> conflicts could occur -

Yes if the recorded messages are selected based on a predicate, no
otherwise, I think.

> but I think this is okay and probably unavoidable when you have
> multiple undo tracks.

In some advanced cases, probably yes.

> btw, what do you think of my idea of modeling the history graph
> using the COHistoryGraphNode class?

From an implementation viewpoint, I don't think it's really needed;
we could just store the same data by improving/extending the current
history table in the metadata db. Then it's easy to query the history
in various ways or leverage the history to run other queries related
to the indexed content/properties.
For building a UI that lets you browse the history, a class like
that, which makes the versioning model explicit, is nice. But I would
rather write it as a thin layer around a query result.

>>>> You also say selective undo support is planned but you don't
>>>> explain how… ?
>>>
>>> I think you can implement it pretty easily as a merge. Here's my
>>> current idea:
>>>
>>> Suppose these are nodes in a history graph, and the current
>>> revision is E.
>>>
>>> A---B---C---D---E
>>>
>>> The user wants to undo the changes made in revision B.
>>>
>>> What we could do is create a branch of B in which all edits made
>>> in B are undone; i.e. it's the same state as in history graph
>>> node A, so call it A'.
>>>
>>> A---B---C---D---E
>>>      \_A'
>>>
>>> Then just merge E and A' - I think this will be the same as a
>>> selective undo.
>>> You'll get merge conflicts if there were any changes in C, D, or E
>>> which overlap with the changes being undone in B, but this is
>>> exactly what you want. I haven't tried it yet though - could be
>>> that I'm missing some detail and this is nonsense :-)
>>
>> Sounds like an interesting approach.
>> If C, D or E don't rely on the B state in any special way, this
>> should work.
>> I mean, if B involves a state change expected by C, D and E… This
>> state change must result in an overlap conflict, otherwise things
>> could break.
>
> Right, the merge algorithm should correctly flag that as a conflict.
>
>> btw have you taken a look at GINA which is mentioned in the
>> Flexible Object Merging Framework paper? From what I read, it uses
>> a command log very similar to CoreObject, and seems to support
>> merging several message/command histories; this sounded very
>> similar to what CoreObject intends to do.
>
> I had a look at the GINA paper (T Berlage, A Genau. "A framework for
> shared applications with a replicated architecture"). They are
> using the Command Pattern, so you have to write a class for each
> operation which can modify document state. The command classes have
> methods like selectiveUndo, selectiveRedo, canSelectiveUndo,
> canSelectiveRedo. To merge two lists of commands, they don't do any
> transformations on them; they just concatenate the lists of commands.
>
> I also re-read the selective undo paper I mentioned ("A Framework
> for Undoing Actions in Collaborative Systems" by Prakash and
> Knister - link:
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.51.4793&rep=rep1&type=pdf
> ) Everyone interested in selective undo/merging should check this
> out, I think it's a really good paper :-).
>
> They describe a general theory of how you can selectively undo/redo/
> merge operations. You need to define three functions:
>
> inverse(op) -> op' -- such that op' does the opposite of op.
> transpose(op1, op2) -> (op2', op1') --- such that applying op1
>   followed by op2 has the same effect as applying op2' followed by
>   op1'. This is the central operation in merging.
> conflicts?(op1, op2) -> bool --- true if op1 and op2 conflict (as
>   defined in the paper).

Sounds interesting. I'll take a look at that.

As for the papers you mention in the CoreObject reimplementation,
both were really good. The merge matrix is an interesting idea; I
wouldn't present that to the user, but it could be a nice way to
represent the merge settings at the developer level. I also liked the
possibility to specify the merging node granularity (e.g. for a text
document: word, line, paragraph etc.) and the user priority per node.

> Looking at GINA from this viewpoint, the programmer writing the
> command classes has to define inverse() and conflicts(), but since
> you can't specify a transpose function in GINA, your command objects
> have to be able to be reordered without modifying them. This makes
> it tricky or impossible to write commands which modify arrays,
> because you can't store array indices in your command objects since
> they will become invalid if the array is changed before the command
> is executed. So I'm not sure if GINA really offers any interesting
> solutions.

Hm, ok. Do you suggest that being able to adjust the array indices
per command, based on which commands are skipped while replaying the
history, would solve the problem?
I found this paper (I haven't read it yet) that seems to adjust old
commands to support selective undo:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.31.755&rep=rep1&type=pdf

I had the impression GINA relied on fine-grained commands/messages,
which can give better merging results and better feedback for
conflicts than more coarse-grained commits. But that was pure
speculation, I admit :-)

> To return to CoreObject's implementation, I'm concerned that merge/
> selective undo isn't possible with pure message-based persistency.
> In order to support merging/selective undo, we need to be able to
> tell if operations conflict, get their inverse, and transpose groups
> of them. It's easy to define these for a small set of basic
> operations like setValue:forProperty:,
> insertObject:atIndex:ofProperty:, removeObject:atIndex:ofProperty:,
> etc., which is more or less what I did in ObjectMerging.
>
> But if the message log contains high level messages like
> "-refactorMethod:to:" or "-indentParagraph:" without any other
> information, you can't really do selective undo/merge. The only
> practical way I see of defining the inverse/transpose/conflicts
> functions on these high level operations is to record the high-level
> operations as a bunch of the primitive ones for which we already
> have inverse/transpose/conflicts defined.
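
To make the inverse/transpose/conflicts trio concrete, here is a small Python sketch for two primitive array operations. It is an illustration under stated assumptions, not the algorithm from the Prakash and Knister paper verbatim and not ObjectMerging code: Op, apply, replay and selective_undo are hypothetical names, conflicts() only detects one kind of conflict, and transpose() only supports moving an insert past later operations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str   # "insert" or "remove"
    index: int  # array index at the time the operation was executed
    value: str


def apply(items, op):
    """Return a new list with op applied."""
    items = list(items)
    if op.kind == "insert":
        items.insert(op.index, op.value)
    else:
        assert items[op.index] == op.value, "stale index in remove"
        del items[op.index]
    return items


def replay(history):
    state = []
    for op in history:
        state = apply(state, op)
    return state


def inverse(op):
    """An insert is undone by a remove at the same index, and vice versa."""
    return Op("remove" if op.kind == "insert" else "insert", op.index, op.value)


def conflicts(op1, op2):
    """Assumption: the only conflict modelled is op2 removing what op1 inserted."""
    return op1.kind == "insert" and op2.kind == "remove" and op2.index == op1.index


def transpose(op1, op2):
    """Return (op2', op1') so that op1;op2 has the same effect as op2';op1'."""
    if op1.kind != "insert" or conflicts(op1, op2):
        raise ValueError("cannot transpose these operations in this sketch")
    i, j = op1.index, op2.index
    if op2.kind == "insert":
        if j <= i:  # op2 lands at or before op1's element, pushing it right
            return op2, Op("insert", i + 1, op1.value)
        return Op("insert", j - 1, op2.value), op1
    if j < i:       # op2 removes something before op1's element
        return op2, Op("insert", i - 1, op1.value)
    return Op("remove", j - 1, op2.value), op1


def selective_undo(history, k):
    """Move history[k] to the end via transpose, then cancel it with its inverse."""
    op, moved = history[k], []
    for later in history[k + 1:]:
        later, op = transpose(op, later)
        moved.append(later)
    return inverse(op), history[:k] + moved


history = [Op("insert", 0, "a"), Op("insert", 1, "b"),
           Op("insert", 0, "c"), Op("remove", 2, "b")]
state = replay(history)                        # ["c", "a"]
undo_op, new_history = selective_undo(history, 0)
```

Undoing the first insert yields a remove at index 1 rather than index 0, because the later operations shifted the element. This is the index adjustment that is impossible when commands can only be reordered unmodified.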
>
> Then your message log looks something like this:
>
> {object: UUID1 recordMessage: -[setValue: 'abc' forProperty: 'bar']},
> {object: UUID1 recordMessage: -[setValue: 'def' forProperty: 'bar']},
> {object: UUID2 recordHighLevelMessage: -[refactorMethod: 'foo' to:
> 'bar'] definition: (
>     {object: UUID2 message: -[setValue: 'bar' forProperty: 'name']},
>     {object: UUID3 message: -[setValue: 'bar' forProperty: 'name']}
> )}

Sounds good. Except that UUID3 must not be a core object, but just a
path inside the core object UUID2, otherwise you get a side-effect
that prevents deterministic replay.

> Now, this is really close to what I'm doing in ObjectMerging, except
> the groupings of low-level changes to record are indicated by doing
> a 'commit'. But it also no longer really looks like message-based
> persistency, because you're recording the state change...

I'm not sure I get your point. For a method or operation, I would say
the arguments encode how the state will change. You don't record the
object state but just some additional metadata/messages which could
even be represented as arguments, I think. In the end, it is still
operation/message-based.

For example…

    objectUUID2 refactorMethod: 'foo' to: 'bar'
    {
        // Here -record would ask CoreObject to append the invocations
        // to the serialized -refactorMethod:to: invocation.
        // Alternatively we could handle that in a more implicit way by
        // recording every basic persistency message invoked until the
        // method returns.
        [[objectUUID2 record] setValue: 'bar' forProperty: 'name'];

        // objectUUID3 is not a core object but a uniquely identified
        // object inside the object graph owned by the core object
        // (UUID2).
        [[objectUUID3 record] setValue: 'bar' forProperty: 'name'];
    }

can be rewritten as below:

    objectUUID2 refactorMethod: 'foo' to: 'bar'
                      setValue: 'bar' forProperty: 'name' on: objectUUID2
                      setValue: 'bar' forProperty: 'name' on: objectUUID3
    {
        [objectUUID2 setValue: 'bar' forProperty: 'name'];
        [objectUUID3 setValue: 'bar' forProperty: 'name'];
    }

This new method would be the one that triggers persistency instead of
-refactorMethod:to:, which would just call it. This way you don't
have to record the intermediate -setValue:forProperty: messages; they
get encoded in the recorded message itself.

Does that make sense or am I completely off?

> What do you think?

I have to write some use cases on paper and think about them :-)
I probably need to read some extra papers on the topic too.
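
A rough Python sketch of the "implicit" variant mentioned above, where every primitive message sent while a high-level message is running gets attached to that message's log entry. Recorder, high_level, set_value and undo are hypothetical names invented for this illustration, nothing here is CoreObject API, and storing the old value in each primitive is an extra assumption made so that an inverse can be computed.

```python
from contextlib import contextmanager


class Recorder:
    """Hypothetical message log grouping primitives under a high-level message."""

    def __init__(self, objects):
        self.objects = objects  # path -> property dictionary
        self.log = []
        self._open = None       # high-level entry currently being recorded

    @contextmanager
    def high_level(self, target, selector, *args):
        entry = {"object": target, "message": (selector, args), "definition": []}
        self._open = entry
        try:
            yield entry
        finally:
            self._open = None
        self.log.append(entry)  # only reached when the body did not raise

    def set_value(self, path, prop, value):
        obj = self.objects[path]
        primitive = {"path": path, "property": prop,
                     "old": obj.get(prop), "new": value}
        obj[prop] = value
        if self._open is not None:
            self._open["definition"].append(primitive)
        else:
            self.log.append(primitive)

    def undo(self, entry):
        """Invert a high-level entry by inverting its primitives in reverse order."""
        for primitive in reversed(entry["definition"]):
            self.objects[primitive["path"]][primitive["property"]] = primitive["old"]


# "UUID2/call-site" is a path inside core object UUID2, not a second core object.
objects = {"UUID2": {"name": "foo"}, "UUID2/call-site": {"name": "foo"}}
recorder = Recorder(objects)

with recorder.high_level("UUID2", "refactorMethod:to:", "foo", "bar"):
    recorder.set_value("UUID2", "name", "bar")
    recorder.set_value("UUID2/call-site", "name", "bar")

after_refactor = {path: dict(props) for path, props in objects.items()}
recorder.undo(recorder.log[0])
```

The log ends up with a single entry for refactorMethod:to: whose definition lists the two primitive changes, so inverse, transpose and conflicts only ever need to be defined on the primitives.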
It's a really tricky problem, and how to integrate that cleanly
without too much complexity in the entire stack from EtoileSerialize
to EtoileUI hurts my brain ;-)

I found some other papers that could potentially interest us:

- A document mark based on method supporting group undo
  http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.95.8003
- The Multi-version and Single-display Strategy in Undo Scheme
  (looks like an updated version of the previous one)
  http://jmyang.info/papers/cit_2005_undo.pdf
  http://jmyang.info/slides/cit_2005_undo.ppt (slides)
- Consistency Maintenance Based on the Mark & Retrace Technique in
  Groupware Systems (yet another more updated version)
  http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.103.9726
  http://jmyang.info/slides/group_2005_markretrace.ppt (slides)
- Undo Any Operation at Any Time in Group Editors
  http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.32.6266
- Undoing Any Operation in Collaborative Graphics Editing Systems
  http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.5993
- A Flexible Undo Framework for Collaborative Editing
  http://hal.inria.fr/index.php?halsid=f54l7h2e5149op3i00vq8daid5&view_this_doc=inria-00275754&version=2
- A flexible multi-mode undo mechanism for a collaborative modeling
  environment
  http://portal.acm.org/citation.cfm?id=1813978
- A Temporal Model for Multi-Level Undo and Redo
  http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.8107
- A Selective Undo Mechanism for Graphical User Interfaces Based On
  Command Objects (the one I mentioned earlier)
  http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.755
- Reusable Hierarchical Commands (sounds similar to what you suggest
  to support selective undo with message-based persistency)
  http://portal.acm.org/citation.cfm?id=238386.238526&type=series
- Object-based nonlinear undo model
  http://www.computer.org/portal/web/csdl/doi/10.1109/CMPSAC.1997.624739

More papers about collaborative editing on tree-structured documents:

- Operation-based versus State-based Merging in Asynchronous
  Graphical Collaborative Editing
  http://www.loria.fr/~ignatcla/pmwiki/pub/papers/IgnatCEW04.pdf
- Multi-level Editing of Hierarchical Documents
  http://www.springerlink.com/content/472w061011830726/
- Tree-based model algorithm for maintaining consistency in real-time
  collaborative editing systems
  http://www.loria.fr/~ignatcla/pmwiki/pub/papers/IgnatCEW02.pdf
- Draw-Together: Graphical Editor for Collaborative Drawing
  http://www.loria.fr/~ignatcla/pmwiki/pub/papers/IgnatCSCW06.pdf
- Maintaining Consistency in Collaboration over Hierarchical
  Documents (the thesis that relates to these previous papers)
  http://www.loria.fr/~ignatcla/pmwiki/pub/papers/IgnatPhDThesis06.pdf

> Anyway, I hope this didn't get too long and rambling. :-)

So do I :-)

> Maybe we should have a skype meeting sometime to discuss this?

I think so. I will be away on vacation next week, so we could
organize it around August 20/30th.

Cheers,
Quentin.

_______________________________________________
Etoile-dev mailing list
Etoile-dev@gna.org
https://mail.gna.org/listinfo/etoile-dev