Thanks for the feedback, Andy! See comment in-line below. --- A. Soroka The University of Virginia Library
On Jul 25, 2015, at 7:43 AM, Andy Seaborne <[email protected]> wrote: > A first look - there's quite a lot to do with the release at the moment. Right, I don't expect anyone to get around to much consideration of this until that is over. Good luck! > Having a separate set of functionality to the underlying DatasetGraph is good > for the MRSW case and with that composition on multiple datasets, text > indexes etc etc. For the MR+SW, I think the more connected nature of > transactions and implementation might make it harder to have independent > functionality but we'll see. I agree. That's why I did this as a "wrap-around". I don't think MR+SW _can_ be done that way, but we'll see⦠> Yes - addGraph ought to be a copy. The general dataset where the app can put > together a collection of different graph types is the exception but needed > for the case of some graphs being inference, maybe some not. As I wrote, I believe that my current code does this solidly and the test shows it, but I'm not sure that the impl is as efficient as possible. Suggestions welcome! > One of the things that strikes me is that extending Quad to be a > QuadOperation breaks being a Quad. It adds functionality a quad does not > have. Two quads are equal if they have the same G/S/P/O and that's not true > for QuadOperation. > An operation is a pair - the action and the data - not data. I'm not sure I understand the objection here: all classes inherit from Object and virtually all of them add functionality Object does not have and break its equality definition. I certainly understand the view on operations you're taking, but I'm proposing a different one that includes data, action (in my code, that comes in the form of type, not an enumeration, so that I can replace cases in your code with polymorphism) _and_ service type. Adding a quad to a special index might be substantially different than adding it to a dataset. > e.g. Putting a QuadOperation into a DatasetGraph would cause problems. Because of the equality question? I _think_ I understand this objection; are you saying that logic for things like DatasetGraph::contains becomes problematic? To my mind it implies a more sophisticated type of comparison (using equivalence and not equals()) instead of a different kind of data structure. I'll try to make some corrections to show what I mean and give you something to react to. I may be wrong here, but I'd like to follow out the idea. > ListBackedOperationRecord<OpType> extends ReversibleOperationRecord<OpType> > > public class ListBackedOperationRecord<OpType extends InvertibleOperation<?, > ?, ?, ?>> > implements ReversibleOperationRecord<OpType> { > > while, yes, a collection of operations could be an operation datasets don't > provide such composite operations so the abstraction is not used. And the > reverse of it would be recursive - each operation needs reversing. I am _not_ making the claim here that "a collection of operations could be an operation". A record (in my code) is just a record. It is _not_ usable as an aggregate operation and doesn't subtype Operation. There is no use of records as operations nor any intended such use, so no problem. > I'd keep log (= list of operations) as a separate concept from the operations > themselves. One key operation of a ListBackedOperationRecord is clear and > Operations are Or this is a naming thing, is "record" the log entry or the > log itself? Something seems to have been eaten out of your mail (!) but anyway, a record _is_ a separate concept from operation. There is ReversibleOperationRecord and there is Operation and the only relationship between them is that Operation is a parameter type for ReversibleOperationRecord::add and part of the parameter type for ReversibleOperationRecord::consume. As far as names, I'm not sure what you mean-- ReversibleOperationRecord the type? That's a log. It contains Operations, but _is not one itself_. > Is there some specific reason as to why you override the DatasetGraphWithLock > lock? Yes, because DatasetGraphWithLock has no Lock that I could find, and it inherits getLock() from DatasetGraphTrackActive, which just pulls the lock from the wrapped DatasetGraph. I wanted to make sure that a MRSW Lock is in play. But maybe I am misunderstanding the interaction here? (No surprise! {grin}) > One difference is the notion of reversing an operation is not a feature of > the operation itself, it's the way it is played back. Partially, this is > efficiency (which may not matter) as it reduces the object churn but also it > puts undo-playback in one place (e.g. reading and writing from storage, which > might be non-heap memory, or a compacted form (or even a disk) for where > large+long transactions even on in-memory lead to excessive object use. Just > an idea. Yeah, I intentionally separated the two (reverse an operation, reverse a series of operation) because: 1) I think the idea of the inverse of an operation is important enough that I want it visible in the type system, hence the verbose but specific signature of InvertibleOperation. I believe that I have correctly written it so that you cannot have an InvertibleOperation without a specific inverse and that inverse must declare your InvertibleOperation as _its_ inverse. 2) Moving the reversal of a single operation into a method on Operation let me use polymorphism, as mentioned above, to make the code in DatasetGraphWithRecord clear, concise, and extensible. You can have more kinds of operations that can occur inside a transaction without changing the idea of an abort: you would subclass DatasetGraphWithRecord and override _add and _delete. I should have documented that decision more carefully. 3) It is possible to imagine different ways to use journals to support transactions (for example, in the MR+SW case) and I wanted to isolate the two kinds of logic for rerunning a journal (ReversibleOperationRecord::reverse and ::consume) and the operations you do while rerunning it (found in DatasetGraphWithRecord or another type). I'm thinking of it a bit like internal vs. external iteration. I may be on the wrong track, but I do think keeping the two abstractions separate is a good thing at the start, since it's usually easier to merge abstractions than to separate them. I do perceive the risk of frothing GC, but I'd like to go forward far enough to actually see whether it is a problem before so assuming.
