Re: Journaling DatasetGraph

[email protected] Mon, 27 Jul 2015 10:08:20 -0700

Thanks for the feedback, Andy! See comment in-line below.

---
A. Soroka
The University of Virginia Library

On Jul 25, 2015, at 7:43 AM, Andy Seaborne <[email protected]> wrote:

> A first look - there's quite a lot to do with the release at the moment.

Right, I don't expect anyone to get around to much consideration of this until 
that is over. Good luck!

> Having a separate set of functionality to the underlying DatasetGraph is good 
> for the MRSW case and with that composition on multiple datasets, text 
> indexes etc etc. For the MR+SW, I think the more connected nature of 
> transactions and implementation might make it harder to have independent 
> functionality but we'll see.

I agree. That's why I did this as a "wrap-around". I don't think MR+SW _can_ be 
done that way, but we'll see…

> Yes - addGraph ought to be a copy.  The general dataset where the app can put 
> together a collection of different graph types is the exception but needed 
> for the case of some graphs being inference, maybe some not.

As I wrote, I believe that my current code does this solidly and the test shows 
it, but I'm not sure that the impl is as efficient as possible. Suggestions 
welcome!

> One of the things that strikes me is that extending Quad to be a 
> QuadOperation breaks being a Quad.  It adds functionality a quad does not 
> have.  Two quads are equal if they have the same G/S/P/O and that's not true 
> for QuadOperation.
> An operation is a pair - the action and the data - not data.

I'm not sure I understand the objection here: all classes inherit from Object 
and virtually all of them add functionality Object does not have and break its 
equality definition. I certainly understand the view on operations you're 
taking, but I'm proposing a different one that includes data, action (in my 
code, that comes in the form of type, not an enumeration, so that I can replace 
cases in your code with polymorphism) _and_ service type. Adding a quad to a 
special index might be substantially different than adding it to a dataset.

> e.g. Putting a QuadOperation into a DatasetGraph would cause problems.

Because of the equality question? I _think_ I understand this objection; are 
you saying that logic for things like DatasetGraph::contains becomes 
problematic? To my mind it implies a more sophisticated type of comparison 
(using equivalence and not equals()) instead of a different kind of data 
structure. I'll try to make some corrections to show what I mean and give you 
something to react to. I may be wrong here, but I'd like to follow out the idea.

> ListBackedOperationRecord<OpType> extends ReversibleOperationRecord<OpType>
> 
> public class ListBackedOperationRecord<OpType extends InvertibleOperation<?, 
> ?, ?, ?>>
>               implements ReversibleOperationRecord<OpType> {
> 
> while, yes, a collection of operations could be an operation datasets don't 
> provide such composite operations so the abstraction is not used.  And the 
> reverse of it would be recursive - each operation needs reversing.

I am _not_ making the claim here that "a collection of operations could be an 
operation". A record (in my code) is just a record. It is _not_ usable as an 
aggregate operation and doesn't subtype Operation. There is no use of records 
as operations nor any intended such use, so no problem. 

> I'd keep log (= list of operations) as a separate concept from the operations 
> themselves.  One key operation of a ListBackedOperationRecord is clear and 
> Operations are Or this is a naming thing, is "record" the log entry or the 
> log itself?

Something seems to have been eaten out of your mail (!) but anyway, a record 
_is_ a separate concept from operation. There is ReversibleOperationRecord and 
there is Operation and the only relationship between them is that Operation is 
a parameter type for ReversibleOperationRecord::add and part of the parameter 
type for ReversibleOperationRecord::consume. As far as names, I'm not sure what 
you mean-- ReversibleOperationRecord the type? That's a log. It contains 
Operations, but _is not one itself_. 

> Is there some specific reason as to why you override the DatasetGraphWithLock 
> lock?

Yes, because DatasetGraphWithLock has no Lock that I could find, and it 
inherits getLock() from DatasetGraphTrackActive, which just pulls the lock from 
the wrapped DatasetGraph. I wanted to make sure that a MRSW Lock is in play. 
But maybe I am misunderstanding the interaction here? (No surprise! {grin})

> One difference is the notion of reversing an operation is not a feature of 
> the operation itself, it's the way it is played back.  Partially, this is 
> efficiency (which may not matter) as it reduces the object churn but also it 
> puts undo-playback in one place (e.g. reading and writing from storage, which 
> might be non-heap memory, or a compacted form (or even a disk) for where 
> large+long transactions even on in-memory lead to excessive object use.  Just 
> an idea.

Yeah, I intentionally separated the two (reverse an operation, reverse a series 
of operation) because:

1) I think the idea of the inverse of an operation is important enough that I 
want it visible in the type system, hence the verbose but specific signature of 
InvertibleOperation. I believe that I have correctly written it so that you 
cannot have an InvertibleOperation without a specific inverse and that inverse 
must declare your InvertibleOperation as _its_ inverse.

2) Moving the reversal of a single operation into a method on Operation let me 
use polymorphism, as mentioned above, to make the code in 
DatasetGraphWithRecord clear, concise, and extensible. You can have more kinds 
of operations that can occur inside a transaction without changing the idea of 
an abort: you would subclass DatasetGraphWithRecord and override _add and 
_delete. I  should have documented that decision more carefully.

3) It is possible to imagine different ways to use journals to support 
transactions (for example, in the MR+SW case) and I wanted to isolate the two 
kinds of logic for rerunning a journal (ReversibleOperationRecord::reverse and 
::consume) and the operations you do while rerunning it (found in 
DatasetGraphWithRecord or another type). I'm thinking of it a bit like internal 
vs. external iteration.

I may be on the wrong track, but I do think keeping the two abstractions 
separate is a good thing at the start, since it's usually easier to merge 
abstractions than to separate them. I do perceive the risk of frothing GC, but 
I'd like to go forward far enough to actually see whether it is a problem 
before so assuming.

Re: Journaling DatasetGraph

Reply via email to