Re: Journaling DatasetGraph

[email protected] Mon, 27 Jul 2015 11:30:52 -0700

> One of the things that strikes me is that extending Quad to be a 
> QuadOperation breaks being a Quad.  It adds functionality a quad does not 
> have.  Two quads are equal if they have the same G/S/P/O and that's not true 
> for QuadOperation.
> An operation is a pair - the action and the data - not data. e.g. Putting a 
> QuadOperation into a DatasetGraph would cause problems.


Andy-- I've thought harder about this and I've realized that whether or not I 
can make a navel-gazing argument about correctness, the typing is obviously 
confusing and that's damnation enough. I'll fix this to stop extending Quad.
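
One way the fix could look, as a minimal sketch rather than the code in the branch: the operation holds a quad instead of being one. The Quad below is a simplified stand-in with string fields (not Jena's Quad class), and Action and QuadOp are hypothetical names used only for illustration.

```java
import java.util.Objects;

// Simplified stand-in for a quad: equality depends only on G/S/P/O.
final class Quad {
    final String g, s, p, o;

    Quad(String g, String s, String p, String o) {
        this.g = g; this.s = s; this.p = p; this.o = o;
    }

    @Override public boolean equals(Object other) {
        if (!(other instanceof Quad)) return false;
        Quad q = (Quad) other;
        return g.equals(q.g) && s.equals(q.s) && p.equals(q.p) && o.equals(q.o);
    }

    @Override public int hashCode() { return Objects.hash(g, s, p, o); }
}

// Hypothetical set of actions a journal might record.
enum Action { ADD, DELETE }

// Hypothetical operation type: a pair of (action, data).
// It has a Quad but is not a Quad, so it cannot be put into a quad store by mistake.
final class QuadOp {
    final Action action;
    final Quad quad;

    QuadOp(Action action, Quad quad) {
        this.action = action;
        this.quad = quad;
    }

    // Two operations are equal only if both the action and the quad match.
    @Override public boolean equals(Object other) {
        if (!(other instanceof QuadOp)) return false;
        QuadOp op = (QuadOp) other;
        return action == op.action && quad.equals(op.quad);
    }

    @Override public int hashCode() { return Objects.hash(action, quad); }
}
```

With this shape, an ADD and a DELETE of the same quad are different operations, while the quads they carry still compare equal on G/S/P/O.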

---
A. Soroka
The University of Virginia Library

On Jul 25, 2015, at 7:43 AM, Andy Seaborne <[email protected]> wrote:

> On 23/07/15 14:18, [email protected] wrote:
>> After a longish conversation with Andy Seaborne, I've worked up a simple 
>> journaling DatasetGraph wrapping implementation. The idea is to use 
>> journaling to support proper aborting behavior (which I believe this code 
>> does) and to add to that a semantic for DatasetGraph::addGraph that copies 
>> tuples instead of leaving a reference to the added Graph (which I believe 
>> this code also does). Between these two behaviors, the idea is to be able to 
>> support transactionality (MRSW only) reasonably well.
>> 
>> The idea is (if this code looks like a reasonable direction) to move onwards 
>> to an implementation that uses persistent data structures for covering 
>> indexes in order to get at least to MR+SW and eventually to attack JENA-624: 
>> "Develop a new in-memory RDF Dataset implementation".
>> 
>> Feedback / advice / criticism greedily desired and welcome!
>> 
>> https://github.com/ajs6f/jena/tree/JournalingDatasetgraph
>> 
>> https://github.com/apache/jena/compare/master...ajs6f:JournalingDatasetgraph
>> 
>> ---
>> A. Soroka
>> The University of Virginia Library
>> 
> 
> Hi there,
> 
> A first look - there's quite a lot to do with the release at the moment.
> 
> Having a separate set of functionality to the underlying DatasetGraph is good 
> for the MRSW case and with that composition on multiple datasets, text 
> indexes etc etc.
> 
> For the MR+SW, I think the more connected nature of transactions and 
> implementation might make it harder to have independent functionality but 
> we'll see.
> 
> https://github.com/afs/mantis/tree/master/dboe-transaction
> is a take on a transaction mechanism.  I'm using it at the moment so I'm 
> finding out what works ... and what does not.
> 
> 
> Yes - addGraph ought to be a copy.  The general dataset where the app can put 
> together a collection of different graph types is the exception but needed 
> for the case of some graphs being inference, maybe some not.
> 
> 
> One of the things that strikes me is that extending Quad to be a 
> QuadOperation breaks being a Quad.  It adds functionality a quad does not 
> have.  Two quads are equal if they have the same G/S/P/O and that's not true 
> for QuadOperation.
> 
> An operation is a pair - the action and the data - not data.
> 
> e.g. Putting a QuadOperation into a DatasetGraph would cause problems.
> 
> 
> ListBackedOperationRecord<OpType> extends ReversibleOperationRecord<OpType>
> 
> [[
> public class ListBackedOperationRecord<OpType extends InvertibleOperation<?, ?, ?, ?>>
>               implements ReversibleOperationRecord<OpType> {
> ]]
> 
> 
> While, yes, a collection of operations could be an operation, datasets don't 
> provide such composite operations so the abstraction is not used.  And the 
> reverse of it would be recursive - each operation needs reversing.
> 
> I'd keep log (= list of operations) as a separate concept from the operations 
> themselves.  One key operation of a ListBackedOperationRecord is clear and 
> Operations are
> 
> Or this is a naming thing, is "record" the log entry or the log itself?
> 
> 
> Is there some specific reason as to why you override the DatasetGraphWithLock 
> lock?
> 
> 
> My take on this is:
> 
> https://github.com/afs/jena-workspace/tree/master/src/main/java/transdsg
> 
> One difference is that the notion of reversing an operation is not a feature 
> of the operation itself, it's the way it is played back.  Partially, this is 
> efficiency (which may not matter) as it reduces the object churn, but it also 
> puts undo-playback in one place (e.g. reading and writing from storage, which 
> might be non-heap memory, or a compacted form, or even a disk) for cases 
> where large+long transactions, even in-memory, lead to excessive object use.  
> Just an idea.
> 
>       Andy
>
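
The point above about reversal being part of playback rather than of the operation can be illustrated with a minimal sketch. This is not the code in either repository linked above: QuadJournal, Entry and Action are hypothetical names, and quads are shown as plain strings to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical journal wrapped around a set of quads (quads as plain strings here).
final class QuadJournal {
    enum Action { ADD, DELETE }

    // A log entry is just the action and the data; it knows nothing about undoing itself.
    static final class Entry {
        final Action action;
        final String quad;

        Entry(Action action, String quad) {
            this.action = action;
            this.quad = quad;
        }
    }

    private final Set<String> data;
    private final List<Entry> log = new ArrayList<>();

    QuadJournal(Set<String> data) { this.data = data; }

    void add(String quad) {
        // Record only real changes, so abort never removes a quad that was already present.
        if (data.add(quad)) log.add(new Entry(Action.ADD, quad));
    }

    void delete(String quad) {
        // Likewise, deleting an absent quad leaves no entry to undo.
        if (data.remove(quad)) log.add(new Entry(Action.DELETE, quad));
    }

    // Commit: the changes stand, so the log is simply dropped.
    void commit() { log.clear(); }

    // Abort: play the log backwards, applying the inverse of each entry.
    // This is the only place that knows how to invert an action.
    void abort() {
        for (int i = log.size() - 1; i >= 0; i--) {
            Entry e = log.get(i);
            if (e.action == Action.ADD) data.remove(e.quad);
            else data.add(e.quad);
        }
        log.clear();
    }

    int pending() { return log.size(); }
}
```

Because the entries are passive data, the log could in principle be held in some more compact form than a list of objects; the list here is only the simplest choice for a sketch.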
