Re: Journaling DatasetGraph

Andy Seaborne Sat, 25 Jul 2015 04:43:50 -0700

On 23/07/15 14:18, [email protected] wrote:

After a longish conversation with Andy Seaborne, I've worked up a simple 
journaling DatasetGraph wrapping implementation. The idea is to use journaling 
to support proper aborting behavior (which I believe this code does) and to add 
to that a semantic for DatasetGraph::addGraph that copies tuples instead of 
leaving a reference to the added Graph (which I believe this code also does). 
Between these two behaviors, the idea is to be able to support transactionality 
(MRSW only) reasonably well.


The idea is (if this code looks like a reasonable direction) to move onwards to an 
implementation that uses persistent data structures for covering indexes in order to get 
at least to MR+SW and eventually to attack JENA-624: "Develop a new in-memory RDF 
Dataset implementation".

Feedback / advice / criticism greedily desired and welcome!

https://github.com/ajs6f/jena/tree/JournalingDatasetgraph

https://github.com/apache/jena/compare/master...ajs6f:JournalingDatasetgraph

---
A. Soroka
The University of Virginia Library


Hi there,

A first look - there's quite a lot to do with the release at the moment.

Having a separate set of functionality to the underlying DatasetGraph is good for the MRSW case and with that composition on multiple datasets, text indexes etc etc.

For the MR+SW, I think the more connected nature of transactions and implementation might make it harder to have independent functionality but we'll see.


https://github.com/afs/mantis/tree/master/dboe-transaction

is a take on a transaction mechanism. I'm using it at the moment so I'm finding out what works ... and what does not.

Yes - addGraph ought to be a copy. The general dataset, where the app can put together a collection of different graph types, is the exception, but it is needed for the case of some graphs being inference graphs and maybe some not.

One of the things that strikes me is that extending Quad to be a QuadOperation breaks being a Quad. It adds functionality a quad does not have. Two quads are equal if they have the same G/S/P/O and that's not true for QuadOperation.


An operation is a pair - the action and the data - not data.

e.g. Putting a QuadOperation into a DatasetGraph would cause problems.
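To illustrate the "operation is a pair" point, here is a minimal sketch. SimpleQuad, Action and QuadOp are hypothetical names made up for this example; they are not Jena classes and not taken from the branch under review. SimpleQuad only stands in for a quad whose equality is defined by G/S/P/O.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for a quad: equality depends on G/S/P/O only.
final class SimpleQuad {
    final String g, s, p, o;

    SimpleQuad(String g, String s, String p, String o) {
        this.g = g; this.s = s; this.p = p; this.o = o;
    }

    @Override public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof SimpleQuad)) return false;
        SimpleQuad q = (SimpleQuad) other;
        return g.equals(q.g) && s.equals(q.s) && p.equals(q.p) && o.equals(q.o);
    }

    @Override public int hashCode() { return Objects.hash(g, s, p, o); }
}

// Hypothetical action tag.
enum Action { ADD, DELETE }

// The operation *has* a quad rather than *being* one, so quad equality is untouched.
final class QuadOp {
    final Action action;
    final SimpleQuad quad;

    QuadOp(Action action, SimpleQuad quad) {
        this.action = action;
        this.quad = quad;
    }

    @Override public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof QuadOp)) return false;
        QuadOp op = (QuadOp) other;
        return action == op.action && quad.equals(op.quad);
    }

    @Override public int hashCode() { return Objects.hash(action, quad); }
}

class OperationPairDemo {
    public static void main(String[] args) {
        SimpleQuad q = new SimpleQuad("g", "s", "p", "o");
        QuadOp add = new QuadOp(Action.ADD, q);
        QuadOp del = new QuadOp(Action.DELETE, q);

        // Only the plain quad is ever stored as data.
        Set<SimpleQuad> data = new HashSet<>();
        data.add(add.quad);
        data.add(del.quad);

        System.out.println(data.size());     // one quad stored
        System.out.println(add.equals(del)); // operations differ, their data does not
    }
}
```

With composition, nothing that is not a plain quad can end up in the dataset, and the two operations can still be told apart in a log.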


ListBackedOperationRecord<OpType> extends ReversibleOperationRecord<OpType>

[[

public class ListBackedOperationRecord<OpType extends InvertibleOperation<?, ?, ?, ?>>
                implements ReversibleOperationRecord<OpType> {
]]

While, yes, a collection of operations could be an operation, datasets don't provide such composite operations so the abstraction is not used. And the reverse of it would be recursive - each operation needs reversing.

I'd keep the log (= list of operations) as a separate concept from the operations themselves. One key operation of a ListBackedOperationRecord is clear and Operations are


Or this is a naming thing, is "record" the log entry or the log itself?

Is there some specific reason as to why you override the DatasetGraphWithLock lock?



My take on this is:

https://github.com/afs/jena-workspace/tree/master/src/main/java/transdsg

One difference is that the notion of reversing an operation is not a feature of the operation itself; it's the way it is played back. Partially, this is efficiency (which may not matter), as it reduces the object churn, but it also puts undo-playback in one place (e.g. reading and writing from storage, which might be non-heap memory, a compacted form, or even a disk) for cases where large+long transactions, even on in-memory datasets, lead to excessive object use. Just an idea.
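A rough sketch of undo-by-playback, to make the idea concrete. Everything here is hypothetical (JournalSketch, Entry, and a plain String standing in for a quad); it is not the code in either repository above. The entries carry no inverse of their own: abort() walks the log backwards and applies the opposite action in one place.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical journaling store; a String stands in for a quad.
final class JournalSketch {
    enum Action { ADD, DELETE }

    // A log entry is just the action plus the data; it does not know how to undo itself.
    static final class Entry {
        final Action action;
        final String quad;

        Entry(Action action, String quad) {
            this.action = action;
            this.quad = quad;
        }
    }

    private final Set<String> store = new HashSet<>();
    private final List<Entry> log = new ArrayList<>();

    // Only changes that actually alter the store are logged,
    // so undo never removes a quad that was already present.
    void add(String quad) {
        if (store.add(quad)) log.add(new Entry(Action.ADD, quad));
    }

    void delete(String quad) {
        if (store.remove(quad)) log.add(new Entry(Action.DELETE, quad));
    }

    // Commit: the changes stay, the log is dropped.
    void commit() {
        log.clear();
    }

    // Abort: play the log back in reverse, applying the opposite action.
    void abort() {
        for (int i = log.size() - 1; i >= 0; i--) {
            Entry e = log.get(i);
            if (e.action == Action.ADD) {
                store.remove(e.quad);
            } else {
                store.add(e.quad);
            }
        }
        log.clear();
    }

    boolean contains(String quad) { return store.contains(quad); }

    int size() { return store.size(); }
}
```

Because the reversal lives in abort() alone, the log entries could in principle be swapped for a more compact or off-heap representation without touching the operations themselves.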


        Andy
