On 23/07/15 14:18, [email protected] wrote:
After a longish conversation with Andy Seaborne, I've worked up a simple 
journaling DatasetGraph wrapping implementation. The idea is to use journaling 
to support proper aborting behavior (which I believe this code does) and to add 
to that a semantic for DatasetGraph::addGraph that copies tuples instead of 
leaving a reference to the added Graph (which I believe this code also does). 
Between these two behaviors, the idea is to be able to support transactionality 
(MRSW only) reasonably well.
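
To make the two behaviors concrete, here is a minimal sketch of the idea, not the code in the branch linked below. It assumes quads are plain strings and a hypothetical JournalingStore class stands in for the wrapper; none of these names come from Jena. Changes are journaled so that abort can undo them, and addGraph copies tuples in rather than keeping a reference.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a journaling dataset wrapper. Quads are plain
// strings here; this is not the Jena DatasetGraph API.
final class JournalingStore {
    // One journal entry: whether the change was an add, and the quad involved.
    private static final class Entry {
        final boolean wasAdd;
        final String quad;
        Entry(boolean wasAdd, String quad) { this.wasAdd = wasAdd; this.quad = quad; }
    }

    private final Set<String> quads = new HashSet<>();
    private final Deque<Entry> journal = new ArrayDeque<>();

    void add(String quad) {
        // Journal only real changes so that abort restores the exact prior state.
        if (quads.add(quad)) journal.push(new Entry(true, quad));
    }

    void delete(String quad) {
        if (quads.remove(quad)) journal.push(new Entry(false, quad));
    }

    // Copy semantics: each triple is copied in as a quad, and no reference
    // to the caller's collection is kept.
    void addGraph(String graphName, Collection<String> triples) {
        for (String triple : triples) add(graphName + " " + triple);
    }

    boolean contains(String quad) { return quads.contains(quad); }

    int size() { return quads.size(); }

    // Commit: the changes stand, so the journal is no longer needed.
    void commit() { journal.clear(); }

    // Abort: push/pop is LIFO, so entries come back newest first and each
    // one is reversed in turn.
    void abort() {
        while (!journal.isEmpty()) {
            Entry e = journal.pop();
            if (e.wasAdd) quads.remove(e.quad); else quads.add(e.quad);
        }
    }
}
```

Locking for MRSW is left out of the sketch; it only shows the journal and the copy-on-addGraph behavior.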

The idea is (if this code looks like a reasonable direction) to move onwards to an 
implementation that uses persistent data structures for covering indexes in order to get 
at least to MR+SW and eventually to attack JENA-624: "Develop a new in-memory RDF 
Dataset implementation".

Feedback / advice / criticism greedily desired and welcome!

https://github.com/ajs6f/jena/tree/JournalingDatasetgraph

https://github.com/apache/jena/compare/master...ajs6f:JournalingDatasetgraph

---
A. Soroka
The University of Virginia Library


Hi there,

A first look - there's quite a lot to do with the release at the moment.

Keeping this functionality separate from the underlying DatasetGraph is good for the MRSW case, and with it comes composition over multiple datasets, text indexes, etc.

For the MR+SW, I think the more connected nature of transactions and implementation might make it harder to have independent functionality but we'll see.

https://github.com/afs/mantis/tree/master/dboe-transaction
is a take on a transaction mechanism. I'm using it at the moment so I'm finding out what works ... and what does not.


Yes - addGraph ought to be a copy. The general dataset, where the app can put together a collection of different graph types, is the exception, but it is needed for the case of some graphs being inference graphs and maybe some not.


One of the things that strikes me is that extending Quad to be a QuadOperation breaks being a Quad. It adds functionality a quad does not have. Two quads are equal if they have the same G/S/P/O and that's not true for QuadOperation.

An operation is a pair - the action and the data - not data.

e.g. Putting a QuadOperation into a DatasetGraph would cause problems.
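
A minimal sketch of the "pair" shape, assuming simplified stand-in types (SimpleQuad, OpAction and QuadOp are hypothetical names, with strings in place of Jena Nodes). The operation holds a quad rather than being one, so quad equality stays purely G/S/P/O.

```java
import java.util.Objects;

// Hypothetical simplified quad: four strings stand in for Jena Nodes.
final class SimpleQuad {
    final String g, s, p, o;

    SimpleQuad(String g, String s, String p, String o) {
        this.g = g; this.s = s; this.p = p; this.o = o;
    }

    // Two quads are equal exactly when G/S/P/O match.
    @Override public boolean equals(Object other) {
        if (!(other instanceof SimpleQuad)) return false;
        SimpleQuad q = (SimpleQuad) other;
        return g.equals(q.g) && s.equals(q.s) && p.equals(q.p) && o.equals(q.o);
    }

    @Override public int hashCode() { return Objects.hash(g, s, p, o); }
}

// The action half of the pair.
enum OpAction { ADD, DELETE }

// An operation is (action, data). It has a quad; it is not a quad, so it
// cannot be put into a dataset by accident.
final class QuadOp {
    final OpAction action;
    final SimpleQuad quad;

    QuadOp(OpAction action, SimpleQuad quad) {
        this.action = action;
        this.quad = quad;
    }

    // Operation equality takes the action into account as well as the data.
    @Override public boolean equals(Object other) {
        if (!(other instanceof QuadOp)) return false;
        QuadOp op = (QuadOp) other;
        return action == op.action && quad.equals(op.quad);
    }

    @Override public int hashCode() { return Objects.hash(action, quad); }
}
```

With this shape, an ADD and a DELETE of the same quad are different operations, while the quads they carry still compare equal.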


ListBackedOperationRecord<OpType> extends ReversibleOperationRecord<OpType>

[[
public class ListBackedOperationRecord<OpType extends InvertibleOperation<?, ?, ?, ?>>
                implements ReversibleOperationRecord<OpType> {
]]


While, yes, a collection of operations could be an operation, datasets don't provide such composite operations, so the abstraction is not used. And the reverse of it would be recursive - each operation needs reversing.

I'd keep log (= list of operations) as a separate concept from the operations themselves. One key operation of a ListBackedOperationRecord is clear and Operations are

Or is this a naming thing: is "record" the log entry or the log itself?


Is there some specific reason as to why you override the DatasetGraphWithLock lock?


My take on this is:

https://github.com/afs/jena-workspace/tree/master/src/main/java/transdsg

One difference is that the notion of reversing an operation is not a feature of the operation itself; it's the way it is played back. Partly, this is efficiency (which may not matter), as it reduces the object churn, but it also puts undo-playback in one place, e.g. reading and writing from storage, which might be non-heap memory, or a compacted form (or even a disk), for cases where large, long transactions, even on in-memory datasets, lead to excessive object use. Just an idea.
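
As a rough illustration of that idea (not the code in the transdsg directory linked above), the sketch below assumes a hypothetical CompactLog that stores the action as one bit per entry and the quad as a plain string, so entries have no invert method and no per-entry operation objects. Undo lives only in the playback routine, which walks the log backwards and applies the opposite action.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Set;

// Hypothetical log: entry i is (isAdd.get(i), data.get(i)). Quads are plain
// strings here; a real log might keep them off-heap or on disk.
final class CompactLog {
    private final BitSet isAdd = new BitSet();
    private final List<String> data = new ArrayList<>();

    void recordAdd(String quad) {
        isAdd.set(data.size());   // mark the next slot as an add
        data.add(quad);
    }

    void recordDelete(String quad) {
        isAdd.clear(data.size()); // mark the next slot as a delete
        data.add(quad);
    }

    int size() { return data.size(); }

    void clear() {
        isAdd.clear();
        data.clear();
    }

    // Forward playback: apply each entry as recorded, oldest first.
    void replayInto(Set<String> target) {
        for (int i = 0; i < data.size(); i++) {
            if (isAdd.get(i)) target.add(data.get(i)); else target.remove(data.get(i));
        }
    }

    // Undo playback: newest first, applying the opposite of each entry.
    // The entries themselves know nothing about reversal.
    void undoInto(Set<String> target) {
        for (int i = data.size() - 1; i >= 0; i--) {
            if (isAdd.get(i)) target.remove(data.get(i)); else target.add(data.get(i));
        }
        clear();
    }
}
```

The sketch assumes the caller records only changes that actually took effect; otherwise undo could remove a quad that was present before the transaction began.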

        Andy
