On 23/07/15 14:18, [email protected] wrote:
After a longish conversation with Andy Seaborne, I've worked up a simple
journaling implementation that wraps a DatasetGraph. The idea is to use journaling
to support proper abort behavior (which I believe this code does) and to add
semantics for DatasetGraph::addGraph that copy tuples instead of
leaving a reference to the added Graph (which I believe this code also does).
Together, these two behaviors should support transactionality
(MRSW only) reasonably well.
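Here is a minimal sketch of the journaling idea in plain Java rather than Jena's API. It assumes a simplified in-memory set of quads. Every mutation is applied to the store and also pushed onto a journal as an operation that knows its own inverse. abort() then replays those inverses newest-first. Quad, Op, Add, Delete, JournalingStore and all their methods are hypothetical stand-ins, not the classes in the linked branch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a G/S/P/O quad; record equality compares all four fields.
record Quad(String g, String s, String p, String o) {}

// Hypothetical operation type that knows its own inverse.
interface Op {
    void applyTo(Set<Quad> store);
    Op inverse();
}

record Add(Quad q) implements Op {
    public void applyTo(Set<Quad> store) { store.add(q); }
    public Op inverse() { return new Delete(q); }
}

record Delete(Quad q) implements Op {
    public void applyTo(Set<Quad> store) { store.remove(q); }
    public Op inverse() { return new Add(q); }
}

// Hypothetical wrapper: journals each change so that abort() can undo it.
class JournalingStore {
    private final Set<Quad> store = new HashSet<>();
    private final Deque<Op> journal = new ArrayDeque<>();

    // Journal only changes that take effect; journaling an add of an existing
    // quad would make abort() wrongly delete it.
    void add(Quad q) { if (!store.contains(q)) applyAndJournal(new Add(q)); }
    void delete(Quad q) { if (store.contains(q)) applyAndJournal(new Delete(q)); }

    private void applyAndJournal(Op op) {
        op.applyTo(store);
        journal.push(op); // newest operation at the front
    }

    // Changes are already applied; just forget the undo information.
    void commit() { journal.clear(); }

    // Undo newest-first by applying each operation's inverse.
    void abort() {
        while (!journal.isEmpty()) {
            journal.pop().inverse().applyTo(store);
        }
    }

    boolean contains(Quad q) { return store.contains(q); }
    int size() { return store.size(); }
}
```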
If this code looks like a reasonable direction, the next step is to move on to an
implementation that uses persistent data structures for covering indexes, in order to get
at least to MR+SW and eventually to attack JENA-624: "Develop a new in-memory RDF
Dataset implementation".
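One way to picture the MR+SW goal is a sketch built on loose assumptions. Readers take an immutable snapshot and never block. A single writer builds a new version and publishes it atomically on commit. A real persistent data structure would share structure between versions. To keep this short, the sketch substitutes a plain full copy for that. SnapshotIndex and its methods are hypothetical, and quads are reduced to strings.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical single index; a full copy stands in for a persistent structure.
class SnapshotIndex {
    // The committed version; always an immutable set.
    private final AtomicReference<Set<String>> committed = new AtomicReference<>(Set.of());
    // At most one writer at a time (the "SW" in MR+SW).
    private final ReentrantLock writerLock = new ReentrantLock();
    // Writer's private working copy; only touched while writerLock is held.
    private Set<String> pending;

    // Readers: no locking, just whatever version is committed right now.
    Set<String> readSnapshot() { return committed.get(); }

    void beginWrite() {
        writerLock.lock();
        pending = new HashSet<>(committed.get());
    }

    // Must be called between beginWrite() and commit()/abort().
    void add(String quad) { pending.add(quad); }

    void commit() {
        // Publish atomically; readers holding the old snapshot are unaffected.
        committed.set(Set.copyOf(pending));
        pending = null;
        writerLock.unlock();
    }

    void abort() {
        // Nothing was published, so there is nothing to undo.
        pending = null;
        writerLock.unlock();
    }
}
```

Abort becomes trivial in this style because uncommitted changes are never visible. The cost is that the copy here is O(n) per write transaction, which is exactly what a persistent structure would avoid.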
Feedback / advice / criticism greedily desired and welcome!
https://github.com/ajs6f/jena/tree/JournalingDatasetgraph
https://github.com/apache/jena/compare/master...ajs6f:JournalingDatasetgraph
---
A. Soroka
The University of Virginia Library
Hi there,
A first look only: there's quite a lot to do for the release at the moment.
Keeping this functionality separate from the underlying DatasetGraph is
good for the MRSW case, and it allows composition over multiple datasets,
text indexes, etc.
For MR+SW, I think the more tightly connected nature of transactions and
implementation might make it harder to keep the functionality independent,
but we'll see.
https://github.com/afs/mantis/tree/master/dboe-transaction
is a take on a transaction mechanism. I'm using it at the moment, so I'm
finding out what works ... and what does not.
Yes, addGraph ought to be a copy. The general dataset, where the app
can put together a collection of different graph types, is the exception,
but it is needed for the case where some graphs are inference graphs and others are not.
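The copy semantics for addGraph fit in a few lines. In this sketch the dataset copies the supplied graph's triples into its own storage, so later changes to the caller's graph do not leak in. CopyingDataset, Triple, and an addGraph that takes a name and a Set are hypothetical stand-ins, not Jena's DatasetGraph API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for an RDF triple.
record Triple(String s, String p, String o) {}

// Hypothetical dataset that owns its own copy of each named graph's triples.
class CopyingDataset {
    private final Map<String, Set<Triple>> graphs = new HashMap<>();

    // Copy the tuples in; never keep a reference to the caller's graph.
    void addGraph(String name, Set<Triple> graph) {
        graphs.put(name, new HashSet<>(graph));
    }

    boolean contains(String name, Triple t) {
        Set<Triple> g = graphs.get(name);
        return g != null && g.contains(t);
    }
}
```

Because the dataset owns every tuple it holds, its journal sees every change, which is what makes abort meaningful. A "general dataset" that holds references (e.g. to inference graphs) cannot offer that guarantee.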
One of the things that strikes me is that extending Quad to make
QuadOperation breaks its being a Quad. It adds functionality a quad does
not have: two quads are equal if they have the same G/S/P/O, and that's
not true for QuadOperation.
An operation is a pair (the action and the data), not data.
For example, putting a QuadOperation into a DatasetGraph would cause problems.
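Here is a small sketch of the pair-based alternative. Quad stays a pure value whose equality is G/S/P/O only. The operation wraps an action and a quad instead of being a Quad subtype, so it can never be stored where a quad is expected. Action, Quad, and QuadOperation here are hypothetical names, not the classes in either linked repository.

```java
// Plain value type: two quads are equal iff G, S, P and O are equal.
record Quad(String g, String s, String p, String o) {}

// What was done to the data.
enum Action { ADD, DELETE }

// An operation is a pair of action and data. It is not itself a Quad, so
// the type system keeps it out of any collection or index of quads.
record QuadOperation(Action action, Quad quad) {}
```

Two operations on the same quad are distinct values, while their data still compares equal as plain quads.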
ListBackedOperationRecord<OpType> implements ReversibleOperationRecord<OpType>:
[[
public class ListBackedOperationRecord<OpType extends InvertibleOperation<?, ?, ?, ?>>
        implements ReversibleOperationRecord<OpType> {
    // ...
}
]]
While, yes, a collection of operations could itself be an operation, datasets
don't provide such composite operations, so the abstraction goes unused.
And reversing it would be recursive: each operation in it needs reversing.
I'd keep the log (= list of operations) as a separate concept from the
operations themselves. One key operation of a ListBackedOperationRecord
is clear(), and Operations are ...
Or is this a naming thing: is "record" the log entry or the log itself?
Is there a specific reason why you override the
DatasetGraphWithLock lock?
My take on this is:
https://github.com/afs/jena-workspace/tree/master/src/main/java/transdsg
One difference is that reversing an operation is not a feature
of the operation itself; it's the way it is played back. Partly,
this is efficiency (which may not matter), as it reduces object churn,
but it also puts undo-playback in one place, e.g. reading and writing
from storage, which might be non-heap memory, a compacted form, or
even disk, for cases where large, long transactions, even on an in-memory
dataset, lead to excessive object use. Just an idea.
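Here is a sketch of reversal-as-playback, with the usual simplifications. The log is a plain list of (action, quad) entries, kept as a separate concept from the entries it holds, and the entries know nothing about undo. A single rollback method walks the log newest-first and applies the opposite action. That method is the one place where a compacted or off-heap log format could later be swapped in. UndoLog, Entry, Action, and rollback are hypothetical names, not the API in the linked transdsg package.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

record Quad(String g, String s, String p, String o) {}

enum Action { ADD, DELETE }

// A log entry is only data: what was done, and to which quad.
record Entry(Action action, Quad quad) {}

// The log is a separate concept from the entries it holds.
class UndoLog {
    private final List<Entry> entries = new ArrayList<>();

    void append(Action action, Quad quad) { entries.add(new Entry(action, quad)); }

    // Called on commit: the changes stand, so the undo information is dropped.
    void clear() { entries.clear(); }

    int size() { return entries.size(); }

    // All undo logic lives here: replay newest-first, applying the opposite action.
    void rollback(Set<Quad> store) {
        for (int i = entries.size() - 1; i >= 0; i--) {
            Entry e = entries.get(i);
            switch (e.action()) {
                case ADD -> store.remove(e.quad());
                case DELETE -> store.add(e.quad());
            }
        }
        entries.clear();
    }
}
```

Only changes that actually took effect should be appended. Otherwise rollback() would remove a quad that was already present before the transaction began.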
Andy