On 26/08/15 15:27, A. Soroka wrote:
Hey, folks--

There hasn't been too much feedback on my proposal for a journaling
DatasetGraph:

https://github.com/ajs6f/jena/tree/JournalingDatasetgraph

which was and is to be a step towards JENA-624: Develop a new
in-memory RDF Dataset implementation. So I'm moving on to look at the
real problem: an in-memory DatasetGraph with high concurrency, for
use with modern hardware running many, many threads in large core
memory.

I'm beginning to sketch out rough code, and I'd like to run some
design decisions past the list to get criticism/advice/horrified
warnings/whatever needs to be said.

1) All-transactional action: i.e. no non-transactional operation.
This is obviously a great thing for simplifying my work, but I hope
it won't be out of line with the expected uses for this stuff.

You could add an auto-commit feature so that any update made outside a transaction has a transaction wrapper applied automatically.
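A minimal sketch of that auto-commit idea, with illustrative names (TxnGraph, autoAdd — none of this is Jena API): an update issued outside a transaction gets a one-shot begin/commit wrapped around it, with abort on failure.

```java
import java.util.ArrayList;
import java.util.List;

class TxnGraph {
    private final List<String> triples = new ArrayList<>();
    private boolean inTransaction = false;

    void begin()  { inTransaction = true; }
    void commit() { inTransaction = false; }
    void abort()  { inTransaction = false; }

    // The raw update: legal only inside a transaction.
    void add(String triple) {
        if (!inTransaction)
            throw new IllegalStateException("update outside a transaction");
        triples.add(triple);
    }

    // Auto-commit entry point: wraps the update when no transaction is active.
    void autoAdd(String triple) {
        if (inTransaction) { add(triple); return; }
        begin();
        try {
            add(triple);
            commit();
        } catch (RuntimeException e) {
            abort();
            throw e;
        }
    }

    int size() { return triples.size(); }
}
```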

2) 6 covering indexes in the forms GSPO, GOPS, SPOG, OSGP, PGSO,
OPSG. I figure to play to the strength of in-core-memory operation:
raw speed, but obviously this is going to cost space.

There are choices :-) Especially in memory, the kind of datastructure can change on the way down, e.g. have a hash map for GS->PO and keep the PO pairs tightly packed (a few for each S) and scan them.

Is that going to be 6 separate pcollections datastructures, or all held in one datastructure (c.f. BerkeleyDB)?
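The GS->PO option mentioned above could look roughly like this (an assumed sketch, not Jena code): a hash map keyed on graph+subject, whose value is a tightly packed list of (predicate, object) pairs that is scanned linearly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GsPoIndex {
    // Key: graph + subject; value: packed (predicate, object) pairs.
    private final Map<String, List<String[]>> index = new HashMap<>();

    void add(String g, String s, String p, String o) {
        index.computeIfAbsent(g + " " + s, k -> new ArrayList<>())
             .add(new String[] { p, o });
    }

    // Find objects for (g, s, p): hash to the bucket, then scan the pairs.
    List<String> find(String g, String s, String p) {
        List<String> out = new ArrayList<>();
        for (String[] po : index.getOrDefault(g + " " + s, List.of())) {
            if (po[0].equals(p)) out.add(po[1]);
        }
        return out;
    }
}
```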


3) At least for now, all commits succeed.

4) The use of persistent datastructures to avoid complex and
error-prone fine-grained locking regimes. I'm using
http://pcollections.org/ for now, but I am in no way committed to it
nor do I claim to have thoroughly vetted it. It's simple but enough
to get started, and that's all I need to bring the real design
questions into focus.

Is a consequence that there is one truly active writer (and many readers)?

If there are multiple writers, then (1) system aborts will always be possible (conflicting updates) and (2) locking on datastructures is necessary ... or timestamps and vector clocks or some such.
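The optimistic variant of point (1) can be sketched with an AtomicReference (illustrative names, a java.util map copy standing in for a persistent structure): a writer whose compareAndSet fails has conflicted with a concurrent commit and takes a system abort (or retries).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

class OptimisticStore {
    private final AtomicReference<Map<String, String>> root =
        new AtomicReference<>(new HashMap<>());

    // Returns true on commit, false on a "system abort" (write conflict).
    boolean writeTxn(String key, String value) {
        Map<String, String> base = root.get();      // snapshot at start
        Map<String, String> updated = new HashMap<>(base);
        updated.put(key, value);                    // persistent-style copy
        // Fails iff another writer committed since we took the snapshot.
        return root.compareAndSet(base, updated);
    }

    Map<String, String> current() { return root.get(); }
}
```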

5) Snapshot isolation. Transactions do not see commits that occur
during their lifetime. Each works entirely from the state of the
DatasetGraph at the start of its life.

But they see their own updates presumably?
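The snapshot-isolation behaviour of point (5), including a transaction seeing its own updates, can be illustrated with a plain map copy standing in for a persistent structure (all names here are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

class SnapshotDemo {
    final Map<String, String> shared = new HashMap<>();
    Map<String, String> txnView;

    // Capture the state of the world at the start of the transaction.
    void begin() { txnView = new HashMap<>(shared); }

    // A commit by some other transaction after begin(): invisible to this one.
    void concurrentCommit(String k, String v) { shared.put(k, v); }

    // This transaction's own update: visible to it.
    void txnAdd(String k, String v) { txnView.put(k, v); }
}
```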


6) Only as many as one transaction per thread, for now. Transactions
are not thread-safe. These are simplifying assumptions that could be
relaxed later.

TDB ended up there as well. There is, internally, a transaction object, but it's held in a ThreadLocal and fetched when needed. Otherwise a lot of interfaces need a "transaction" parameter, and it's hard to reuse other code that doesn't pass it through.

I have had a second go at transactions with TDB2. This module is an independent transaction system, unlike TDB1, where it's TDB1-specific.

https://github.com/afs/mantis/tree/master/dboe-transaction

It needs documentation for use on its own, but I have used it in another project to coordinate distributed transactions.


(dboe = database operating environment)

My current design operates as follows:

At the start of a transaction, a fresh in-transaction reference is
taken atomically from the AtomicReference that points to the index
block. As operations are performed in the transaction, that
in-transaction reference is progressed (in the sense in which any
persistent datastructure is progressed) while the operations are
recorded. Upon an abort, the in-transaction reference and the record
are just thrown away. Upon a commit, the in-transaction reference is
thrown away and the operation record is re-run against the main
reference (the one that is copied at the beginning of a transaction).
That rerun happens inside an atomic update (hence the use of
AtomicReference). This all should avoid the need for explicit locking
in Jena and should confine any blocking against the indexes to the
actual duration of a commit.
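The record-and-rerun design described above might be sketched as follows (illustrative names; a java.util map copy stands in for a pcollections persistent map): the transaction progresses an in-transaction reference while recording its operations, commit re-runs the record against the main AtomicReference inside an atomic update, and abort just discards both.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

class RecordingTxn {
    final AtomicReference<Map<String, String>> main;
    Map<String, String> view;   // the in-transaction reference
    final List<UnaryOperator<Map<String, String>>> record = new ArrayList<>();

    RecordingTxn(AtomicReference<Map<String, String>> main) {
        this.main = main;
        this.view = main.get(); // taken atomically at transaction start
    }

    void add(String k, String v) {
        UnaryOperator<Map<String, String>> op = m -> {
            Map<String, String> next = new HashMap<>(m); // "progress" a copy
            next.put(k, v);
            return next;
        };
        view = op.apply(view);  // progress the local view ...
        record.add(op);         // ... and record the operation
    }

    void commit() {             // re-run the record inside an atomic update
        main.updateAndGet(m -> {
            Map<String, String> next = m;
            for (UnaryOperator<Map<String, String>> op : record)
                next = op.apply(next);
            return next;
        });
    }

    void abort() {              // throw both away
        record.clear();
        view = main.get();
    }
}
```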

Looks good. I don't quite understand the need to record and rerun, though: isn't the power of pcollections that there can be old and new roots to the datastructures, so commit is a swap to the new one and abort is forgetting the new one?

TDB2 uses persistent datastructures (which in its case can be persisted on disk). When a transaction starts, it gets the reference to the root of the B+Tree.

When a write happens, the path from the place in the tree for the update up to the root is duplicated. There is a new root: the writer has the new root, all readers have the old root, so they have complete isolation. Readers can't reach newly written blocks.

A difference from pcollections, if I understand it correctly, is that blocks are only duplicated once and then updated in place. Blocks are only ever added to the block file above the write-transaction start water mark, so the test for whether a block may be updated in place is whether it is above or below that water mark.

Preparing the commit means: sync the file, don't write the new root to disk yet, but do write the id of the root block to the transaction journal. Repeat for all indexes, write the commit-end record {*}, sync. {*} is the true commit point. Afterwards, and on journal replay, the new root id is written to disk.

Abort is truncating the file back to the start of the write transaction. Nice feature: the bulk work of the writer is done during the transaction, and the OS can be writing to disk all the time as it sees fit. No bulk bytes going through the journal.

When you call pcollections' PMap::plus, it returns a new PMap reference, so it looks like the same idea.
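The swap-to-new-root idea can be shown with a stdlib stand-in (pcollections itself is not assumed here): an update yields a new root while the old root is untouched, so commit is "adopt the new root" and abort is "forget it".

```java
import java.util.HashMap;
import java.util.Map;

final class PersistentIsh {
    // plus-equivalent: returns a NEW map; the base map is never modified.
    static Map<String, String> plus(Map<String, String> base, String k, String v) {
        Map<String, String> next = new HashMap<>(base);
        next.put(k, v);
        return next;
    }
}
```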


You will probably need a short lock in two places but they are short and not going to affect throughput, only developer sanity.

When a transaction starts, the system needs to set it up, set any thread-local variables, and do any housekeeping. I found that getting a consistent set of all the internal state variables needs a lock. An ultra-minimal system might be able to get away with no lock, but if it has counters or any transaction-tracking datastructures, then there are multiple things to read. Paranoia about overlapping threads suggests needing a short period to take all the readings.

The second place is commit/abort, for the same reason, and also for writing the journal atomically (which you may not need).
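A sketch of that short lock at begin and commit/abort (illustrative names and state; nothing here is Jena or TDB code): a small synchronized section so the root, counters, and tracking state are read as one consistent set rather than as several racy reads.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

class TxnCoordinator {
    private final Object stateLock = new Object();
    private long epoch = 0;
    private int activeTxns = 0;
    private final AtomicReference<Map<String, String>> root =
        new AtomicReference<>(new HashMap<>());

    // All begin-time housekeeping under one brief lock: a consistent
    // (root, epoch, active-txn-count) triple, not three independent reads.
    Object[] begin() {
        synchronized (stateLock) {
            activeTxns++;
            return new Object[] { root.get(), epoch, activeTxns };
        }
    }

    // Commit/abort takes the lock briefly for the same reason (and would
    // write the journal here, if there were one).
    void end() {
        synchronized (stateLock) {
            activeTxns--;
            epoch++;
        }
    }
}
```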

Hope that helps,

        Andy


What do you guys think?



--- A. Soroka The University of Virginia Library

