Andy-- Thanks, these comments are really helpful! I've replied in-line in a few
places to clarify or answer questions, or ask some of my own. {grin}
---
A. Soroka
The University of Virginia Library
On Aug 27, 2015, at 5:35 AM, Andy Seaborne <[email protected]> wrote:
>> 1) All-transactional action: i.e. no non-transactional operation. This is
>> obviously a great thing for simplifying my work, but I hope it won't be out
>> of line with the expected uses for this stuff.
> You could add an auto-commit feature so that any update outside a transaction
> has a transaction wrapper applied. Feature.
I can and will.
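Something like this, perhaps (just a sketch: begin/commit/end/isInTransaction
mirror the Transactional operations, and everything else here is an invented
name):

    import org.apache.jena.query.ReadWrite;
    import org.apache.jena.sparql.core.Quad;

    // Sketch of auto-commit: any update arriving outside a transaction
    // gets wrapped in its own one-operation WRITE transaction.
    abstract class AutoCommitting {
        abstract boolean isInTransaction();
        abstract void begin(ReadWrite mode);
        abstract void commit();
        abstract void end();
        abstract void executeAdd(Quad quad); // the real mutation

        public void add(Quad quad) {
            if (isInTransaction()) {
                executeAdd(quad); // the caller is managing the transaction
            } else {
                begin(ReadWrite.WRITE);
                try {
                    executeAdd(quad);
                    commit();
                } finally {
                    end();
                }
            }
        }
    }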
>> 2) 6 covering indexes in the forms GSPO, GOPS, SPOG, OSGP, PGSO, OPSG. I
>> figure to play to the strength of in-core-memory operation: raw speed, but
>> obviously this is going to cost space.
> There are choices :-) esp. in memory, as the datastructure kind can change
> on the way down, e.g. have a hash map for GS->PO and keep the PO tightly
> packed (a few for each S) and scan them.
Very true. Ideally, I would like to offer some "knobs" to users to choose their
own balance between speed and space. I'll step back and consider a few more
designs before going further with this six-way approach.
> Is that going to be 6 pcollections datastructures or all held in one
> datastructure (c.f. BerkeleyDB)?
Right now I am looking at a setup with six independent indexes addressed
through a single class, but that was just the first thing that came to mind
that seemed reasonable. I am not committed to that by any means. If I step away
from the six-way, the question changes.
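For concreteness, here is roughly what one of the six might look like over
pcollections (a sketch only; Node is Jena's node class, and the nesting
follows the GSPO order):

    import org.apache.jena.graph.Node;
    import org.pcollections.HashTreePMap;
    import org.pcollections.HashTreePSet;
    import org.pcollections.PMap;
    import org.pcollections.PSet;

    // Sketch of one covering index (GSPO order) as nested persistent
    // maps: graph -> subject -> predicate -> { objects }. The other five
    // orderings would have the same shape with the positions permuted,
    // and the facade class would hold all six, routing each lookup to
    // the index that matches the pattern's concrete prefix.
    class GspoIndex {
        final PMap<Node, PMap<Node, PMap<Node, PSet<Node>>>> root;

        GspoIndex(PMap<Node, PMap<Node, PMap<Node, PSet<Node>>>> root) {
            this.root = root;
        }

        GspoIndex add(Node g, Node s, Node p, Node o) {
            PMap<Node, PMap<Node, PSet<Node>>> subjects =
                root.getOrDefault(g, HashTreePMap.empty());
            PMap<Node, PSet<Node>> predicates =
                subjects.getOrDefault(s, HashTreePMap.empty());
            PSet<Node> objects =
                predicates.getOrDefault(p, HashTreePSet.empty());
            // plus() never mutates: each level returns a new map, so any
            // old root stays valid for whoever still holds it
            return new GspoIndex(root.plus(g,
                subjects.plus(s, predicates.plus(p, objects.plus(o)))));
        }
    }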
>> 3) At least for now, all commits succeed.
>> 4) The use of persistent datastructures to avoid complex and error-prone
>> fine-grained locking regimes. I'm using http://pcollections.org/ for now,
>> but I am in no way committed to it nor do I claim to have thoroughly vetted
>> it. It's simple but enough to get started, and that's all I need to bring
>> the real design questions into focus.
> Is a consequence that there is one truly active writer (and many readers)?
Something like that. If you look at the scheme of operation below, all writers
are invisible to each other and can write at will, but only one writer can
commit at a time. That may very well not be enough concurrency, but it's just a
starting place.
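In outline, something like this (a sketch; whether the locked section replays
recorded operations or just swaps roots is exactly the question below):

    import java.util.concurrent.locks.ReentrantLock;
    import org.pcollections.PMap;

    // Sketch of "only one committer at a time": writers work freely
    // against their own persistent-map references; only the commit step
    // itself is serialized behind a single lock.
    class SerializedCommits<K, V> {
        private volatile PMap<K, V> committed;
        private final ReentrantLock commitLock = new ReentrantLock();

        SerializedCommits(PMap<K, V> initial) {
            this.committed = initial;
        }

        PMap<K, V> begin() {
            return committed; // each transaction takes its own snapshot
        }

        void commit(PMap<K, V> result) {
            commitLock.lock();
            try {
                // in the record-and-rerun scheme, the transaction's
                // recorded operations would be replayed against
                // `committed` here instead of a simple assignment
                committed = result;
            } finally {
                commitLock.unlock();
            }
        }
    }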
> If there are multiple writers, then (1) system aborts will always be possible
> (conflicting updates) and (2) locking on datastructures is necessary ... or
> timestamps and vector clocks or some such.
Right, see below. Again, there are multiple writers, but each sees only
itself, and there is only one committer at a time. "Only one committer at a
time" prevents conflicts, since there is no schema to violate, but it is a
brutal way to deal with the problem. And the "re-run" scheme of operation
means it will be a very real bottleneck.
>> 5) Snapshot isolation. Transactions do not see commits that occur during
>> their lifetime. Each works entirely from the state of the DatasetGraph at
>> the start of its life.
> But they see their own updates presumably?
Right, that's exactly the purpose of each transaction taking its own reference
to the persistent datastructures at the start of its life. They "evolve" their
datastructures independently.
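In other words, something like this (sketch):

    import org.pcollections.PMap;

    // Sketch: a transaction captures the committed root exactly once,
    // at the start of its life, and never re-reads it; its updates
    // evolve a private reference that no other transaction can see.
    class Transaction<K, V> {
        private PMap<K, V> view;

        Transaction(PMap<K, V> committedAtBegin) {
            this.view = committedAtBegin; // the snapshot
        }

        void put(K key, V value) {
            view = view.plus(key, value); // visible only here
        }

        V get(K key) {
            return view.get(key); // the snapshot plus this txn's updates
        }
    }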
>> 6) Only as many as one transaction per thread, for now. Transactions are not
>> thread-safe. These are simplifying assumptions that could be relaxed later.
>
> TDB ended up there as well. There is, internally, a transaction object but
> it's held in a ThreadLocal and fetched when needed. Otherwise a lot of
> interfaces need a "transaction" parameter and it's hard to reuse other code
> that doesn't pass it through.
That's close to what I sketched out.
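i.e. something like this (a sketch, reusing the Transaction shape from above):

    import org.pcollections.PMap;

    // Sketch: at most one transaction per thread, held in a ThreadLocal
    // so interfaces need not carry a transaction parameter.
    class PerThreadTxns<K, V> {
        private volatile PMap<K, V> committed;
        private final ThreadLocal<Transaction<K, V>> current =
            new ThreadLocal<>();

        PerThreadTxns(PMap<K, V> initial) {
            this.committed = initial;
        }

        void begin() {
            if (current.get() != null)
                throw new IllegalStateException(
                    "transaction already active on this thread");
            current.set(new Transaction<>(committed));
        }

        Transaction<K, V> txn() {
            return current.get(); // fetched wherever it is needed
        }

        void end() {
            current.remove();
        }
    }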
> I have taken a second pass at transactions with TDB2. This module is an
> independent transaction system, unlike TDB1, where it's TDB1-specific.
> https://github.com/afs/mantis/tree/master/dboe-transaction
> It needs documentation for use on its own, but I have used it in another
> project to coordinate distributed transactions. (dboe = database operating
> environment)
I need to study this more. Obviously, if I can take over some of your work,
that would be ideal.
>> My current design operates as follows: <snipped>
> Looks good. I don't quite understand the need to record and rerun though -
> isn't the power of pcollections that there can be old and new roots to the
> datastructures, and commit is swap to the new one, abort is forget the new
> one?
Yeah, but my worry (perhaps just my misunderstanding) is over transactions
interacting badly in the presence of snapshot isolation. Let's say we did use
the technique of atomic swap, and consider the following scenario:
T=0 The committed datastructures contain triples T.
T=1 Transaction 1 begins, taking its own reference to the datastructures.
T=2 Transaction 2 begins, taking its own reference to the datastructures.
T=3 Transaction 1 does some updates, adding some triples T_1 to its own
"branch", resulting in T + T_1.
T=4 Transaction 2 does some updates, adding some triples T_2 to its own
"branch", resulting in T + T_2.
T=5 Transaction 1 commits, so that the committed triples are now T + T_1.
T=6 Transaction 2 commits, so that the committed triples are now T + T_2.
We lost Transaction 1's T_1 triples. I think this technique actually requires
_merge_ instead of swap: either merge-into-open-transactions (after a commit),
which isn't snapshot isolation, or merge-into-commit (instead of
swap-into-commit). But there's plenty of chance that I'm just misunderstanding
this whole thing. {grin} I have not designed a transaction system over
persistent datastructures before, and I welcome correction. I also need to
research persistent datastructures with merge capability.
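If I keep swap at all, the least I could do is detect the conflict rather than
silently losing T_1, e.g. a compare-and-swap at commit (sketch; what to do on
failure, merge or re-run, is the open question):

    import java.util.concurrent.atomic.AtomicReference;
    import org.pcollections.PMap;

    // Sketch: swap by compare-and-set. In the scenario above,
    // Transaction 2's commit fails because Transaction 1 has already
    // moved `committed` away from the root Transaction 2 began with,
    // so T_1 cannot be silently lost; the caller must merge or re-run.
    class CasCommits<K, V> {
        private final AtomicReference<PMap<K, V>> committed;

        CasCommits(PMap<K, V> initial) {
            this.committed = new AtomicReference<>(initial);
        }

        PMap<K, V> begin() {
            return committed.get();
        }

        boolean tryCommit(PMap<K, V> beganWith, PMap<K, V> result) {
            // reference equality is the right test for persistent roots
            return committed.compareAndSet(beganWith, result);
        }
    }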
> TDB2 uses persistent datastructures (which in its case can be persistent on
> disk). When a transaction starts, it gets the reference to the root of the
> B+Tree.
>
> When a write happens, the path from the place in the tree for the update to
> the root is duplicated. There is a new root - the writer has a new root, all
> readers have the old root so they have complete isolation. Readers can't find
> newly written blocks. A difference from pcollections, if I understand it
> correctly, is that blocks are only duplicated once, then updated in-place.
> Blocks are only ever added to the block file above the writer-start
> water-mark, so the test is whether a block is above or below the
> write-transaction start water-mark.
Clojure's persistent datastructures have a capability somewhat similar to this
(mutate in place for efficiency):
http://clojure.org/transients
but I found that using them from Java was pretty tricky. That doesn't mean it
might not be the right library to use.
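For the record, the sort of juggling I mean, written from memory, so treat the
details as approximate:

    import clojure.lang.IPersistentMap;
    import clojure.lang.ITransientMap;
    import clojure.lang.PersistentHashMap;

    class TransientDemo {
        static IPersistentMap demo() {
            // batch mutations on a transient, then freeze it back to a
            // persistent map; the returned reference must be used at
            // every step (a transient may replace itself), which is
            // part of what makes this easy to get wrong from Java
            ITransientMap t = PersistentHashMap.EMPTY.asTransient();
            t = t.assoc("s1", "o1");
            t = t.assoc("s2", "o2");
            return t.persistent(); // back to an immutable map
        }
    }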
> Preparing the commit is: sync the file, don't write the new root to disk, do
> write the id of the root block to the transaction journal. Repeat for all
> indexes, write commit end {*}, sync. {*} is the true commit point.
> Afterwards, and on journal replay, the new root id is written to the disk.
> Abort is truncate the file back to the writer's start point. Nice feature -
> the bulk work of the writer is done during the transaction and the OS can be
> writing to disk all the time as it sees fit. No bulk bytes going through the
> journal. When you call pcollections' PMap::plus, it returns a new PMap
> reference, so it looks like the same idea.
I think it is, although in my current scheme you are getting all the bytes
moving around during commit. I admit that's definitely not optimal. I would
like to do something more like what we discuss above (swap or merge instead
of rerun).
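(For anyone reading along, a quick demonstration of the pcollections behavior
in question:)

    import org.pcollections.HashTreePMap;
    import org.pcollections.PMap;

    class PlusDemo {
        public static void main(String[] args) {
            // plus() returns a new map sharing structure with the old
            // one; the old root is untouched, so old and new roots
            // coexist, which is what makes swap-style commit and abort
            // conceivable at all
            PMap<String, String> before = HashTreePMap.empty();
            PMap<String, String> after = before.plus("key", "value");
            System.out.println(before.isEmpty());         // true
            System.out.println(after.containsKey("key")); // true
        }
    }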
> You will probably need a short lock in two places, but they are short and
> not going to affect throughput, only developer sanity.
> When a transaction starts, the system needs to set it up, set any
> thread-local variables, and do any housekeeping. I found that getting a
> consistent set of all the internal state variables needs a lock. An
> ultra-minimal system might be able to get away with no lock, but if it's got
> counters or any transaction-tracking datastructures, then there are multiple
> things to read. Paranoia about overlapping threads suggests needing a short
> period to take all the readings.
Yes, I think you are right. I do think that I might be able to get away with an
AtomicReference or AtomicStampedReference there if I can avoid too much
"baggage".
> The second point is on commit/abort for the same reason and also writing the
> journal atomically (which you may not need).
Right, but I think that for commit, that's a bullet I have already bitten.
{grin} Or are you saying that an AtomicReference won't provide enough safety
there and commit deserves explicit locking? And for abort, I don't see what a
lock would protect… but maybe I am misunderstanding what you mean by "journal"
in this context?