Re: JENA-624: "Develop a new in-memory RDF Dataset implementation"

Claude Warren Sat, 29 Aug 2015 06:57:06 -0700

Something I have been thinking about...

You could replace GSPO, GOPS, SPOG, OSGP, PGSO, OPSG with a single
Bloom filter implementation. It means a two-step process to find matches,
but it might be fast enough and reduce the space overhead significantly.


I built both an in-memory version and a relational-DB-based version of this
recently, but they were just quick POCs.
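A minimal sketch of the two-step idea, assuming each quad is summarized by a small Bloom filter over its position-tagged G/S/P/O components. The class name `BloomQuadStore`, the 128-bit width, the three bits per component, and the use of plain strings instead of Jena `Node`s are hypothetical choices, not details of the POCs mentioned above. Step one rejects any quad whose filter is missing a bit of the query's filter; step two checks the survivors exactly, since Bloom filters admit false positives but never false negatives (Java 16+ for the record).

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Objects;

/** Hypothetical sketch: one Bloom-filter summary per quad instead of six covering indexes. */
public class BloomQuadStore {
    // Plain strings stand in for Jena Nodes (assumption, for self-containment).
    public record Quad(String g, String s, String p, String o) {}

    private static final int BITS = 128;  // filter width per quad (arbitrary)
    private static final int HASHES = 3;  // bits set per component (arbitrary)

    private final List<Quad> quads = new ArrayList<>();
    private final List<BitSet> filters = new ArrayList<>();

    // Tagging with the position keeps subject "x" distinct from object "x".
    private static void addComponent(BitSet bits, char position, String value) {
        if (value == null) return;  // null = wildcard, contributes no bits
        for (int i = 0; i < HASHES; i++) {
            bits.set(Math.floorMod(Objects.hash(position, value, i), BITS));
        }
    }

    private static BitSet filterOf(String g, String s, String p, String o) {
        BitSet bits = new BitSet(BITS);
        addComponent(bits, 'g', g);
        addComponent(bits, 's', s);
        addComponent(bits, 'p', p);
        addComponent(bits, 'o', o);
        return bits;
    }

    public void add(Quad q) {
        quads.add(q);
        filters.add(filterOf(q.g(), q.s(), q.p(), q.o()));
    }

    /** Finds quads matching a pattern; null in any position is a wildcard. */
    public List<Quad> find(String g, String s, String p, String o) {
        BitSet query = filterOf(g, s, p, o);
        List<Quad> result = new ArrayList<>();
        for (int i = 0; i < quads.size(); i++) {
            // Step 1: Bloom test. Any query bit missing from the quad's filter rules it out.
            BitSet missing = (BitSet) query.clone();
            missing.andNot(filters.get(i));
            if (!missing.isEmpty()) continue;
            // Step 2: exact check, because Bloom filters can give false positives.
            Quad q = quads.get(i);
            if (matches(g, q.g()) && matches(s, q.s()) && matches(p, q.p()) && matches(o, q.o())) {
                result.add(q);
            }
        }
        return result;
    }

    private static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }
}
```

This version still walks every stored filter, so it only shows the filter-then-verify logic; whether it is "fast enough" depends on how the candidate filters are located without touching every quad.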

Claude

On Wed, Aug 26, 2015 at 3:27 PM, A. Soroka <[email protected]> wrote:

> Hey, folks--
>
> There hasn't been too much feedback on my proposal for a journaling
> DatasetGraph:
>
> https://github.com/ajs6f/jena/tree/JournalingDatasetgraph
>
> which was and is to be a step towards JENA-624: Develop a new in-memory
> RDF Dataset implementation. So I'm moving on to look at the real problem:
> an in-memory DatasetGraph with high concurrency, for use with modern
> hardware running many, many threads in large core memory.
>
> I'm beginning to sketch out rough code, and I'd like to run some design
> decisions past the list to get criticism/advice/horrified warnings/whatever
> needs to be said.
>
> 1) All-transactional action: i.e. no non-transactional operation. This is
> obviously a great thing for simplifying my work, but I hope it won't be out
> of line with the expected uses for this stuff.
>
> 2) Six covering indexes, in the orders GSPO, GOPS, SPOG, OSGP, PGSO, OPSG.
> I intend to play to the strength of in-core-memory operation, which is raw
> speed, but obviously this is going to cost space.
>
> 3) At least for now, all commits succeed.
>
> 4) The use of persistent data structures to avoid complex and error-prone
> fine-grained locking regimes. I'm using http://pcollections.org/ for now,
> but I am in no way committed to it nor do I claim to have thoroughly vetted
> it. It's simple but enough to get started, and that's all I need to bring
> the real design questions into focus.
>
> 5) Snapshot isolation. Transactions do not see commits that occur during
> their lifetime. Each works entirely from the state of the DatasetGraph at
> the start of its life.
>
> 6) At most one transaction per thread, for now. Transactions are
> not thread-safe. These are simplifying assumptions that could be relaxed
> later.
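Item 2 works because the six orders cover every query pattern: for any set of bound positions, one index has exactly those positions as its leading prefix, so a match becomes a prefix scan with no post-filtering. A minimal sketch of that selection step, using a hypothetical `IndexChooser` class in which a pattern is written as the string of its bound positions (e.g. `"SO"` for subject and object bound); this is not Jena's API.

```java
import java.util.List;

/** Hypothetical sketch: pick which of the six covering indexes answers a pattern. */
public class IndexChooser {
    // Index orders from the proposal; G=graph, S=subject, P=predicate, O=object.
    static final List<String> INDEXES = List.of("GSPO", "GOPS", "SPOG", "OSGP", "PGSO", "OPSG");

    /**
     * bound: the positions fixed in the query pattern, e.g. "SO".
     * Returns the first index whose leading positions are exactly the bound ones.
     */
    public static String choose(String bound) {
        for (String index : INDEXES) {
            String prefix = index.substring(0, bound.length());
            if (sameLetters(prefix, bound)) return index;
        }
        throw new IllegalArgumentException("no covering index for " + bound);
    }

    // Equal length plus containment is set equality, since no position repeats.
    private static boolean sameLetters(String a, String b) {
        if (a.length() != b.length()) return false;
        for (char c : b.toCharArray()) {
            if (a.indexOf(c) < 0) return false;
        }
        return true;
    }
}
```

For example, subject and object bound selects OSGP, and graph and predicate bound selects PGSO. Looping over all sixteen subsets of GSPO never reaches the exception, which is a quick check that the six orders really do cover every pattern.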
>
> My current design operates as follows:
>
> At the start of a transaction, a fresh in-transaction reference is taken
> atomically from the AtomicReference that points to the index block. As
> operations are performed in the transaction, that in-transaction reference
> is progressed (in the sense in which any persistent data structure is
> progressed) while the operations are recorded. Upon an abort, the
> in-transaction reference and the record are just thrown away. Upon a
> commit, the in-transaction reference is thrown away and the operation
> record is re-run against the main reference (the one that is copied at the
> beginning of a transaction). That rerun happens inside an atomic update
> (hence the use of AtomicReference). This all should avoid the need for
> explicit locking in Jena and should confine any blocking against the
> indexes to the actual duration of a commit.
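A minimal sketch of the commit-by-replay scheme described above, assuming a set of items in place of the index block. To stay dependency-free it uses an immutable `java.util.Set`, rebuilt on each write, as a stand-in for a pcollections persistent set; a real persistent structure would share structure rather than copy. `ReplayStore`, `Transaction`, and `Op` are hypothetical names (Java 16+ for the record).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch of snapshot transactions that commit by replaying their operation record. */
public class ReplayStore<T> {
    // The main reference; each value is an immutable snapshot.
    private final AtomicReference<Set<T>> main = new AtomicReference<>(Set.of());

    private record Op<E>(boolean add, E item) {}

    // Stand-in for a persistent update: returns a new set, never mutates the old one.
    private static <E> Set<E> apply(Set<E> state, Op<E> op) {
        Set<E> next = new HashSet<>(state);
        if (op.add()) next.add(op.item()); else next.remove(op.item());
        return Set.copyOf(next);
    }

    public Transaction begin() { return new Transaction(main.get()); }

    public Set<T> current() { return main.get(); }

    public class Transaction {
        private Set<T> view;                                // in-transaction reference
        private final List<Op<T>> log = new ArrayList<>();  // operation record

        private Transaction(Set<T> snapshot) { this.view = snapshot; }

        public void add(T item)    { record(new Op<>(true, item)); }
        public void delete(T item) { record(new Op<>(false, item)); }

        private void record(Op<T> op) {
            view = apply(view, op);  // progress the private view
            log.add(op);             // and remember the operation for commit
        }

        // Sees the snapshot taken at begin() plus this transaction's own writes.
        public Set<T> read() { return view; }

        // Abort: the record is dropped and the view is discarded with this object.
        public void abort() { log.clear(); }

        public void commit() {
            // Replay the record against whatever the main reference holds now.
            // updateAndGet may re-run this function under contention, so it must be pure.
            main.updateAndGet(state -> {
                Set<T> s = state;
                for (Op<T> op : log) s = apply(s, op);
                return s;
            });
        }
    }
}
```

Replaying the record means a commit never fails, in line with item 3, but it also means no conflict detection: two transactions that touch the same data are simply applied in commit order.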
>
> What do you guys think?
>
>
>
> ---
> A. Soroka
> The University of Virginia Library
>
>


-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
