Something I have been thinking about: you could replace the six covering indexes (GSPO, GOPS, SPOG, OSGP, PGSO, OPSG) with a single Bloom filter implementation. Finding matches then becomes a two-step process, but it might be fast enough and reduce the overhead significantly.
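To make the two-step idea concrete, here is a minimal sketch of one way it could work. It is not the POC code, and every name in it (BloomQuadStore, Quad, the 64-bit filter width, 3 hash bits per component) is an assumption chosen for illustration. Each quad carries a small Bloom filter built from its G, S, P and O values. Step 1 keeps only quads whose filter contains every bit of the pattern's filter. Step 2 checks those candidates exactly to discard false positives.

```java
import java.util.ArrayList;
import java.util.List;

class BloomQuadStore {
    /** Quad of plain strings; a null field in a pattern is a wildcard. */
    static final class Quad {
        final String g, s, p, o;
        Quad(String g, String s, String p, String o) {
            this.g = g; this.s = s; this.p = p; this.o = o;
        }
    }

    private static final int HASHES = 3; // bits set per component (assumed)

    private final List<Quad> quads = new ArrayList<>();
    private final List<Long> filters = new ArrayList<>(); // parallel to quads

    // Set HASHES bits of a 64-bit filter for one component in one position.
    private static long bits(char position, String value) {
        int h1 = (position + ":" + value).hashCode();
        int h2 = Integer.reverse(h1) | 1; // cheap second hash, forced odd
        long mask = 0L;
        for (int i = 0; i < HASHES; i++) {
            mask |= 1L << Math.floorMod(h1 + i * h2, 64);
        }
        return mask;
    }

    // Filter for a quad or a pattern; wildcard (null) fields add no bits.
    private static long filterOf(Quad q) {
        long f = 0L;
        if (q.g != null) f |= bits('G', q.g);
        if (q.s != null) f |= bits('S', q.s);
        if (q.p != null) f |= bits('P', q.p);
        if (q.o != null) f |= bits('O', q.o);
        return f;
    }

    private static boolean matches(String want, String have) {
        return want == null || want.equals(have);
    }

    void add(Quad q) {
        quads.add(q);
        filters.add(filterOf(q));
    }

    List<Quad> find(Quad pattern) {
        long want = filterOf(pattern);
        List<Quad> out = new ArrayList<>();
        for (int i = 0; i < quads.size(); i++) {
            // Step 1: bitwise containment test; may admit false positives.
            if ((filters.get(i) & want) != want) continue;
            // Step 2: exact comparison removes the false positives.
            Quad q = quads.get(i);
            if (matches(pattern.g, q.g) && matches(pattern.s, q.s)
                    && matches(pattern.p, q.p) && matches(pattern.o, q.o)) {
                out.add(q);
            }
        }
        return out;
    }
}
```

This sketch scans every filter linearly, so it only illustrates the matching logic. A usable version would need some index over the filters themselves, and a filter wider than 64 bits to keep the false-positive rate down.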
I did an in-memory and a relational DB based version recently, but it was just a quick POC.

Claude

On Wed, Aug 26, 2015 at 3:27 PM, A. Soroka <[email protected]> wrote:

> Hey, folks--
>
> There hasn't been too much feedback on my proposal for a journaling
> DatasetGraph:
>
> https://github.com/ajs6f/jena/tree/JournalingDatasetgraph
>
> which was and is to be a step towards JENA-624: Develop a new in-memory
> RDF Dataset implementation. So I'm moving on to look at the real problem:
> an in-memory DatasetGraph with high concurrency, for use with modern
> hardware running many, many threads in large core memory.
>
> I'm beginning to sketch out rough code, and I'd like to run some design
> decisions past the list to get criticism/advice/horrified warnings/whatever
> needs to be said.
>
> 1) All-transactional action: i.e. no non-transactional operation. This is
> obviously a great thing for simplifying my work, but I hope it won't be out
> of line with the expected uses for this stuff.
>
> 2) 6 covering indexes in the forms GSPO, GOPS, SPOG, OSGP, PGSO, OPSG. I
> figure to play to the strength of in-core-memory operation: raw speed, but
> obviously this is going to cost space.
>
> 3) At least for now, all commits succeed.
>
> 4) The use of persistent datastructures to avoid complex and error-prone
> fine-grained locking regimes. I'm using http://pcollections.org/ for now,
> but I am in no way committed to it nor do I claim to have thoroughly vetted
> it. It's simple but enough to get started, and that's all I need to bring
> the real design questions into focus.
>
> 5) Snapshot isolation. Transactions do not see commits that occur during
> their lifetime. Each works entirely from the state of the DatasetGraph at
> the start of its life.
>
> 6) Only as many as one transaction per thread, for now. Transactions are
> not thread-safe. These are simplifying assumptions that could be relaxed
> later.
>
> My current design operates as follows:
>
> At the start of a transaction, a fresh in-transaction reference is taken
> atomically from the AtomicReference that points to the index block. As
> operations are performed in the transaction, that in-transaction reference
> is progressed (in the sense in which any persistent datastructure is
> progressed) while the operations are recorded. Upon an abort, the
> in-transaction reference and the record are just thrown away. Upon a
> commit, the in-transaction reference is thrown away and the operation
> record is re-run against the main reference (the one that is copied at the
> beginning of a transaction). That rerun happens inside an atomic update
> (hence the use of AtomicReference). This all should avoid the need for
> explicit locking in Jena and should confine any blocking against the
> indexes to the actual duration of a commit.
>
> What do you guys think?
>
> ---
> A. Soroka
> The University of Virginia Library

--
I like: Like Like - The likeliest place on the web <http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
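
For anyone following the thread, the commit scheme described in the quoted message can be sketched roughly as below. This is only an illustration under stated assumptions, not the proposed Jena code. SnapshotStore and Transaction are hypothetical names. Quads are plain strings. A copy-on-write unmodifiable Set stands in for a real persistent collection such as the ones from pcollections. The operation record is a list of functions that is replayed inside AtomicReference.updateAndGet at commit time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

class SnapshotStore {
    // The "main reference": always points at an immutable set of quads.
    private final AtomicReference<Set<String>> main =
            new AtomicReference<>(Collections.<String>emptySet());

    // Copy-on-write helpers standing in for a persistent collection's
    // plus/minus operations (an assumption made to stay dependency-free).
    private static Set<String> plus(Set<String> base, String quad) {
        Set<String> copy = new HashSet<>(base);
        copy.add(quad);
        return Collections.unmodifiableSet(copy);
    }

    private static Set<String> minus(Set<String> base, String quad) {
        Set<String> copy = new HashSet<>(base);
        copy.remove(quad);
        return Collections.unmodifiableSet(copy);
    }

    // Take the in-transaction reference atomically from the main reference.
    Transaction begin() {
        return new Transaction(main.get());
    }

    // Not thread-safe: one transaction per thread, as in the proposal.
    class Transaction {
        private Set<String> view; // in-transaction reference
        private final List<UnaryOperator<Set<String>>> record = new ArrayList<>();

        Transaction(Set<String> snapshot) {
            this.view = snapshot;
        }

        void add(String quad) { apply(s -> plus(s, quad)); }

        void delete(String quad) { apply(s -> minus(s, quad)); }

        boolean contains(String quad) { return view.contains(quad); }

        // Progress the in-transaction view and record the operation.
        private void apply(UnaryOperator<Set<String>> op) {
            view = op.apply(view);
            record.add(op);
        }

        // Abort: throw away both the view and the record.
        void abort() {
            view = null;
            record.clear();
        }

        // Commit: discard the view and replay the record on the main reference.
        void commit() {
            main.updateAndGet(current -> {
                Set<String> next = current;
                for (UnaryOperator<Set<String>> op : record) {
                    next = op.apply(next);
                }
                return next;
            });
            view = null;
            record.clear();
        }
    }
}
```

Note that updateAndGet does not block. It retries the replay if another commit lands in between, so the recorded operations have to be free of side effects. Because the replay runs against whatever the main reference holds at commit time, every commit succeeds, which matches point 3 of the quoted message.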
