Thanks for the feedback! I can see how one Bloom filter could be used with an accompanying structure to replace one of the indexes, but I don't quite see how one could replace all of them. Can you elaborate?
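One possible reading of the "single Bloom filter, two-step lookup" idea is sketched below in Java. It is an assumption, not Claude's POC, whose design isn't described in this thread. Each quad stores a small Bloom filter built from its graph, subject, predicate and object, with each term hashed together with its position. A lookup with any combination of bound positions builds a filter from only the bound terms. Step one keeps the quads whose filter contains every one of those bits. Step two checks the candidates exactly to drop false positives. Because the same test works for every pattern shape, one structure could stand in for all six orderings, at the price of scanning every filter instead of seeking into a sorted index. `QuadBloomStore`, the `Quad` record, the 256-bit width and the three hash functions are all hypothetical choices, and plain strings stand in for Jena `Node`s.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch: one Bloom filter per quad, scanned for candidates, then verified. */
public class QuadBloomStore {

    /** Quad of plain strings; in a query pattern, a null field means "any". */
    public record Quad(String g, String s, String p, String o) {}

    private static final int BITS = 256;  // assumed filter width
    private static final int HASHES = 3;  // assumed number of hash functions

    private final List<Quad> quads = new ArrayList<>();
    private final List<BitSet> filters = new ArrayList<>();

    /** Sets HASHES bits for one term, salted by position so subject "x" differs from object "x". */
    private static void addTerm(BitSet bits, char position, String value) {
        if (value == null) return;  // a wildcard contributes no bits
        int h1 = (position + ":" + value).hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 16) | 1;  // second hash (double hashing)
        for (int i = 0; i < HASHES; i++) {
            bits.set(Math.floorMod(h1 + i * h2, BITS));
        }
    }

    private static BitSet filterOf(Quad q) {
        BitSet bits = new BitSet(BITS);
        addTerm(bits, 'g', q.g());
        addTerm(bits, 's', q.s());
        addTerm(bits, 'p', q.p());
        addTerm(bits, 'o', q.o());
        return bits;
    }

    public void add(Quad q) {
        quads.add(q);
        filters.add(filterOf(q));
    }

    private static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }

    /** Step 1: Bloom containment test over every quad. Step 2: exact check of the survivors. */
    public List<Quad> find(Quad pattern) {
        BitSet want = filterOf(pattern);
        List<Quad> result = new ArrayList<>();
        for (int i = 0; i < quads.size(); i++) {
            BitSet missing = (BitSet) want.clone();
            missing.andNot(filters.get(i));      // bits the pattern needs but this quad lacks
            if (!missing.isEmpty()) continue;     // definitely not a match
            Quad q = quads.get(i);                // candidate: may be a false positive
            if (matches(pattern.g(), q.g()) && matches(pattern.s(), q.s())
                    && matches(pattern.p(), q.p()) && matches(pattern.o(), q.o())) {
                result.add(q);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        QuadBloomStore store = new QuadBloomStore();
        store.add(new Quad("g1", "alice", "knows", "bob"));
        store.add(new Quad("g1", "bob", "knows", "carol"));
        store.add(new Quad("g2", "alice", "name", "Alice"));
        // Every index order's lookup becomes "bind some positions, leave the rest null".
        System.out.println(store.find(new Quad(null, "alice", null, null)).size());     // 2
        System.out.println(store.find(new Quad("g1", null, "knows", "carol")).size());  // 1
    }
}
```

There are no false negatives. If a quad matches a pattern, every bound term of the pattern sits at the same position in the quad, so the pattern's bits are a subset of the quad's bits. Whether the linear scan is "fast enough" compared with six index seeks is exactly the open question.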
---
A. Soroka
The University of Virginia Library

On Aug 29, 2015, at 9:55 AM, Claude Warren wrote:

> Something I have been thinking about...
>
> You could replace GSPO, GOPS, SPOG, OSGP, PGSO and OPSG with a single
> Bloom filter implementation. It means a two-step process to find matches,
> but it might be fast enough and reduce the overhead significantly.
>
> I did an in-memory and a relational-DB-based version recently, but it was
> just a quick POC.
>
> Claude
>
> On Wed, Aug 26, 2015 at 3:27 PM, A. Soroka wrote:
>
>> Hey, folks,
>>
>> There hasn't been too much feedback on my proposal for a journaling
>> DatasetGraph:
>>
>> https://github.com/ajs6f/jena/tree/JournalingDatasetgraph
>>
>> which was and is to be a step towards JENA-624: Develop a new in-memory
>> RDF Dataset implementation. So I'm moving on to look at the real problem:
>> an in-memory DatasetGraph with high concurrency, for use with modern
>> hardware running many, many threads in large core memory.
>>
>> I'm beginning to sketch out rough code, and I'd like to run some design
>> decisions past the list to get criticism/advice/horrified warnings/whatever
>> needs to be said.
>>
>> 1) All-transactional action, i.e. no non-transactional operations. This is
>> obviously a great thing for simplifying my work, but I hope it won't be out
>> of line with the expected uses for this stuff.
>>
>> 2) Six covering indexes in the orders GSPO, GOPS, SPOG, OSGP, PGSO, OPSG. I
>> figure this plays to the strength of in-core-memory operation, raw speed,
>> though it is obviously going to cost space.
>>
>> 3) At least for now, all commits succeed.
>>
>> 4) The use of persistent data structures to avoid complex and error-prone
>> fine-grained locking regimes. I'm using http://pcollections.org/ for now,
>> but I am in no way committed to it, nor do I claim to have thoroughly
>> vetted it.
>> It's simple but enough to get started, and that's all I need to bring
>> the real design questions into focus.
>>
>> 5) Snapshot isolation. Transactions do not see commits that occur during
>> their lifetime. Each works entirely from the state of the DatasetGraph at
>> the start of its life.
>>
>> 6) At most one transaction per thread, for now. Transactions are not
>> thread-safe. These are simplifying assumptions that could be relaxed
>> later.
>>
>> My current design operates as follows:
>>
>> At the start of a transaction, a fresh in-transaction reference is taken
>> atomically from the AtomicReference that points to the index block. As
>> operations are performed in the transaction, that in-transaction reference
>> is progressed (in the sense in which any persistent data structure is
>> progressed) while the operations are recorded. Upon an abort, the
>> in-transaction reference and the record are simply thrown away. Upon a
>> commit, the in-transaction reference is thrown away and the operation
>> record is re-run against the main reference (the one that is copied at the
>> beginning of a transaction). That re-run happens inside an atomic update
>> (hence the use of AtomicReference). All this should avoid the need for
>> explicit locking in Jena and should confine any blocking against the
>> indexes to the actual duration of a commit.
>>
>> What do you guys think?
>>
>> ---
>> A. Soroka
>> The University of Virginia Library
>
> --
> I like: Like Like - The likeliest place on the web
> <http://like-like.xenei.com>
> LinkedIn: http://www.linkedin.com/in/claudewarren
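The commit scheme described in the original message can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the code on the JournalingDatasetgraph branch. `JournalingStore`, `Txn` and `Op` are hypothetical names, and quads are plain strings. An unmodifiable copied `HashSet` stands in for a pcollections persistent set, so each write copies the whole set instead of sharing structure. A transaction takes its snapshot from the `AtomicReference` at begin and applies writes to its private view while logging them. On abort it discards both. On commit it replays the log against the current main reference inside `AtomicReference.updateAndGet`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical sketch of the journaling scheme: snapshot at begin, record
 * operations, replay them atomically on commit. A copied, unmodifiable Set
 * stands in for a real persistent data structure (no structural sharing).
 */
public class JournalingStore {

    private record Op(boolean add, String quad) {}

    // The "main reference": always points to an immutable snapshot.
    private final AtomicReference<Set<String>> main =
            new AtomicReference<Set<String>>(Collections.emptySet());

    /** Returns a new snapshot with one operation applied; never mutates base. */
    private static Set<String> apply(Set<String> base, Op op) {
        Set<String> next = new HashSet<>(base);  // copy-on-write stand-in
        if (op.add()) next.add(op.quad()); else next.remove(op.quad());
        return Collections.unmodifiableSet(next);
    }

    /** One transaction; like the original design, not thread-safe. */
    public final class Txn {
        private Set<String> view = main.get();  // snapshot taken at begin
        private final List<Op> log = new ArrayList<>();

        public void add(String quad)    { record(new Op(true, quad)); }
        public void delete(String quad) { record(new Op(false, quad)); }

        private void record(Op op) {
            log.add(op);
            view = apply(view, op);  // "progress" the private view
        }

        /** Reads see only the begin-time snapshot plus this transaction's own writes. */
        public boolean contains(String quad) { return view.contains(quad); }

        public void abort() { log.clear(); view = null; }

        public void commit() {
            // Replay against whatever main holds *now*. updateAndGet may re-run
            // this function under contention, so it must have no side effects.
            main.updateAndGet(current -> {
                Set<String> s = current;
                for (Op op : log) s = apply(s, op);
                return s;
            });
            view = null;
        }
    }

    public Txn begin() { return new Txn(); }

    public static void main(String[] args) {
        JournalingStore store = new JournalingStore();
        Txn t1 = store.begin();
        Txn t2 = store.begin();
        t1.add("g1 alice knows bob");
        t1.commit();
        System.out.println(t2.contains("g1 alice knows bob"));            // false: snapshot isolation
        System.out.println(store.begin().contains("g1 alice knows bob")); // true
        Txn t3 = store.begin();
        t3.add("g1 bob knows carol");
        t3.abort();
        System.out.println(store.begin().contains("g1 bob knows carol")); // false
    }
}
```

The log is replayed against the current state rather than the transaction's snapshot, so two overlapping commits both land, which fits point 3 ("all commits succeed"). Detecting write conflicts would need extra machinery on top of this.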
