> On Jan 24, 2016, at 9:24 AM, Andy Seaborne <[email protected]> wrote:
> 
> The structure locks have a different lifetime as well.  One might be needed 
> for the iterator to move to the next lump of triples, where "lump" is due to 
> the implementation of the dataset - in TDB the iterator will be walking only 
> one index and so that part of that index must not be mutated at the point 
> where the iterator dives back in to read some more items.  Coordinating with 
> the other indexes is left as an exercise for the reader (hmm - actually it 
> looks amusingly tricky in the general case).

It seems like one effect of this move would be to force some changes in how 
transactions are handled together with locks. Currently, at least some 
transactional components (e.g. some Datasets) handle locking for both 
“structure” and “data” as part of their transaction-control methods.

> Background digression:
> Many databases nowadays have multi-version data - see MVCC - where both last 
> committed and "being changed" are around.  For RDF that gets potentially high 
> overhead with many true multiple writers as quads are small so admin bytes 
> can be a high %-age.  Using MVCC is a whole different design space.

TxnMem begins to step into this area in a small way. This is one reason I would 
like, at some point, to explore techniques for 
mutate-in-place-within-one-transaction semantics (what Clojure calls 
“transient” data structures, although there the intent is 
mutate-in-place-within-one-thread). That should save some of the overhead.
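To make the idea concrete, here is a minimal sketch (hypothetical, not Jena or Clojure code) of the "transient within one transaction" pattern: instead of copying the structure on every update, the single writer copies once at transaction start, mutates that private copy in place, and freezes it only at commit.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: a single-writer transaction that mutates a private
// working copy in place ("transient"), publishing an immutable snapshot
// only at commit. Clojure scopes its transients to one thread; here the
// scope is one write transaction.
final class TransientTxn {
    private Set<String> committed = Collections.emptySet();
    private Set<String> working;  // private to the single writer

    void begin() {
        // One copy per transaction, not one copy per update.
        working = new HashSet<>(committed);
    }

    void add(String quad) {
        working.add(quad);  // mutate in place; no further copying
    }

    void commit() {
        committed = Collections.unmodifiableSet(working);
        working = null;
    }

    Set<String> view() {
        return committed;  // readers only ever see committed snapshots
    }
}
```

The saving is exactly the per-update copy overhead: a transaction of N updates costs one copy rather than N.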

> Structure locks are short term - like the commit lock in TxnMem when the 
> global root is updated.

Indeed, that lock protects the dataset’s internal state from corruption, but 
doesn’t really guard any data.
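As a rough sketch (assumed structure, not Jena’s actual TxnMem code), the pattern is: readers dereference an immutable root with no locking at all, while a writer builds a new root and publishes it under a short commit lock that protects only the pointer swap, never the data itself.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of a copy-on-write "global root": the commit lock guards only the
// brief moment when the root pointer is replaced. Readers never take it.
final class VersionedStore {
    private final AtomicReference<Set<String>> root =
            new AtomicReference<>(Collections.<String>emptySet());
    private final Object commitLock = new Object();

    // Readers get a consistent, immutable snapshot; no structure lock needed.
    Set<String> snapshot() {
        return root.get();
    }

    // Writer: copy the current root, apply the change, swap under the lock.
    void add(String quad) {
        synchronized (commitLock) {
            Set<String> next = new HashSet<>(root.get());
            next.add(quad);
            root.set(Collections.unmodifiableSet(next));
        }
    }
}
```

A reader that captured a snapshot before a commit simply keeps reading the old root; nothing it holds is ever mutated.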

> The case of partition-by-graph is simpler (much simpler?).  It's almost a top 
> level ConcurrentHashMap of graphs and then per graph MRSW.

I suspect that any partitioning scheme is simpler (although I admit I haven’t 
got a formal argument for that), and that’s one reason I want to investigate 
partition-by-graph. I did try MR+SW per graph to begin with, but I haven’t had 
a chance to reason through how reads and writes against different graphs 
interact. It may be too much for me.
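Andy’s description above ("a top level ConcurrentHashMap of graphs and then per graph MRSW") can be sketched directly; this is an assumed illustration, not Jena code, using a `ReadWriteLock` as the MRSW lock so that readers of graph A never contend with writers of graph B.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of partition-by-graph: a concurrent top-level map from graph name
// to a (graph, lock) pair, with an MRSW lock per graph.
final class PartitionedDataset {
    static final class LockedGraph {
        final Set<String> triples = new HashSet<>();
        final ReadWriteLock lock = new ReentrantReadWriteLock();
    }

    private final ConcurrentHashMap<String, LockedGraph> graphs =
            new ConcurrentHashMap<>();

    void add(String graphName, String triple) {
        // ConcurrentHashMap handles the structural concurrency of the
        // graph table; the per-graph lock handles the data.
        LockedGraph g = graphs.computeIfAbsent(graphName, n -> new LockedGraph());
        g.lock.writeLock().lock();
        try {
            g.triples.add(triple);
        } finally {
            g.lock.writeLock().unlock();
        }
    }

    boolean contains(String graphName, String triple) {
        LockedGraph g = graphs.get(graphName);
        if (g == null) return false;
        g.lock.readLock().lock();
        try {
            return g.triples.contains(triple);
        } finally {
            g.lock.readLock().unlock();
        }
    }
}
```

Operations confined to one named graph compose trivially under this scheme; the open question in the paragraph above is what guarantees hold for operations that span graphs.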

> This affects the index choices; default union graph might have to be a loop 
> as it is in dataset general but "default union graph" with per graph updates 
> happening is going to be a friction point anyway.

Even though all the graphs are in one dataset, if the underlying graphs can be 
locked independently, the union graph would be in some ways like a view over 
multiple bases. Most views in Jena that I have used are a single graph out of 
a dataset; even with independently lockable graphs, such a view falls within 
the single lockable region that is its base, if I am making myself clear… 
{grin}
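The friction is visible in a sketch (hypothetical, not Jena API) of a union read over independently lockable member graphs: the union is a loop, and each member’s read lock is held only while that member is scanned, so a writer can slip in between the scan of one member and the next — the union read is not one consistent snapshot.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: a union-graph read as a loop over independently locked members.
final class UnionRead {
    static final class LockedGraph {
        final Set<String> triples = new HashSet<>();
        final ReadWriteLock lock = new ReentrantReadWriteLock();
    }

    static boolean unionContains(List<LockedGraph> members, String triple) {
        for (LockedGraph g : members) {
            // Each member's read lock is held only for its own scan;
            // the union as a whole is never locked at once.
            g.lock.readLock().lock();
            try {
                if (g.triples.contains(triple)) return true;
            } finally {
                g.lock.readLock().unlock();
            }
        }
        return false;
    }
}
```

Holding all the member read locks for the whole union scan would restore a snapshot, at the cost of reintroducing cross-graph contention — which is exactly the friction point Andy names.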

---
A. Soroka
The University of Virginia Library
