On 21/01/16 17:31, A. Soroka wrote:
> On Jan 19, 2016, at 8:27 AM, Andy Seaborne <[email protected]> wrote:
>> Structure locks are a feature of the implementation, like Java's
>> "synchronized". "Structure locks" serve a different purpose from
>> "data locks" (the terminology is invented - there is probably a
>> proper set of terms). They stop crashes, NPEs and general corruption
>> of the datastructures; they are datastructure-specific but do not
>> care about consistency of the data, or of operations on the data,
>> from the application (data model) point of view. What is needed may
>> be influenced by the upper-level locking, or it might be that the
>> structure locks are a set of guarantees to build on.
> Let me use the example (Iterator<Triple> iter = graph.find(S,P,O);)
> to check my understanding of your thinking.
>
> Let’s say I got the graph from an underlying Dataset (it’s a view),
> and I must assume that writers are acting around me. If I want to use
> the two systems of locking, I acquire a “data” lock on the pattern
> (S,P,O) from the graph; that action acquires a “data” lock in the
> Dataset, which also appears in the locking API of the graph and of
> any other views into the Dataset. I also acquire a "structure" lock
> on the graph, and that would be implemented in the graph, so it’s not
> visible to the Dataset or to other views into that Dataset.
So far, so good ...
> The “data” lock is used to manage consistency of my triples from an
> application POV, irrespective of how they are stored or accessed, and
> the “structure” lock prevents someone else from doing something to
> the graph object that could clash with what I am doing and throw it
> into an inconsistent state (not throw the data into an inconsistent
> state from the application POV, but actually mung up the state of the
> object itself, resulting in an exception or unpredictable wrong
> behavior). Is that right?
Yes.
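
As a rough sketch of that shape in Java (the lock accessors dataLock()
and structureLock() are invented for illustration, not Jena API; only
the nesting and the differing visibility of the two locks matter):

    import java.util.Iterator;
    import java.util.concurrent.locks.Lock;

    // Hypothetical interface for the sketch; not Jena's Graph.
    interface LockableGraph<T> {
        Lock dataLock();       // shared with the Dataset and all other views
        Lock structureLock();  // private to this graph implementation
        Iterator<T> find(Object s, Object p, Object o);
    }

    class TwoLevelRead {
        static <T> void readPattern(LockableGraph<T> graph, Object s, Object p, Object o) {
            graph.dataLock().lock();            // application-level consistency
            try {
                graph.structureLock().lock();   // guards internal datastructures
                try {
                    Iterator<T> iter = graph.find(s, p, o);
                    while (iter.hasNext())
                        process(iter.next());
                } finally {
                    graph.structureLock().unlock();
                }
            } finally {
                graph.dataLock().unlock();
            }
        }
        static void process(Object t) { /* application work on each triple */ }
    }

(In practice the structure lock would be held for much shorter spans
than the whole iteration - see the next point.)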
The structure locks have a different lifetime as well. One might be
needed for the iterator to move to the next lump of triples, where
"lump" is due to the implementation of the dataset - in TDB the iterator
will be walking only one index, so that part of the index must not
be mutated at the point where the iterator dives back in to read some
more items. Coordinating with the other indexes is left as an exercise
for the reader (hmm - actually it looks amusingly tricky in the general
case).
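
A sketch of that per-lump lifetime, assuming a hypothetical ChunkSource
standing in for a single TDB index scan:

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.Iterator;
    import java.util.List;
    import java.util.NoSuchElementException;
    import java.util.concurrent.locks.Lock;

    // Iterator that re-acquires the structure lock only while it dives
    // back in for the next lump; the index is free to change between
    // refills (whether results then reflect those changes is a
    // data-level question, not a structure-level one).
    class ChunkedIterator<T> implements Iterator<T> {
        interface ChunkSource<T> { List<T> nextChunk(int max); } // hypothetical

        private final ChunkSource<T> source;
        private final Lock structureLock;
        private final Deque<T> buffer = new ArrayDeque<>();
        private boolean exhausted = false;

        ChunkedIterator(ChunkSource<T> source, Lock structureLock) {
            this.source = source;
            this.structureLock = structureLock;
        }

        @Override public boolean hasNext() {
            if (!buffer.isEmpty()) return true;
            if (exhausted) return false;
            structureLock.lock();       // held only for the refill
            try {
                List<T> chunk = source.nextChunk(100);
                if (chunk.isEmpty()) exhausted = true;
                else buffer.addAll(chunk);
            } finally {
                structureLock.unlock();
            }
            return !buffer.isEmpty();
        }

        @Override public T next() {
            if (!hasNext()) throw new NoSuchElementException();
            return buffer.removeFirst();
        }
    }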
Background digression:
Many databases nowadays have multi-version data - see MVCC - where both
the last-committed state and the "being changed" state are around at
once. For RDF that gets potentially expensive with genuinely multiple
concurrent writers: quads are small, so the per-quad admin bytes can be
a high percentage of the total. Using MVCC is a whole different design
space.
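
To make that overhead concrete - a back-of-envelope sketch, with sizes
that are assumptions for illustration, not measurements of any system:

    // Invented record: a TDB-style quad of four 8-byte NodeIds
    // (32 bytes of payload) plus typical MVCC bookkeeping.
    final class VersionedQuad {
        final long g, s, p, o;     // 32 bytes of payload
        final long createdTxn;     // 8 bytes: transaction that wrote this version
        volatile long deletedTxn;  // 8 bytes: transaction that superseded it
        VersionedQuad previous;    // ~8 bytes: link to the older version

        VersionedQuad(long g, long s, long p, long o, long createdTxn) {
            this.g = g; this.s = s; this.p = p; this.o = o;
            this.createdTxn = createdTxn;
            this.deletedTxn = Long.MAX_VALUE;   // sentinel: still live
        }
    }
    // ~24 admin bytes per 32 payload bytes - over 40% of each record,
    // before object headers - which is the high percentage above.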
> If that is correct, I am then trying to understand what that means
> for the semantics of the “structure” locks. It starts to seem like
> they must be MRSW in most cases, with exceptions available when the
> implementation supports more powerful behavior. It’s almost as though
> the current system locks evolve into “structure” locks and the “data”
> locks are overlaid to support application-specific notions.
Structure locks are short term - like the commit lock in TxnMem when the
global root is updated.
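
That commit step can be as small as a guarded pointer swap. A minimal
sketch of the shape (SnapshotStore and Root are invented names; this is
not the TxnMem code itself):

    // Readers pin an immutable root snapshot and never block; the
    // short-term "structure" lock is held only while the global root
    // reference is swapped at commit.
    class SnapshotStore<Root> {
        private volatile Root current;                   // immutable snapshot
        private final Object commitLock = new Object();  // short-term structure lock

        SnapshotStore(Root initial) { current = initial; }

        Root beginRead() { return current; }   // pin whatever is current

        // The single writer (kept MRSW by a data-level lock, not shown)
        // builds newRoot during its transaction; commit is just the swap.
        void commit(Root newRoot) {
            synchronized (commitLock) {
                current = newRoot;
            }
        }
    }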
The case of partition-by-graph is simpler (much simpler?). It's almost
a top-level ConcurrentHashMap of graphs and then per-graph MRSW.
This affects the index choices; the default union graph might have to
be a loop over the graphs, as it is in the general dataset, but
"default union graph" with per-graph updates happening is going to be
a friction point anyway.
And remember ConcurrentHashMap is only "mostly concurrent"! Two writes
to the same segment use exclusive locking.
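
The shape, as a sketch (GraphData is a placeholder for a per-graph
triple store; none of this is existing Jena code):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;
    import java.util.function.Consumer;

    class PartitionedDataset<GraphData> {
        // Each graph is paired with its own MRSW lock.
        static final class Holder<G> {
            final G graph;
            final ReadWriteLock lock = new ReentrantReadWriteLock();
            Holder(G graph) { this.graph = graph; }
        }

        private final ConcurrentHashMap<String, Holder<GraphData>> graphs =
            new ConcurrentHashMap<>();

        void withRead(String name, Consumer<GraphData> action) {
            Holder<GraphData> h = graphs.get(name);
            if (h == null) return;
            h.lock.readLock().lock();
            try { action.accept(h.graph); } finally { h.lock.readLock().unlock(); }
        }

        void withWrite(String name, GraphData freshIfAbsent, Consumer<GraphData> action) {
            // computeIfAbsent is atomic per key - and is exactly where two
            // writers hitting the same part of the map briefly serialize
            // inside ConcurrentHashMap.
            Holder<GraphData> h = graphs.computeIfAbsent(name, n -> new Holder<>(freshIfAbsent));
            h.lock.writeLock().lock();
            try { action.accept(h.graph); } finally { h.lock.writeLock().unlock(); }
        }

        // The "default union graph" friction point: a union read loops over
        // every partition, taking each read lock in turn.
        void readUnion(Consumer<GraphData> action) {
            graphs.forEach((name, h) -> {
                h.lock.readLock().lock();
                try { action.accept(h.graph); } finally { h.lock.readLock().unlock(); }
            });
        }
    }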
Andy