On 21/01/16 17:31, A. Soroka wrote:
> On Jan 19, 2016, at 8:27 AM, Andy Seaborne <[email protected]> wrote:
>> Structure locks are a feature of the implementation, like Java's
>> "synchronized". "Structure locks" serve a different purpose from
>> "data locks" (the terminology is invented - there is probably a
>> proper set of terms). They stop crashes, NPEs and general corruption
>> of the datastructures; they are datastructure-specific but do not
>> care about consistency of the data, or of operations on the data,
>> from the application (data model) point of view. What is needed may
>> be influenced by the upper-level locking, or it might be that the
>> structure locks are a set of guarantees to build on.
> Let me use the example (Iterator<Triple> iter = graph.find(S,P,O);)
> to check my understanding of your thinking.
>
> Let’s say I got the graph from an underlying Dataset (it’s a view),
> and I must assume that writers are acting around me. If I want to use
> the two systems of locking, I acquire a “data” lock on the pattern
> (S,P,O) from the graph; that action acquires a “data” lock in the
> Dataset, which also appears in the locking API of the graph and of
> any other views into the Dataset. I also acquire a "structure" lock
> on the graph, and that would be implemented in the graph, so it’s not
> visible to the Dataset or to other views into that Dataset.
So far, so good ...
> The “data” lock is used to manage consistency of my triples from an
> application POV, irrespective of how they are stored or accessed, and
> the “structure” lock prevents someone else from doing something to
> the graph object that could clash with what I am doing and throw it
> into an inconsistent state (not throw the data into an inconsistent
> state from the application POV, but actually mung up the state of the
> object itself, resulting in an exception or unpredictable wrong
> behavior). Is that right?
Yes.
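
As a rough sketch of that shape in Java (the lock accessors dataLock()
and structureLock() are invented for illustration, not Jena API; only
the nesting and the differing visibility of the two locks matter):

    import java.util.Iterator;
    import java.util.concurrent.locks.Lock;

    // Hypothetical interface for the sketch; not Jena's Graph.
    interface LockableGraph<T> {
        Lock dataLock();       // shared with the Dataset and all other views
        Lock structureLock();  // private to this graph implementation
        Iterator<T> find(Object s, Object p, Object o);
    }

    class TwoLevelRead {
        static <T> void readPattern(LockableGraph<T> graph, Object s, Object p, Object o) {
            graph.dataLock().lock();            // application-level consistency
            try {
                graph.structureLock().lock();   // guards internal datastructures
                try {
                    Iterator<T> iter = graph.find(s, p, o);
                    while (iter.hasNext())
                        process(iter.next());
                } finally {
                    graph.structureLock().unlock();
                }
            } finally {
                graph.dataLock().unlock();
            }
        }
        static void process(Object t) { /* application work on each triple */ }
    }

(In practice the structure lock would be held for much shorter spans
than the whole iteration - see the next point.)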
The structure locks have a different lifetime as well. One might be
needed for the iterator to move to the next lump of triples, where
"lump" is due to the implementation of the dataset - in TDB the iterator
will be walking only one index, so that part of the index must not
be mutated at the point where the iterator dives back in to read some
more items. Coordinating with the other indexes is left as an exercise
for the reader (hmm - actually it looks amusingly tricky in the general
case).
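
A sketch of that per-lump lifetime, assuming a hypothetical ChunkSource
standing in for a single TDB index scan:

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.Iterator;
    import java.util.List;
    import java.util.NoSuchElementException;
    import java.util.concurrent.locks.Lock;

    // Iterator that re-acquires the structure lock only while it dives
    // back in for the next lump; the index is free to change between
    // refills (whether results then reflect those changes is a
    // data-level question, not a structure-level one).
    class ChunkedIterator<T> implements Iterator<T> {
        interface ChunkSource<T> { List<T> nextChunk(int max); } // hypothetical

        private final ChunkSource<T> source;
        private final Lock structureLock;
        private final Deque<T> buffer = new ArrayDeque<>();
        private boolean exhausted = false;

        ChunkedIterator(ChunkSource<T> source, Lock structureLock) {
            this.source = source;
            this.structureLock = structureLock;
        }

        @Override public boolean hasNext() {
            if (!buffer.isEmpty()) return true;
            if (exhausted) return false;
            structureLock.lock();       // held only for the refill
            try {
                List<T> chunk = source.nextChunk(100);
                if (chunk.isEmpty()) exhausted = true;
                else buffer.addAll(chunk);
            } finally {
                structureLock.unlock();
            }
            return !buffer.isEmpty();
        }

        @Override public T next() {
            if (!hasNext()) throw new NoSuchElementException();
            return buffer.removeFirst();
        }
    }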
Background digression:
Many databases nowadays have multi-version data - see MVCC - where both
the last-committed state and the "being changed" state are around at
once. For RDF that gets potentially expensive with genuinely multiple
concurrent writers: quads are small, so the per-quad admin bytes can be
a high percentage of the total. Using MVCC is a whole different design
space.
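
To make that overhead concrete - a back-of-envelope sketch, with sizes
that are assumptions for illustration, not measurements of any system:

    // Invented record: a TDB-style quad of four 8-byte NodeIds
    // (32 bytes of payload) plus typical MVCC bookkeeping.
    final class VersionedQuad {
        final long g, s, p, o;     // 32 bytes of payload
        final long createdTxn;     // 8 bytes: transaction that wrote this version
        volatile long deletedTxn;  // 8 bytes: transaction that superseded it
        VersionedQuad previous;    // ~8 bytes: link to the older version

        VersionedQuad(long g, long s, long p, long o, long createdTxn) {
            this.g = g; this.s = s; this.p = p; this.o = o;
            this.createdTxn = createdTxn;
            this.deletedTxn = Long.MAX_VALUE;   // sentinel: still live
        }
    }
    // ~24 admin bytes per 32 payload bytes - over 40% of each record,
    // before object headers - which is the high percentage above.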
> If that is correct, I am then trying to understand what that means
> for the semantics of the “structure” locks. It starts to seem like
> they must be MRSW in most cases, with exceptions available when the
> implementation supports more powerful behavior. It’s almost as though
> the current system locks evolve into “structure” locks and the “data”
> locks are overlaid to support application-specific notions.
Structure locks are short term - like the commit lock in TxnMem when the
global root is updated.
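
That commit step can be as small as a guarded pointer swap. A minimal
sketch of the shape (SnapshotStore and Root are invented names; this is
not the TxnMem code itself):

    // Readers pin an immutable root snapshot and never block; the
    // short-term "structure" lock is held only while the global root
    // reference is swapped at commit.
    class SnapshotStore<Root> {
        private volatile Root current;                   // immutable snapshot
        private final Object commitLock = new Object();  // short-term structure lock

        SnapshotStore(Root initial) { current = initial; }

        Root beginRead() { return current; }   // pin whatever is current

        // The single writer (kept MRSW by a data-level lock, not shown)
        // builds newRoot during its transaction; commit is just the swap.
        void commit(Root newRoot) {
            synchronized (commitLock) {
                current = newRoot;
            }
        }
    }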
The case of partition-by-graph is simpler (much simpler?). It's almost
a top-level ConcurrentHashMap of graphs and then per-graph MRSW.
This affects the index choices; the default union graph might have to
be a loop over the graphs, as it is in the general dataset, but
"default union graph" with per-graph updates happening is going to be
a friction point anyway.
And remember ConcurrentHashMap is only "mostly concurrent"! Two writes
to the same segment use exclusive locking.
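
The shape, as a sketch (GraphData is a placeholder for a per-graph
triple store; none of this is existing Jena code):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;
    import java.util.function.Consumer;

    class PartitionedDataset<GraphData> {
        // Each graph is paired with its own MRSW lock.
        static final class Holder<G> {
            final G graph;
            final ReadWriteLock lock = new ReentrantReadWriteLock();
            Holder(G graph) { this.graph = graph; }
        }

        private final ConcurrentHashMap<String, Holder<GraphData>> graphs =
            new ConcurrentHashMap<>();

        void withRead(String name, Consumer<GraphData> action) {
            Holder<GraphData> h = graphs.get(name);
            if (h == null) return;
            h.lock.readLock().lock();
            try { action.accept(h.graph); } finally { h.lock.readLock().unlock(); }
        }

        void withWrite(String name, GraphData freshIfAbsent, Consumer<GraphData> action) {
            // computeIfAbsent is atomic per key - and is exactly where two
            // writers hitting the same part of the map briefly serialize
            // inside ConcurrentHashMap.
            Holder<GraphData> h = graphs.computeIfAbsent(name, n -> new Holder<>(freshIfAbsent));
            h.lock.writeLock().lock();
            try { action.accept(h.graph); } finally { h.lock.writeLock().unlock(); }
        }

        // The "default union graph" friction point: a union read loops over
        // every partition, taking each read lock in turn.
        void readUnion(Consumer<GraphData> action) {
            graphs.forEach((name, h) -> {
                h.lock.readLock().lock();
                try { action.accept(h.graph); } finally { h.lock.readLock().unlock(); }
            });
        }
    }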
Andy