On the specifics of the scheme and locking things in general.
There is a nice picture in:
http://sqlperformance.com/2014/04/t-sql-queries/the-read-committed-isolation-level
where the scan returns a view of the data that never actually exists. It
misses one row and sees another twice.
Consistency matters. As a locking scheme this seems to be closely
related to 2PL:
https://en.wikipedia.org/wiki/Two-phase_locking
The triple-level locking is sort of like read-committed. All the
literature on read-committed and other inconsistencies applies.
https://en.wikipedia.org/wiki/Isolation_%28database_systems%29#Non-repeatable_reads
One thing I'm unclear about is what to lock: does the application say
what to lock or does the system lock as needed?
An example: all triples with a subject in one namespace:
{ ?s <p> ?o .
FILTER regex (str(?s), "^http://example/namespace#")
}
The only way to find the smallest set of triples to lock is to execute the pattern itself.
Negation makes things even more interesting.
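Concretely, enumerating that lock set is the same work as running the query.
A sketch with ARQ (the class name is made up and <http://example/p> stands in
for the <p> above):
{noformat}
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.Resource;
import java.util.HashSet;
import java.util.Set;

public class LockSetProbe {
    // Enumerate the subjects the lock would have to cover by running the very
    // pattern the application wants protected.
    static Set<Resource> subjectsToLock(Model model) {
        String q = "SELECT ?s WHERE { ?s <http://example/p> ?o . "
                 + "FILTER regex(str(?s), \"^http://example/namespace#\") }";
        Set<Resource> subjects = new HashSet<>();
        try (QueryExecution qe = QueryExecutionFactory.create(QueryFactory.create(q), model)) {
            ResultSet rs = qe.execSelect();
            while (rs.hasNext()) {
                subjects.add(rs.next().getResource("s"));
            }
        }
        return subjects;
    }
}
{noformat}
And triples added after the scan, or anything a negation needs to see as
absent, are not captured by a snapshot like this.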
Some operations combine information from different parts of the graph, so
taking out locks as execution progresses leads to consistency problems for
aggregations like COUNT or SUM.
Locking on the whole data avoids this.
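For contrast, under whole-dataset isolation the aggregate is computed over one
consistent state. A sketch, assuming a transactional dataset (TDB or TxnMem):
{noformat}
import org.apache.jena.query.Dataset;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ReadWrite;

public class ConsistentCount {
    // A COUNT run inside a read transaction sees one consistent state of the
    // dataset, rather than a mix of states as locks are taken incrementally.
    static long countTriples(Dataset dataset) {
        dataset.begin(ReadWrite.READ);
        try (QueryExecution qe = QueryExecutionFactory.create(
                "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }", dataset)) {
            return qe.execSelect().next().getLiteral("n").getLong();
        } finally {
            dataset.end();
        }
    }
}
{noformat}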
On 02/01/16 14:18, Claude Warren wrote:
Currently most Jena implementations use a multiple read one write
solution.
Transactions don't use those locks.
In transactions, locking isn't even touched. There is a cheap, imperfect
Transactional based on locks, but it can't do abort. Locking on its own
isn't transactions.
Transactional systems like TxnMem and TDB do a form of whole-database
isolation and give serializable (technical sense) semantics for SPARQL.
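Roughly, the two styles look like this (a sketch; it assumes a Jena version
where DatasetFactory.createTxnMem() is available, and error handling is
omitted):
{noformat}
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.shared.Lock;

public class LockingVsTransactions {
    public static void main(String[] args) {
        // MRSW locking: a critical section; no abort/rollback is possible.
        Model model = ModelFactory.createDefaultModel();
        model.enterCriticalSection(Lock.WRITE);
        try {
            // ... update the model ...
        } finally {
            model.leaveCriticalSection();
        }

        // Transactions: whole-dataset isolation with commit/abort.
        Dataset dataset = DatasetFactory.createTxnMem();
        dataset.begin(ReadWrite.WRITE);
        try {
            // ... update the dataset ...
            dataset.commit();
        } finally {
            dataset.end();
        }
    }
}
{noformat}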
However, I think that it is possible (with minimal work) to
provide a solution that would allow for multiple writers by using
lower-level locks.
As I understand it, the application is now involved in describing the part
of the graph to lock. Is that right?
I take inspiration from the Privileges code. That code allows privileges
to be determined down to the triple level. Basically it does the following:
{noformat}
start
|
v
may user perform operation on graph? → (no) (restrict)
|
v
(yes)
may user perform operation on any triple in graph → (yes) (allow)
|
v
(no)
may user perform operation on the specific triple in graph → (yes) (allow)
|
v
(no) (restrict)
{noformat}
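In code that cascade is roughly the following (a hypothetical Evaluator
interface standing in for the real permissions engine; the class and method
names are invented for illustration):
{noformat}
import org.apache.jena.graph.Node;
import org.apache.jena.graph.Triple;

// Hypothetical stand-in for the permissions evaluator; the real permissions
// engine differs in detail.
interface Evaluator {
    boolean allowedOnGraph(Node graphName, String operation);
    boolean allowedOnAnyTriple(Node graphName, String operation);
    boolean allowedOnTriple(Node graphName, String operation, Triple triple);
}

class PrivilegeCheck {
    // Mirrors the flow above: graph level, then "any triple", then the
    // specific triple; the first definite answer wins.
    static boolean mayPerform(Evaluator e, Node graph, String op, Triple t) {
        if (!e.allowedOnGraph(graph, op))
            return false;                       // restrict at the graph level
        if (e.allowedOnAnyTriple(graph, op))
            return true;                        // allowed for all triples
        return e.allowedOnTriple(graph, op, t); // decide per triple
    }
}
{noformat}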
My thought is that the locking may work much the same way. Once one thread
has an object locked, no other thread may lock that object. The
process would be something like:
To check here: are locks held for a critical section (as in the current
design)? For a single API call? Something else?
The permissions system is per API call, but permissions do not change the
way data does on a live system.
Graph locking would require an exclusive lock or a non-exclusive lock. If the
entire graph were to be locked for writing (as in the current system) then
the request would be for an exclusive write-lock on the graph. Once an
exclusive write lock has been established no other write lock may be
applied to the graph or any of its triples by any other thread.
If a thread only wanted to lock part of the graph, for example all triples
matching <u:foo ANY ANY>, the thread would first acquire a non-exclusive
write lock on the graph. It would then acquire an exclusive write lock on
all triples matching <u:foo ANY ANY>. Once that triple match lock was
acquired, no other thread would be able to lock any triple whose subject was
u:foo.
The lock request would need to contain the graph name and (in the case of a
partial graph lock) a set of triple patterns to lock. The flow for the
lock would be something like:
{noformat}
start
|
v
does the thread hold an exclusive graph lock → (yes) (success)
|
v
(no)
does the thread want an exclusive graph lock → (yes) (go to ex graph lock)
|
v
(no)
does the thread hold a non-exclusive graph lock → (no) (go to nonex graph lock)
|
v
(yes) (lbl:lock acquired)
can the thread acquire all the triple locks → (yes) (success)
|
v
(no) (failure)
Does failure mean an exception or waiting?
If it's an exception, what happens from the app point of view?
(lbl: nonex graph lock)
does any thread hold an exclusive graph lock → (yes) (failure)
|
v
(no)
acquire non-exclusive graph lock
(goto lock acquired)
(lbl: ex graph lock)
does any thread hold an exclusive graph lock → (yes) (failure)
|
v
(no)
does any thread hold a non-exclusive graph lock → (yes) (failure)
|
v
(no)
acquire exclusive graph lock
(success)
These tests themselves have to be done by some synchronized code because
there are multiple tests while other threads may be changing things at
the same time.
{noformat}
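A minimal sketch of that flow, condensed into one tryLock operation
(GraphLockManager is a made-up class, one instance per graph; the failure
branches just return false rather than waiting or throwing):
{noformat}
import org.apache.jena.graph.Node;
import org.apache.jena.graph.Triple;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical lock manager mirroring the flow above; not existing Jena API.
class GraphLockManager {
    private Thread exclusiveHolder = null;                       // exclusive graph lock holder
    private final Set<Thread> nonExclusiveHolders = new HashSet<>();
    private final Map<Thread, Set<Triple>> lockedPatterns = new HashMap<>();

    // The whole decision is one synchronized step: it is a sequence of tests
    // that must not interleave with other threads changing the lock state.
    synchronized boolean tryLock(boolean exclusive, Set<Triple> patterns) {
        Thread me = Thread.currentThread();
        if (exclusiveHolder == me)
            return true;                                         // already hold the exclusive graph lock
        if (exclusive) {
            if (exclusiveHolder != null || !nonExclusiveHolders.isEmpty())
                return false;                                    // any other holder blocks an exclusive lock
            exclusiveHolder = me;
            return true;
        }
        if (exclusiveHolder != null)
            return false;                                        // blocked by the exclusive holder
        for (Map.Entry<Thread, Set<Triple>> entry : lockedPatterns.entrySet()) {
            if (entry.getKey() == me)
                continue;                                        // our own locks never conflict with us
            for (Triple held : entry.getValue())
                for (Triple wanted : patterns)
                    if (conflicts(held, wanted))
                        return false;                            // overlapping triple pattern: failure
        }
        nonExclusiveHolders.add(me);
        lockedPatterns.computeIfAbsent(me, k -> new HashSet<>()).addAll(patterns);
        return true;
    }

    synchronized void releaseLock() {                            // release everything this thread holds
        Thread me = Thread.currentThread();
        if (exclusiveHolder == me) exclusiveHolder = null;
        nonExclusiveHolders.remove(me);
        lockedPatterns.remove(me);
    }

    // Two patterns conflict if some concrete triple could match both.
    private static boolean conflicts(Triple a, Triple b) {
        return overlap(a.getSubject(), b.getSubject())
            && overlap(a.getPredicate(), b.getPredicate())
            && overlap(a.getObject(), b.getObject());
    }

    private static boolean overlap(Node x, Node y) {
        return x == Node.ANY || y == Node.ANY || x.equals(y);
    }
}
{noformat}
Whether a false return should turn into waiting or an exception is exactly the
question raised above.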
If waiting, rather than aborting, then deadlock is possible. What's the
plan for that?
(exclusive locks on triples)
Thread 1:
Lock triple A
... some app code ... if() involved ...
... decides it needs to ...
Lock triple B
whereas
Thread 2:
Lock triple B
... some app code ... if() involved ...
... decides it needs to ...
Lock triple A
and that happens in wall-clock time:
Thread 1: Thread 2:
Lock triple A
Lock triple B
Lock triple B
Lock triple A
Thread 1 and 2 now wait forever.
How far back do you wind back? In a transaction system, either the
transaction is aborted (so all the way back) or some systems retry
(i.e. rerun the application code).
The partial solution is to require lock ordering - locks must be
acquired in the same order and same class - and for a sequence of
operations, not just a single access.
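A sketch of that discipline (the helper names are made up; it assumes
per-pattern locks of some kind, here plain ReentrantLocks):
{noformat}
import org.apache.jena.graph.Triple;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

class OrderedLocking {
    // Acquire every needed lock up front, always in one canonical order (here:
    // the string form of the pattern). Two threads can then never hold locks
    // and wait for each other in opposite orders.
    static void lockAll(Collection<Triple> patterns, Function<Triple, ReentrantLock> lockFor) {
        List<Triple> ordered = new ArrayList<>(patterns);
        ordered.sort(Comparator.comparing(Triple::toString));
        for (Triple t : ordered) {
            lockFor.apply(t).lock();
        }
    }
}
{noformat}
With this in place, both threads in the schedule above contend for triple A
first, so one waits instead of deadlocking; but it requires knowing the whole
lock set before starting, which is where the application gets involved.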
The application is now involved - this is a specialized programming
model for the ability to have multiple writers.
The permissions system uses an abstract engine to determine if the user has
access to the triples. For the locking mechanism the system needs to track
graph locks and triple patterns locked. If a new request for a triple
pattern matches any existing (already locked) pattern the lock request
fails.
The simple releaseLock() will release all locks the thread holds.
Note that the locking system does not check the graph being locked to see
if the items exist in the graph; it simply tracks patterns of locks and
determines whether there are any conflicts between the patterns.
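The conflict test itself is small: two patterns conflict exactly when some
concrete triple could match both, i.e. in each position the nodes are equal
or at least one is Node.ANY. A sketch (class and method names are
illustrative):
{noformat}
import org.apache.jena.graph.Node;
import org.apache.jena.graph.NodeFactory;
import org.apache.jena.graph.Triple;

class PatternConflict {
    // Two patterns conflict if some concrete triple could match both.
    static boolean conflicts(Triple a, Triple b) {
        return overlap(a.getSubject(), b.getSubject())
            && overlap(a.getPredicate(), b.getPredicate())
            && overlap(a.getObject(), b.getObject());
    }

    private static boolean overlap(Node x, Node y) {
        return x == Node.ANY || y == Node.ANY || x.equals(y);
    }

    public static void main(String[] args) {
        Node foo = NodeFactory.createURI("u:foo");
        Node bar = NodeFactory.createURI("u:bar");
        Triple lockFooAnyAny = Triple.create(foo, Node.ANY, Node.ANY);
        Triple lockFooPX     = Triple.create(foo, NodeFactory.createURI("u:p"),
                                                  NodeFactory.createURI("u:x"));
        Triple lockBarAnyAny = Triple.create(bar, Node.ANY, Node.ANY);
        System.out.println(conflicts(lockFooAnyAny, lockFooPX));     // true: both could match <u:foo u:p u:x>
        System.out.println(conflicts(lockFooAnyAny, lockBarAnyAny)); // false: subjects differ
    }
}
{noformat}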
Because this process can duplicate the current locking strategy, it can be
used as a drop-in replacement in the current code. Current code would
continue to operate as it does now, but future development could be more
sensitive to locking named graphs and to partial updates, providing
multi-threaded updates.
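For example, if the existing enterCriticalSection()/leaveCriticalSection()
Lock interface is the integration point, an adapter over the
GraphLockManager sketched earlier could look like this (a sketch only; the
busy-wait is purely illustrative):
{noformat}
import org.apache.jena.shared.Lock;
import java.util.Collections;

// Adapter so existing enterCriticalSection()/leaveCriticalSection() callers
// keep working while new code can ask for finer-grained pattern locks.
class PatternAwareLock implements Lock {
    private final GraphLockManager manager = new GraphLockManager();

    @Override
    public void enterCriticalSection(boolean readLockRequested) {
        // Existing callers get whole-graph behaviour: writers take an
        // exclusive lock, readers a non-exclusive one with no patterns.
        while (!manager.tryLock(!readLockRequested, Collections.emptySet())) {
            Thread.yield();   // crude wait; a real version would block properly
        }
    }

    @Override
    public void leaveCriticalSection() {
        manager.releaseLock();
    }
}
{noformat}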
Thoughts?
It would be a good thing to have as an additional choice of graph
(ideally dataset) implementation. A multi-writer dataset, then just
make graph views of the dataset, would be great to have.
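Something like this, as intended usage (a sketch; the graph names are made
up and DatasetFactory.create() is assumed to give an in-memory dataset):
{noformat}
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.vocabulary.RDFS;

public class MultiWriterUsage {
    public static void main(String[] args) {
        Dataset dataset = DatasetFactory.create();            // in-memory dataset
        // Graph views over the one dataset.
        Model g1 = dataset.getNamedModel("http://example/graph1");
        Model g2 = dataset.getNamedModel("http://example/graph2");
        // With per-graph (or per-pattern) locks, a writer that has locked only
        // graph1 and a writer that has locked only graph2 would not conflict,
        // so updates like these could run on separate threads concurrently.
        g1.createResource("http://example/a").addProperty(RDFS.label, "a");
        g2.createResource("http://example/b").addProperty(RDFS.label, "b");
    }
}
{noformat}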
Is it easy to build an experimental version using the permissions code?
Claude
Andy