Re: A proposal for a new locking strategy

Andy Seaborne Sat, 02 Jan 2016 12:20:48 -0800

On 02/01/16 20:13, Paul Houle wrote:

I'd love to see RDF* and SPARQL* support in Jena but that might be too much
to ask.

Submit a patch. Better, build an experimental version of Jena - it is open source.

The << >> syntax is taken from early drafts of SPARQL 1.0. Remnants are still in the master grammar file, commented out with "#if 0".

(actually, it's more complicated than that :-) as the form of one-reification it uses requires some uniqueness of the reification ... mere details)


        Andy


On Sat, Jan 2, 2016 at 3:09 PM, Andy Seaborne wrote:

On 02/01/16 19:36, Paul Houle wrote:

:s [] [] is a lot like a relational entity,  but I think the really
interesting thing about the RDF model is the ability to create
"post-relational" structures,  even if it does involve blank nodes.  The
future is more like JSON-LD or the nested columnar model.

In that context an entity can be a little bit more than just :s [] [] but
could involve a hierarchical structure or ordered lists.  In the case of
Freebase,  for instance,  you have the "mediator" or "CVT" nodes which form
a bipartite graph with respect to entity nodes so it is a straightforward
operation to cut out an entity and the CVTs around it.

Lately I've been working on a framework which is a bit like the "boxes and
lines" products like Alteryx, KNIME, Actian -- those products are a dime a
dozen but they are all based on a tabular data model and this one is
passing small RDF graphs around,  so it supports the nested columnar model,
logic,  etc.  Pipelines like that rapidly become unintuitive and
structurally unstable when joins get involved,  particularly when they
involve "parts" of something that is a clear conceptual entity.

Obviously this thing is configured by an RDF graph,  because the point is
not that you draw a data processing pipeline but that one of these data
processing pipelines consumes schema information and a theory library to
build a graph that describes what will be done to the instances.

So there is a MetaFactory that picks apart the graph into subgraphs,  feeds
the subgraphs into the processing modules and then hooks them up in a
communications fabric.

I don't yet have a single strategy for doing the "document extraction" but
I have two or three methods that between them seem to cover the cases that
actually come up.

Following this line,  it would be nice to be able to lock a whole structure
that looks like

[
    a :Paper ;
    :authors ("Alpher" "Bethe" "Gamow") ;
    :publication [ :journal :PhysicalReview ; :year 1948 ]
]

I don't know how implementable such a thing is,  but the problem of drawing
a line around a complex entity would be part of it.
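
One possible way to draw that line, sketched in Python rather than taken from
any of the methods mentioned above: start at the entity's root node and follow
every triple whose object is a blank node, so that nested structures and list
cells come along but named resources mark the boundary. The tuple encoding of
triples, the "_:" prefix test for blank nodes, and the name entity_closure are
all assumptions made for this illustration.

```python
def is_blank(node):
    # Assumption: blank nodes are encoded as strings starting with "_:".
    return isinstance(node, str) and node.startswith("_:")


def entity_closure(triples, root):
    """Collect the root's triples plus those of blank nodes reachable from it."""
    by_subject = {}
    for t in triples:
        by_subject.setdefault(t[0], []).append(t)
    seen, stack, out = {root}, [root], []
    while stack:
        subject = stack.pop()
        for t in by_subject.get(subject, []):
            out.append(t)
            obj = t[2]
            # Named resources and literals end the walk; blank nodes extend it.
            if is_blank(obj) and obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return out


# The paper example, with the author list expanded into rdf:first/rdf:rest cells.
triples = [
    ("_:paper", "rdf:type", ":Paper"),
    ("_:paper", ":authors", "_:l1"),
    ("_:l1", "rdf:first", "Alpher"), ("_:l1", "rdf:rest", "_:l2"),
    ("_:l2", "rdf:first", "Bethe"), ("_:l2", "rdf:rest", "_:l3"),
    ("_:l3", "rdf:first", "Gamow"), ("_:l3", "rdf:rest", "rdf:nil"),
    ("_:paper", ":publication", "_:pub"),
    ("_:pub", ":journal", ":PhysicalReview"),
    ("_:pub", ":year", 1948),
    (":PhysicalReview", ":publisher", ":APS"),  # a separate top-level entity
]
entity = entity_closure(triples, "_:paper")
```

Here the triple about :PhysicalReview stays outside the boundary because it
hangs off a named resource, not a blank node.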


I have always thought that we need a type of property that expresses
"contains", or is part of an entity description, as well as datatype
properties for relationships between top-level entities.  They are a sort
of generalization of object properties.

Or maybe a richer set of literals to include maps and proper lists, cf.
property graphs.

         Andy

On Sat, Jan 2, 2016 at 1:08 PM, Andy Seaborne wrote:

An SQL database row is an entity in the application data model. If you
model a person, you have one row, but in RDF you have several triples.
Triple-level locking is analogous to cell-level locking in SQL databases.
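
To make the analogy concrete, here is a small Python sketch with an invented
"person" row; the column names, the ex: prefix and the helper row_to_triples
are assumptions for illustration only.

```python
# A hypothetical "person" row as an SQL table might hold it.
row = {"id": 42, "name": "Alice", "email": "alice@example.org", "age": 30}


def row_to_triples(row, key="id", prefix="ex:"):
    """Turn one row into one triple per non-key cell."""
    subject = f"{prefix}person/{row[key]}"
    return [(subject, f"{prefix}{col}", val)
            for col, val in row.items() if col != key]


triples = row_to_triples(row)
# One row lock covers all three cells at once; over triples, the nearest
# equivalent is a lock on every triple that shares the subject.
subjects = {s for s, _, _ in triples}
```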

          Andy


On 02/01/16 17:01, Paul Houle wrote:

I think it is a worthwhile idea.  Given that you still have to get a
global lock to get a triple lock,  isn't there still a scaling limit on the
global lock?

I think a lot about the things that made the relational database approach
so successful and certainly one thing is that row-level locking corresponds
well to real-life access patterns.

On Sat, Jan 2, 2016 at 9:18 AM, Claude Warren wrote:

Currently most Jena implementations use a multiple-reader, single-writer
solution.  However, I think that it is possible (with minimal work) to
provide a solution that would allow for multiple writers by using lower
level locks.

I take inspiration from the Privileges code.  That code allows privileges
to be determined down to the triple level.  Basically it does the following:
{noformat}
start
    |
    v
may user perform operation on graph? → (no) (restrict)
    |
    v
(yes)
may user perform operation on any triple in graph → (yes) (allow)
    |
    v
(no)
may user perform operation on the specific triple in graph → (yes) (allow)
    |
    v
(no) (restrict)
{noformat}
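
The cascade above can be sketched in a few lines of Python. This is only an
illustration of the three questions asked in order; TablePolicy, may_perform
and the tuple-based grants are hypothetical names and are not the API of the
Jena permissions code.

```python
from dataclasses import dataclass, field


@dataclass
class TablePolicy:
    """Hypothetical in-memory policy; each set holds grants as plain tuples."""
    graph_grants: set = field(default_factory=set)    # (user, op, graph)
    blanket_grants: set = field(default_factory=set)  # (user, op, graph): every triple
    triple_grants: set = field(default_factory=set)   # (user, op, graph, triple)


def may_perform(policy, user, op, graph, triple):
    """Ask the three questions of the flow in order, stopping at the first answer."""
    if (user, op, graph) not in policy.graph_grants:
        return False  # no access to the graph at all: restrict
    if (user, op, graph) in policy.blanket_grants:
        return True   # allowed on any triple in the graph: allow
    # Otherwise the decision rests on the specific triple.
    return (user, op, graph, triple) in policy.triple_grants


policy = TablePolicy()
policy.graph_grants.add(("alice", "update", "g1"))
policy.triple_grants.add(("alice", "update", "g1", ("u:foo", "u:p", "u:o")))
print(may_perform(policy, "alice", "update", "g1", ("u:foo", "u:p", "u:o")))  # True
print(may_perform(policy, "alice", "update", "g1", ("u:bar", "u:p", "u:o")))  # False
print(may_perform(policy, "bob", "update", "g1", ("u:foo", "u:p", "u:o")))    # False
```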

My thought is that the locking may work much the same way.  Once one thread
has an object locked, no other thread may lock that object.  The
process would be something like:

Graph locking would require exclusive lock or non-exclusive lock.  If the
entire graph were to be locked for writing (as in the current system) then
the request would be for an exclusive write-lock on the graph.  Once an
exclusive write lock has been established no other write lock may be
applied to the graph or any of its triples by any other thread.

If a thread only wanted to lock part of the graph, for example all triples
matching <u:foo ANY ANY>, the thread would first acquire a non-exclusive
write lock on the graph.  It would then acquire an exclusive write lock on
all triples matching <u:foo ANY ANY>.  Once that triple match lock was
acquired no other thread would be able to lock any triple whose subject was
u:foo.

The lock request would need to contain the graph name and (in the case of a
partial graph lock) a set of triple patterns to lock.  The flow for the
lock would be something like:

{noformat}
start
    |
    v
does the thread hold an exclusive graph lock → (yes) (success)
    |
    v
(no)
does the thread want an exclusive graph lock → (yes) (go to ex graph lock)
    |
    v
(no)
does the thread hold a non-exclusive graph lock → (no) (go to nonex graph lock)
    |
    v
(yes) (lbl:lock acquired)
can the thread acquire all the triple locks  → (yes) (success)
    |
    v
(no) (failure)


(lbl: nonex graph lock)
does any thread hold an exclusive graph lock → (yes) (failure)
    |
    v
(no)
acquire non-exclusive graph lock
(goto lock acquired)


(lbl: ex graph lock)
does any thread hold an exclusive graph lock → (yes) (failure)
    |
    v
(no)
does any thread hold a non-exclusive graph lock → (yes) (failure)
    |
    v
(no)
acquire exclusive graph lock
(success)

{noformat}
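
A minimal Python sketch of this flow for a single graph follows. It is not Jena
code: GraphLockManager, the explicit owner token (standing in for the calling
thread), the use of None as the ANY wildcard, and the convention that
patterns=None means "exclusive graph lock" are all assumptions. Requests fail
immediately instead of waiting, as in the flow, and a short internal mutex
guards the bookkeeping tables.

```python
import threading

ANY = None  # wildcard position in a triple pattern (assumed encoding)


def overlaps(p, q):
    """True if some triple could match both patterns."""
    return all(a is ANY or b is ANY or a == b for a, b in zip(p, q))


class GraphLockManager:
    """Lock bookkeeping for one graph; every name here is illustrative."""

    def __init__(self):
        self._mutex = threading.Lock()  # guards the tables below
        self._exclusive = None          # owner of the exclusive graph lock
        self._shared = set()            # owners of non-exclusive graph locks
        self._patterns = {}             # owner -> locked triple patterns

    def lock(self, owner, patterns=None):
        """Return True on success, False on failure; never waits."""
        with self._mutex:
            if self._exclusive == owner:
                return True  # already holds the exclusive graph lock
            if patterns is None:
                # (lbl: ex graph lock)
                if self._exclusive is not None or self._shared:
                    return False
                self._exclusive = owner
                return True
            if owner not in self._shared:
                # (lbl: nonex graph lock)
                if self._exclusive is not None:
                    return False
                self._shared.add(owner)
            # (lbl: lock acquired) try to take every requested triple lock
            for other, held in self._patterns.items():
                if other != owner and any(
                        overlaps(p, q) for p in patterns for q in held):
                    # The flow does not say whether the non-exclusive graph
                    # lock is given back here; this sketch keeps it.
                    return False
            self._patterns.setdefault(owner, []).extend(patterns)
            return True

    def release_lock(self, owner):
        """Drop every lock the owner holds, graph-level and triple-level."""
        with self._mutex:
            if self._exclusive == owner:
                self._exclusive = None
            self._shared.discard(owner)
            self._patterns.pop(owner, None)


manager = GraphLockManager()
print(manager.lock("t1", [("u:foo", ANY, ANY)]))  # True
print(manager.lock("t2", [("u:foo", "p", "o")]))  # False
print(manager.lock("t3"))                         # False
```

Read literally, the flow also refuses an exclusive request from a thread that
itself holds a non-exclusive lock; the sketch keeps that behaviour.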

The permissions system uses an abstract engine to determine if the user has
access to the triples.  For the locking mechanism the system needs to track
graph locks and triple patterns locked.  If a new request for a triple
pattern matches any existing (already locked) pattern the lock request
fails.

The simple releaseLock() will release all locks the thread holds.

Note that the locking system does not check the graph being locked to see
if the items exist in the graph; it simply tracks patterns of locks and
determines whether there are any conflicts between the patterns.
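
One consequence of working on patterns alone is that the check is
conservative: two patterns clash whenever some conceivable triple would match
both, whether or not the graph holds such a triple. A small Python sketch,
assuming tuples for patterns and None for the ANY wildcard (may_conflict and
try_lock are hypothetical names):

```python
ANY = None  # stand-in for the wildcard; the tuple encoding is an assumption


def may_conflict(p, q):
    """True if at least one conceivable triple matches both patterns."""
    return all(a is ANY or b is ANY or a == b for a, b in zip(p, q))


def try_lock(held, requested):
    """Add `requested` to `held` unless a pattern clashes; no graph is read."""
    if any(may_conflict(r, h) for r in requested for h in held):
        return False
    held.extend(requested)
    return True


held = []
print(try_lock(held, [("u:foo", ANY, ANY)]))        # True
print(try_lock(held, [(ANY, "u:knows", ANY)]))      # False
print(try_lock(held, [("u:bar", "u:knows", ANY)]))  # True
```

The second request is refused because a triple with subject u:foo and
predicate u:knows would match both patterns, even if none exists in the graph.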

Because this process can duplicate the current locking strategy it can be
used as a drop-in replacement in the current code.  So current code would
continue to operate as it does currently but future development could be
more sensitive to locking named graphs and partial updates, to provide
multi-threaded updates.

Thoughts?
Claude

--
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
