Re: A proposal for a new locking strategy

Paul Houle Sat, 02 Jan 2016 11:36:43 -0800

:s [] [] is a lot like a relational entity,  but I think the really
interesting thing about the RDF model is the ability to create
"post-relational" structures,  even if it does involve blank nodes.  The
future is more like JSON-LD or the nested columnar model.


In that context an entity can be a little bit more than just :s [] [] but
could involve a hierarchical structure or ordered lists.  In the case of
Freebase,  for instance,  you have the "mediator" or "CVT" nodes which form
a bipartite graph with respect to entity nodes so it is a straightforward
operation to cut out an entity and the CVTs around it.
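
As a rough illustration of that "cut out" step, here is a minimal Python sketch. It is not Freebase or Jena code: triples are plain tuples, the node names are made up, and the `is_cvt` test is a hypothetical stand-in for however one recognises a mediator node. It only follows outgoing edges from the entity.

```python
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain strings


def cut_out_entity(triples: Iterable[Triple], entity: str,
                   is_cvt: Callable[[str], bool]) -> Set[Triple]:
    """Return the triples of `entity` plus those of the CVT nodes it points to."""
    triples = list(triples)
    # Triples whose subject is the entity itself.
    own = {t for t in triples if t[0] == entity}
    # CVT nodes directly attached to the entity.
    cvts = {o for (_, _, o) in own if is_cvt(o)}
    # Assuming the graph is bipartite (CVTs never point at other CVTs),
    # one extra hop is enough to collect everything "around" the entity.
    around = {t for t in triples if t[0] in cvts}
    return own | around


if __name__ == "__main__":
    # Hypothetical data: one person, one employment CVT, one unrelated entity.
    data = [
        ("ex:person", "ex:name", '"A"'),
        ("ex:person", "ex:employment", "cvt:1"),
        ("cvt:1", "ex:employer", "ex:company"),
        ("cvt:1", "ex:from", '"2009"'),
        ("ex:company", "ex:name", '"C"'),
    ]
    for t in sorted(cut_out_entity(data, "ex:person",
                                   lambda n: n.startswith("cvt:"))):
        print(t)
```

The neighbouring entity (`ex:company` here) is referenced by the CVT but its own triples stay outside the cut, which is what makes the bipartite layout convenient.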

Lately I've been working on a framework which is a bit like the "boxes and
lines" products like Alteryx, KNIME, Actian -- those products are a dime a
dozen but they are all based on a tabular data model and this one is
passing small RDF graphs around,  so it supports the nested columnar model,
logic,  etc.  Pipelines like that rapidly become unintuitive and
structurally unstable when joins get involved,  particularly when they
involve "parts" of something that is a clear conceptual entity.

Obviously this thing is configured by an RDF graph,  because the point is
not that you draw a data processing pipeline but that one of these data
processing pipelines consumes schema information and a theory library to
build a graph that describes what will be done to the instances.

So there is a MetaFactory that picks apart the graph into subgraphs,  feeds
the subgraphs into the processing modules and then hooks them up in a
communications fabric.

I don't yet have a single strategy for doing the "document extraction" but
I have two or three methods that between them seem to cover the cases that
actually come up.

Following this line,  it would be nice to be able to lock a whole structure
that looks like

[
  a :Paper ;
  :authors ("Alpher" "Bethe" "Gamow") ;
  :publication [ :journal :PhysicalReview ; :year 1948 ]
]

I don't know how implementable such a thing is,  but the problem of drawing
a line around a complex entity would be part of it.
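
One way to picture "drawing a line" is to start at the root node and follow blank nodes until you run out, then turn every node you touched into a <node ANY ANY> pattern of the kind proposed further down the thread. The Python below is only a sketch under assumptions: triples are tuples, blank nodes are strings starting with "_:", the function names are invented for illustration, and none of it is Jena API.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
ANY = "ANY"  # wildcard marker used in lock patterns (illustrative)


def entity_closure(triples: Iterable[Triple],
                   root: str) -> Tuple[List[Triple], Set[str]]:
    """Collect the triples reachable from `root` by following blank-node objects."""
    by_subject: Dict[str, List[Triple]] = {}
    for t in triples:
        by_subject.setdefault(t[0], []).append(t)
    seen, stack, out = {root}, [root], []
    while stack:
        node = stack.pop()
        for t in by_subject.get(node, []):
            out.append(t)
            obj = t[2]
            # Only blank nodes are treated as "part of" the entity.
            if obj.startswith("_:") and obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return out, seen


def lock_patterns(nodes: Iterable[str]) -> Set[Triple]:
    """One <node ANY ANY> pattern per node inside the line."""
    return {(n, ANY, ANY) for n in nodes}


def patterns_conflict(a: Triple, b: Triple) -> bool:
    """Two patterns overlap if every position is equal or a wildcard."""
    return all(x == ANY or y == ANY or x == y for x, y in zip(a, b))


if __name__ == "__main__":
    # The :Paper structure above, with the list spelled out as rdf:first/rdf:rest.
    paper = [
        ("_:paper", "rdf:type", ":Paper"),
        ("_:paper", ":authors", "_:l1"),
        ("_:l1", "rdf:first", '"Alpher"'), ("_:l1", "rdf:rest", "_:l2"),
        ("_:l2", "rdf:first", '"Bethe"'), ("_:l2", "rdf:rest", "_:l3"),
        ("_:l3", "rdf:first", '"Gamow"'), ("_:l3", "rdf:rest", "rdf:nil"),
        ("_:paper", ":publication", "_:pub"),
        ("_:pub", ":journal", ":PhysicalReview"),
        ("_:pub", ":year", '"1948"'),
    ]
    inside, nodes = entity_closure(paper, "_:paper")
    for pattern in sorted(lock_patterns(nodes)):
        print(pattern)
```

This only covers the blank-node case; structures held together by named nodes would need some other rule for where the line goes.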

On Sat, Jan 2, 2016 at 1:08 PM, Andy Seaborne <[email protected]> wrote:

> An SQL database row is an entity in the application data model. If you
> model a person, you have one row, but in RDF you have several triples.
> Triple level locking is analogous to cell level locking in SQL databases.
>
>         Andy
>
>
> On 02/01/16 17:01, Paul Houle wrote:
>
>> I think it is a worthwhile idea.  Given that you are still having to get a
>> global lock to get a triple lock,  isn't there still a scaling limit on the
>> global lock?
>>
>> I think a lot about the things that made the relational database approach
>> so successful and certainly one thing is that row-level locking corresponds
>> well to real-life access patterns.
>>
>> On Sat, Jan 2, 2016 at 9:18 AM, Claude Warren <[email protected]> wrote:
>>
>>> Currently most Jena implementations use a multiple read one write
>>> solution.  However, I think that it is possible (with minimal work) to
>>> provide a solution that would allow for multiple writers by using lower
>>> level locks.
>>>
>>> I take inspiration from the Privileges code.  That code allows privileges
>>> to be determined down to the triple level.  Basically it does the following:
>>> {noformat}
>>> start
>>>   |
>>>   v
>>> may user perform operation on graph? → (no) (restrict)
>>>   |
>>>   v
>>> (yes)
>>> may user perform operation on any triple in graph → (yes) (allow)
>>>   |
>>>   v
>>> (no)
>>> may user perform operation on the specific triple in graph → (yes) (allow)
>>>   |
>>>   v
>>> (no) (restrict)
>>> {noformat}
>>>
>>> My thought is that the locking may work much the same way.  Once one
>>> thread has the objects locked, no other thread may lock them.  The
>>> process would be something like:
>>>
>>> Graph locking would require either an exclusive or a non-exclusive lock.
>>> If the entire graph were to be locked for writing (as in the current
>>> system) then the request would be for an exclusive write lock on the
>>> graph.  Once an exclusive write lock has been established no other write
>>> lock may be applied to the graph or any of its triples by any other thread.
>>>
>>> If a thread only wanted to lock part of the graph, for example all
>>> triples matching <u:foo ANY ANY>, the thread would first acquire a
>>> non-exclusive write lock on the graph.  It would then acquire an exclusive
>>> write lock on all triples matching <u:foo ANY ANY>.  Once that triple
>>> match lock was acquired no other thread would be able to lock any triple
>>> whose subject was u:foo.
>>>
>>> The lock request would need to contain the graph name and (in the case
>>> of a partial graph lock) a set of triple patterns to lock.  The flow for
>>> the lock would be something like:
>>>
>>> {noformat}
>>> start
>>>   |
>>>   v
>>> does the thread hold an exclusive graph lock → (yes) (success)
>>>   |
>>>   v
>>> (no)
>>> does the thread want an exclusive graph lock → (yes) (go to ex graph lock)
>>>   |
>>>   v
>>> (no)
>>> does the thread hold a non-exclusive graph lock → (no) (go to nonex graph lock)
>>>   |
>>>   v
>>> (yes) (lbl:lock acquired)
>>> can the thread acquire all the triple locks  → (yes) (success)
>>>   |
>>>   v
>>> (no) (failure)
>>>
>>>
>>> (lbl: nonex graph lock)
>>> does any thread hold an exclusive graph lock → (yes) (failure)
>>>   |
>>>   v
>>> (no)
>>> acquire non-exclusive graph lock
>>> (goto lock acquired)
>>>
>>>
>>> (lbl: ex graph lock)
>>> does any thread hold an exclusive graph lock → (yes) (failure)
>>>   |
>>>   v
>>> (no)
>>> does any thread hold a non-exclusive graph lock → (yes) (failure)
>>>   |
>>>   v
>>> (no)
>>> acquire exclusive graph lock
>>> (success)
>>>
>>> {noformat}
>>>
>>> The permissions system uses an abstract engine to determine if the user
>>> has access to the triples.  For the locking mechanism the system needs to
>>> track graph locks and triple patterns locked.  If a new request for a
>>> triple pattern matches any existing (already locked) pattern the lock
>>> request fails.
>>>
>>> The simple releaseLock() will release all locks the thread holds.
>>>
>>> Note that the locking system does not check the graph being locked to see
>>> if the items exist in the graph; it is simply tracking patterns of locks
>>> and determining if there are any conflicts between the patterns.
>>>
>>> Because this process can duplicate the current locking strategy it can be
>>> used as a drop-in replacement in the current code.  Existing code would
>>> continue to operate as it does now, while future development could lock
>>> named graphs and partial updates more selectively to allow multi-threaded
>>> updates.
>>>
>>> Thoughts?
>>> Claude
>>>
>>> --
>>> I like: Like Like - The likeliest place on the web
>>> <http://like-like.xenei.com>
>>> LinkedIn: http://www.linkedin.com/in/claudewarren
>>>
>>>
>>
>>
>>
>


-- 
Paul Houle

*Applying Schemas for Natural Language Processing, Distributed Systems,
Classification and Text Mining and Data Lakes*

(607) 539 6254    paul.houle on Skype   [email protected]

:BaseKB -- Query Freebase Data With SPARQL
http://basekb.com/gold/

Legal Entity Identifier Lookup
https://legalentityidentifier.info/lei/lookup/
<http://legalentityidentifier.info/lei/lookup/>

Join our Data Lakes group on LinkedIn
https://www.linkedin.com/grp/home?gid=8267275
