Re: A proposal for a new locking strategy

Paul Houle Sat, 02 Jan 2016 12:13:36 -0800

I'd love to see RDF* and SPARQL* support in Jena but that might be too much
to ask.


On Sat, Jan 2, 2016 at 3:09 PM, Andy Seaborne <[email protected]> wrote:

> On 02/01/16 19:36, Paul Houle wrote:
>
>> :s [] [] is a lot like a relational entity,  but I think the really
>> interesting thing about the RDF model is the ability to create
>> "post-relational" structures,  even if it does involve blank nodes.  The
>> future is more like JSON-LD or the nested columnar model.
>>
>> In that context an entity can be a little bit more than just :s [] [] but
>> could involve a hierarchical structure or ordered lists.  In the case of
>> Freebase,  for instance,  you have the "mediator" or "CVT" nodes which form
>> a bipartite graph with respect to entity nodes so it is a straightforward
>> operation to cut out an entity and the CVTs around it.
>>
>> Lately I've been working on a framework which is a bit like the "boxes and
>> lines" products like Alteryx, KNIME, and Actian.  Those products are a dime a
>> dozen,  but they are all based on a tabular data model,  while this one
>> passes small RDF graphs around,  so it supports the nested columnar model,
>> logic,  etc.  Pipelines like that rapidly become unintuitive and
>> structurally unstable when joins get involved,  particularly when they
>> involve "parts" of something that is a clear conceptual entity.
>>
>> Obviously this thing is configured by an RDF graph,  because the point is
>> not that you draw a data processing pipeline but that one of these data
>> processing pipelines consumes schema information and a theory library to
>> build a graph that describes what will be done to the instances.
>>
>> So there is a MetaFactory that picks apart the graph into subgraphs,  feeds
>> the subgraphs into the processing modules and then hooks them up in a
>> communications fabric.
>>
>> I don't yet have a single strategy for doing the "document extraction" but
>> I have two or three methods that between them seem to cover the cases that
>> actually come up.
>>
>> Following this line,  it would be nice to be able to lock a whole structure
>> that looks like
>>
>> [
>>     a :Paper ;
>>     :authors ("Alpher" "Bethe" "Gamow") ;
>>     :publication [ :journal :PhysicalReview ; :year 1948 ]
>> ] .
>>
>> I don't know how implementable such a thing is,  but the problem of drawing
>> a line around a complex entity would be part of it.
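One common way to draw that line is to start at the entity's node and keep following blank-node objects until only IRIs and literals remain, roughly the idea behind a Concise Bounded Description. The Java sketch below is hypothetical: it assumes a toy string encoding of triples (terms beginning with "_:" are blank nodes), and Triple and EntityClosure are illustrative names, not Jena classes. On the paper example it would collect the paper's own triples, the RDF list cells behind :authors, and the :publication node, but nothing said about :PhysicalReview itself.

```java
import java.util.*;

// Toy triple: terms are strings, and a term starting with "_:" is a blank node.
// This encoding is an assumption for the sketch, not Jena's Node API.
record Triple(String s, String p, String o) {}

final class EntityClosure {
    static boolean isBlank(String term) {
        return term.startsWith("_:");
    }

    // Collects every triple reachable from root by following blank-node objects.
    static Set<Triple> closure(Collection<Triple> graph, String root) {
        // Index triples by subject so each node is expanded in one lookup.
        Map<String, List<Triple>> bySubject = new HashMap<>();
        for (Triple t : graph) {
            bySubject.computeIfAbsent(t.s(), k -> new ArrayList<>()).add(t);
        }
        Set<Triple> result = new LinkedHashSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            String subject = pending.pop();
            if (!visited.add(subject)) continue; // already expanded; also stops cycles
            for (Triple t : bySubject.getOrDefault(subject, List.of())) {
                result.add(t);
                if (isBlank(t.o())) pending.push(t.o()); // descend into nested structure
            }
        }
        return result;
    }
}
```

Following blank nodes is only one possible boundary rule; a dedicated "contains" property, as Andy suggests below, would instead mean following a chosen set of predicates.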
>>
>
> I have always thought that we need a type of property that expresses
> "contains", or is part of an entity description, as well as object
> properties for relationships between top-level entities.  They are a sort
> of generalization of object properties.
>
> Or maybe a richer set of literals to include maps and proper lists; cf.
> property graphs.
>
>         Andy
>
>
>
>> On Sat, Jan 2, 2016 at 1:08 PM, Andy Seaborne <[email protected]> wrote:
>>
>>> An SQL database row is an entity in the application data model. If you
>>> model a person, you have one row, but in RDF you have several triples.
>>> Triple-level locking is analogous to cell-level locking in SQL databases.
>>>
>>>          Andy
>>>
>>>
>>> On 02/01/16 17:01, Paul Houle wrote:
>>>
>>>> I think it is a worthwhile idea.  Given that you are still having to get a
>>>> global lock to get a triple lock,  isn't there still a scaling limit on the
>>>> global lock?
>>>>
>>>> I think a lot about the things that made the relational database approach
>>>> so successful and certainly one thing is that row-level locking
>>>> corresponds well to real-life access patterns.
>>>>
>>>> On Sat, Jan 2, 2016 at 9:18 AM, Claude Warren <[email protected]> wrote:
>>>>
>>>>> Currently most Jena implementations use a multiple-reader, single-writer
>>>>> solution.  However, I think that it is possible (with minimal work) to
>>>>> provide a solution that would allow for multiple writers by using
>>>>> lower-level locks.
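For reference, the multiple-reader, single-writer discipline can be sketched with java.util.concurrent's ReentrantReadWriteLock. GraphGuard is a hypothetical name, and Jena's own Lock interface (enterCriticalSection / leaveCriticalSection) is implemented differently, so treat this only as an illustration of the semantics: any number of readers, or exactly one writer, per graph.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical multiple-reader, single-writer guard around one whole graph.
final class GraphGuard {
    private final ReadWriteLock rw = new ReentrantReadWriteLock();

    // Any number of threads may be inside read() at the same time.
    <T> T read(Supplier<T> action) {
        rw.readLock().lock();
        try {
            return action.get();
        } finally {
            rw.readLock().unlock();
        }
    }

    // One thread at a time, and only when no reader is active.
    void write(Runnable action) {
        rw.writeLock().lock();
        try {
            action.run();
        } finally {
            rw.writeLock().unlock();
        }
    }

    // Runs the action only if the write lock is free right now; never blocks.
    boolean tryWrite(Runnable action) {
        if (!rw.writeLock().tryLock()) return false;
        try {
            action.run();
            return true;
        } finally {
            rw.writeLock().unlock();
        }
    }
}
```

The limitation the proposal targets is visible here: write() excludes every other writer on the whole graph, even one that would touch unrelated triples.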
>>>>>
>>>>> I take inspiration from the Privileges code.  That code allows privileges
>>>>> to be determined down to the triple level.  Basically it does the
>>>>> following:
>>>>> {noformat}
>>>>> start
>>>>>    |
>>>>>    v
>>>>> may user perform operation on graph? → (no) (restrict)
>>>>>    |
>>>>>    v
>>>>> (yes)
>>>>> may user perform operation on any triple in graph? → (yes) (allow)
>>>>>    |
>>>>>    v
>>>>> (no)
>>>>> may user perform operation on the specific triple in graph? → (yes) (allow)
>>>>>    |
>>>>>    v
>>>>> (no) (restrict)
>>>>> {noformat}
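The decision flow above can be written as a short function. Policy, Op, Decision and PrivilegeCheck are hypothetical stand-ins for whatever engine answers the three questions; they are not the Jena Permissions API.

```java
// Hypothetical types for illustration only.
enum Op { READ, WRITE }

enum Decision { ALLOW, RESTRICT }

record Triple(String s, String p, String o) {}

// The three questions from the flow, answered by some security policy.
interface Policy {
    boolean mayAccessGraph(String user, String graph, Op op);
    boolean mayAccessAnyTriple(String user, String graph, Op op);
    boolean mayAccessTriple(String user, String graph, Op op, Triple t);
}

final class PrivilegeCheck {
    static Decision check(Policy policy, String user, String graph, Op op, Triple t) {
        // may user perform operation on graph? (no) -> restrict
        if (!policy.mayAccessGraph(user, graph, op)) return Decision.RESTRICT;
        // may user perform operation on any triple in graph? (yes) -> allow
        if (policy.mayAccessAnyTriple(user, graph, op)) return Decision.ALLOW;
        // may user perform operation on this specific triple? yes -> allow, no -> restrict
        return policy.mayAccessTriple(user, graph, op, t) ? Decision.ALLOW : Decision.RESTRICT;
    }
}
```

The cheap graph-level answers short-circuit the per-triple check, which is what makes the same shape attractive for locking.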
>>>>>
>>>>> My thought is that the locking may work much the same way.  Once one
>>>>> thread has locked an object, no other thread may lock it.  The process
>>>>> would be something like:
>>>>>
>>>>> A graph lock would be either exclusive or non-exclusive.  If the entire
>>>>> graph were to be locked for writing (as in the current system), then the
>>>>> request would be for an exclusive write lock on the graph.  Once an
>>>>> exclusive write lock has been established, no other write lock may be
>>>>> applied to the graph or any of its triples by any other thread.
>>>>>
>>>>> If a thread only wanted to lock part of the graph, for example all triples
>>>>> matching <u:foo ANY ANY>, the thread would first acquire a non-exclusive
>>>>> write lock on the graph.  It would then acquire an exclusive write lock on
>>>>> all triples matching <u:foo ANY ANY>.  Once that triple-match lock was
>>>>> acquired, no other thread would be able to lock any triple whose subject
>>>>> was u:foo.
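Deciding whether two pattern locks collide comes down to overlap: two patterns conflict if some concrete triple could match both, which is true exactly when, at every position, at least one side is ANY or both name the same term. A minimal sketch, assuming null stands for ANY and TriplePattern is a hypothetical record:

```java
// Hypothetical triple pattern; a null field (the ANY constant) is a wildcard.
record TriplePattern(String s, String p, String o) {
    static final String ANY = null;

    // True if some concrete triple could match both this pattern and other.
    boolean overlaps(TriplePattern other) {
        return compatible(s, other.s) && compatible(p, other.p) && compatible(o, other.o);
    }

    // Two positions are compatible if either is ANY or both hold the same term.
    private static boolean compatible(String a, String b) {
        return a == null || b == null || a.equals(b);
    }
}
```

Under this rule a lock on <u:foo ANY ANY> also blocks <ANY ANY u:bar>, because the triple <u:foo p u:bar> would match both, while <u:baz ANY ANY> is unaffected.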
>>>>>
>>>>> The lock request would need to contain the graph name and (in the case of
>>>>> a partial graph lock) a set of triple patterns to lock.  The flow for the
>>>>> lock would be something like:
>>>>>
>>>>> {noformat}
>>>>> start
>>>>>    |
>>>>>    v
>>>>> does the thread hold an exclusive graph lock → (yes) (success)
>>>>>    |
>>>>>    v
>>>>> (no)
>>>>> does the thread want an exclusive graph lock → (yes) (go to ex graph lock)
>>>>>    |
>>>>>    v
>>>>> (no)
>>>>> does the thread hold a non-exclusive graph lock → (no) (go to nonex graph lock)
>>>>>    |
>>>>>    v
>>>>> (yes) (lbl: lock acquired)
>>>>> can the thread acquire all the triple locks → (yes) (success)
>>>>>    |
>>>>>    v
>>>>> (no) (failure)
>>>>>
>>>>>
>>>>> (lbl: nonex graph lock)
>>>>> does any thread hold an exclusive graph lock → (yes) (failure)
>>>>>    |
>>>>>    v
>>>>> (no)
>>>>> acquire non-exclusive graph lock
>>>>> (go to lock acquired)
>>>>>
>>>>>
>>>>> (lbl: ex graph lock)
>>>>> does any thread hold an exclusive graph lock → (yes) (failure)
>>>>>    |
>>>>>    v
>>>>> (no)
>>>>> does any thread hold a non-exclusive graph lock → (yes) (failure)
>>>>>    |
>>>>>    v
>>>>> (no)
>>>>> acquire exclusive graph lock
>>>>> (success)
>>>>>
>>>>> {noformat}
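Putting the flow together, a non-blocking lock manager might look like the following. It is a hypothetical sketch, not an existing Jena class: patterns use null for ANY, every method returns false instead of waiting, and a thread holding a non-exclusive lock cannot upgrade to an exclusive one, since the "ex graph lock" branch fails whenever any thread holds a non-exclusive lock.

```java
import java.util.*;

// Hypothetical, non-blocking sketch of the proposed flow; not a Jena API.
final class PatternLockManager {
    // Triple pattern; null in any position means ANY.
    record Pattern(String s, String p, String o) {
        boolean overlaps(Pattern x) {
            return ok(s, x.s) && ok(p, x.p) && ok(o, x.o);
        }
        private static boolean ok(String a, String b) {
            return a == null || b == null || a.equals(b);
        }
    }

    private static final class GraphState {
        Thread exclusiveOwner;                                        // exclusive holder, if any
        final Set<Thread> sharedHolders = new HashSet<>();            // non-exclusive holders
        final Map<Thread, List<Pattern>> patterns = new HashMap<>();  // patterns locked per thread
    }

    private final Map<String, GraphState> graphs = new HashMap<>();

    private GraphState state(String graph) {
        return graphs.computeIfAbsent(graph, g -> new GraphState());
    }

    // Whole-graph request: the "ex graph lock" branch.
    synchronized boolean lockGraphExclusive(String graph) {
        Thread me = Thread.currentThread();
        GraphState st = state(graph);
        if (st.exclusiveOwner == me) return true;      // already held: success
        if (st.exclusiveOwner != null) return false;   // another exclusive holder: failure
        if (!st.sharedHolders.isEmpty()) return false; // any non-exclusive holder: failure
        st.exclusiveOwner = me;
        return true;
    }

    // Partial request: non-exclusive graph lock plus a set of triple patterns.
    synchronized boolean lockPatterns(String graph, Collection<Pattern> wanted) {
        Thread me = Thread.currentThread();
        GraphState st = state(graph);
        if (st.exclusiveOwner == me) return true;      // exclusive lock already covers everything
        if (!st.sharedHolders.contains(me)) {          // "nonex graph lock" branch
            if (st.exclusiveOwner != null) return false;
            st.sharedHolders.add(me);                  // kept even if the pattern check fails
        }
        // "lock acquired": all-or-nothing check against other threads' patterns.
        for (Map.Entry<Thread, List<Pattern>> e : st.patterns.entrySet()) {
            if (e.getKey() == me) continue;
            for (Pattern held : e.getValue())
                for (Pattern w : wanted)
                    if (held.overlaps(w)) return false;
        }
        st.patterns.computeIfAbsent(me, t -> new ArrayList<>()).addAll(wanted);
        return true;
    }

    // releaseLock(): drop every lock the calling thread holds, on every graph.
    synchronized void releaseLock() {
        Thread me = Thread.currentThread();
        for (GraphState st : graphs.values()) {
            if (st.exclusiveOwner == me) st.exclusiveOwner = null;
            st.sharedHolders.remove(me);
            st.patterns.remove(me);
        }
    }
}
```

All bookkeeping here sits behind one synchronized monitor, which is the kind of global serialization point Paul asks about above; in this sketch it is held only for the duration of each check, not for the whole update.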
>>>>>
>>>>> The permissions system uses an abstract engine to determine if the user has
>>>>> access to the triples.  For the locking mechanism the system needs to track
>>>>> graph locks and the triple patterns locked.  If a new request for a triple
>>>>> pattern overlaps any existing (already locked) pattern, that is, if some
>>>>> triple could match both, the lock request fails.
>>>>>
>>>>> A simple releaseLock() call would release all the locks the thread holds.
>>>>>
>>>>> Note that the locking system does not check the graph being locked to see
>>>>> if the items exist in the graph; it simply tracks patterns of locks and
>>>>> determines whether there are any conflicts between the patterns.
>>>>>
>>>>> Because this process can duplicate the current locking strategy, it can be
>>>>> used as a drop-in replacement in the current code.  Current code would
>>>>> continue to operate as it does now, but future development could lock
>>>>> individual named graphs, or parts of graphs, so that multiple threads can
>>>>> update concurrently.
>>>>>
>>>>> Thoughts?
>>>>> Claude
>>>>>
>>>>> --
>>>>> I like: Like Like - The likeliest place on the web
>>>>> <http://like-like.xenei.com>
>>>>> LinkedIn: http://www.linkedin.com/in/claudewarren
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>


-- 
Paul Houle

*Applying Schemas for Natural Language Processing, Distributed Systems,
Classification and Text Mining and Data Lakes*

(607) 539 6254    paul.houle on Skype   [email protected]

:BaseKB -- Query Freebase Data With SPARQL
http://basekb.com/gold/

Legal Entity Identifier Lookup
https://legalentityidentifier.info/lei/lookup/

Join our Data Lakes group on LinkedIn
https://www.linkedin.com/grp/home?gid=8267275
