Re: Graph SPI Contract

Claude Warren Sat, 23 Aug 2014 10:07:55 -0700

Andy,

 I think we agree on transactions.


I think the difference is in the understanding of when listeners are
triggered.

I realize that all current implementations of listeners appear to be on a
single thread.  But I did not realize that was a requirement of the
listener interface.  (yes that would be part of the listener contract
test).  I would think that a listener could place messages on a queue and
that would be sufficient to meet the listener interface -- but that would
mean the state within a transaction would be visible outside of the
transaction.

I have wanted the ability for one thread to be notified when another thread
completed a transaction -- basically when the changes became visible.  But
for now that appears to be outside the scope of a listener.

As listeners are same-thread callbacks, does this mean that when a
transaction is rolled back the listeners must be notified to undo the
previous notifications?  For example:

begin Tx
add T1
listener notified of add T1
rollback Tx
listener notified of delete T1

If not then we need to document this case.
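
To make the question concrete, here is a minimal Java sketch of what "notify
to undo" could look like for same-thread listeners.  It is only an
illustration under assumptions: CompensatingNotifier and its method names are
hypothetical and are not part of the Jena API, triples are plain strings, and
every call is assumed to happen inside one open transaction.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper (not a Jena class): notifies at once, compensates on rollback. */
class CompensatingNotifier {
    private final List<Consumer<String>> listeners = new ArrayList<>();
    // Inverse events for the open transaction, most recent first.
    private final Deque<String> undo = new ArrayDeque<>();

    void register(Consumer<String> listener) { listeners.add(listener); }

    /** Called on the updating thread right after a triple was added. */
    void added(String triple) {
        undo.push("delete " + triple);
        deliver("add " + triple);
    }

    /** Called on the updating thread right after a triple was deleted. */
    void deleted(String triple) {
        undo.push("add " + triple);
        deliver("delete " + triple);
    }

    /** Commit: the earlier notifications stand, nothing to undo. */
    void commit() { undo.clear(); }

    /** Rollback: replay the inverse events in reverse order. */
    void rollback() {
        while (!undo.isEmpty()) {
            deliver(undo.pop());
        }
    }

    private void deliver(String event) {
        for (Consumer<String> listener : listeners) {
            listener.accept(event);
        }
    }
}

class CompensatingNotifierDemo {
    public static void main(String[] args) {
        CompensatingNotifier notifier = new CompensatingNotifier();
        notifier.register(event -> System.out.println("listener notified of " + event));
        notifier.added("T1");   // begin Tx, add T1
        notifier.rollback();    // rollback Tx
    }
}
```

The demo follows the trace above: the listener hears about the add of T1 and
then, at rollback, about the delete of T1.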

To be honest it is this issue that made me think that listeners should be
notified after the transaction committed.  If listeners are notified after
commit then they can also be on different threads.
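
A rough Java sketch of the notify-after-commit idea, under stated
assumptions: DeferredNotifier is a hypothetical name (not a Jena class),
events are plain strings, and there is at most one open transaction.  Events
raised inside a transaction are buffered, delivered at commit, and dropped at
rollback, so a listener never hears about a change that did not become
visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper (not a Jena class): holds events until the transaction commits. */
class DeferredNotifier {
    private final List<Consumer<String>> listeners = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();
    private boolean inTransaction = false;

    void register(Consumer<String> listener) { listeners.add(listener); }

    void begin() { inTransaction = true; }

    /** Buffer the event inside a transaction, deliver it directly otherwise. */
    void notifyEvent(String event) {
        if (inTransaction) {
            pending.add(event);
        } else {
            deliver(event);
        }
    }

    /** Commit: the changes are now visible, so deliver everything buffered. */
    void commit() {
        inTransaction = false;
        List<String> toSend = new ArrayList<>(pending);
        pending.clear();
        for (String event : toSend) {
            deliver(event);
        }
    }

    /** Rollback: the changes never became visible, so drop the buffer. */
    void rollback() {
        inTransaction = false;
        pending.clear();
    }

    private void deliver(String event) {
        for (Consumer<String> listener : listeners) {
            listener.accept(event);
        }
    }
}

class DeferredNotifierDemo {
    public static void main(String[] args) {
        DeferredNotifier notifier = new DeferredNotifier();
        notifier.register(event -> System.out.println("listener notified of " + event));
        notifier.begin();
        notifier.notifyEvent("add T1");
        notifier.rollback();            // nothing is delivered
        notifier.begin();
        notifier.notifyEvent("add T2");
        notifier.commit();              // "add T2" is delivered here
    }
}
```

Because delivery happens only after commit, the deliver step could equally
hand the events to a queue read by another thread; that part is left out of
the sketch.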

Claude





On Sat, Aug 23, 2014 at 4:22 PM, Andy Seaborne <a...@apache.org> wrote:

> Claude,
>
> We seem to have different understandings about transactions.
>
> I see a transaction (as in ACID) as defining a scope or view of the system
> (or valid state of the system). Within a transaction changes happen only
> from actions of the transaction, not outside.
>
> A transaction sees a consistent state of the world - transactions are
> serialized onto the time line and appear to happen instantaneously as a
> single unit.  From outside, nothing changes until all of a sudden all
> changes are made at once.  This is "serializable" isolation which is the
> ideal.
>
> Weaker forms of isolation exist but they have unpredictable effects. For
> us, a find() is a range query so even isolation level "repeatable reads"
> can cause a find to see a state of the storage that never existed in any
> application view point.
>
> http://en.wikipedia.org/wiki/Isolation_%28database_systems%29
>
> We aren't considering nested transactions.
>
> add() just needs to define what add() does inside a transaction.  At some
> later time, all the changes become visible.
>
> == add()
>
> Pre condition:
>   graph exists
>
> action:
>   add(t)
>
> Post condition:
> in the view of the transaction for the current thread:
>   if no exception
>     graph contains t
>   else if AddDeniedException
>     graph contains t if and only if the graph contained t before.
>
> Listeners are same-thread callbacks so they are in the same transaction as
> the update.  Complex systems on top of this are out of scope. Jena provides
> the building block.
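
The add() post-condition quoted above could be turned into a contract check
along the following lines.  This is a sketch under assumptions, not the Jena
test suite: SketchGraph and SketchAddDeniedException are hypothetical
stand-ins so that the example compiles on its own, and triples are plain
strings.

```java
import java.util.HashSet;
import java.util.Set;

/** Stand-in for an add-denied exception so the sketch is self-contained. */
class SketchAddDeniedException extends RuntimeException {
    SketchAddDeniedException(String message) { super(message); }
}

/** Hypothetical minimal graph: a set of string triples that may be read-only. */
class SketchGraph {
    private final Set<String> triples = new HashSet<>();
    private final boolean addAllowed;

    SketchGraph(boolean addAllowed) { this.addAllowed = addAllowed; }

    void add(String triple) {
        if (!addAllowed) {
            throw new SketchAddDeniedException("add not allowed: " + triple);
        }
        triples.add(triple);
    }

    boolean contains(String triple) { return triples.contains(triple); }
}

/** Checks the add() post-condition for one triple. */
class AddContractCheck {
    static boolean holds(SketchGraph graph, String triple) {
        boolean before = graph.contains(triple);
        try {
            graph.add(triple);
            // No exception: the graph must contain the triple.
            return graph.contains(triple);
        } catch (SketchAddDeniedException e) {
            // Denied: membership must be exactly what it was before.
            return graph.contains(triple) == before;
        }
    }

    public static void main(String[] args) {
        System.out.println(holds(new SketchGraph(true), "t1"));
        System.out.println(holds(new SketchGraph(false), "t1"));
    }
}
```

Any other exception escapes the check, which fits the view that it is a
system error rather than part of the contract.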
>
>
> On 17/08/14 11:43, Claude Warren wrote:
>
>> I think the contract has to cover multi-threaded possibilities.  However,
>> for the most part the document I originally proposed is the view from
>> within a single thread.
>>
>
>
> For non-transactional, multi-threaded systems, I don't think anything
> needs to be said except "don't!!" - or rather "single view" or else all bets
> are off.  Failure modes are way too implementation specific - even across
> JVMs (see IBM vs Oracle JVMs for HashMap as we know here).
>
> Jena in-memory is read-concurrent safe.
> http://jena.apache.org/documentation/notes/concurrency-howto.html
>
> Even that is non-trivial to provide in the inference engine.
>
>
>  I agree that graphAdd serves no purpose and go as far as saying it should
>> be removed in Jena 3.
>>
>
> Yes.
>
>  Think that defining the add with the listener will clarify the contract,
>> but we need clarification of the Listener contract later.
>>
>> I think that the current process is:
>>
>>     1. triple added to or deleted from graph
>>     2. listeners notified
>>
>>
>> I think that this is correct but that we need to add that exceptions in
>> the
>> listeners may not raise and add denied exception.
>>
>
> s/and/an/ ?  If so - yes.
>
> How about:
>
> 1. listeners should not raise exceptions.
> 2. If they do (outside the contract), the exception should be (logged and)
> dropped.
>
> It seems odd to me to have an exception and the triple be added.
>
>   I believe that the
>> contract with listeners is:
>>
>>     1. they are notified after the event they are listening for has been
>>
>>     completed.  That they are not notified if an Exception is thrown in
>> the add.
>>     2. if a listener throws an exception it will not undo the add or
>> delete.
>>
>
> Yes.
>
>      3. I believe that: #1 means that the listeners would be notified at
>> the
>>
>>     commit of a transaction, so listeners are guaranteed to have messages
>>     queued by the end of the commit (if present) or at the end of add (if
>> no
>>     transaction is present).
>>
>
> A basic listener is inside the transaction where the add() is happening.
> They are on the same thread anyway.  I don't know how to implement
> same-thread, different visibility.
>
> I wonder if listeners can be described with a separate contract - makes the
> contract tests modular.
>
> e.g.
> C1/ Contract for add/delete/find/and others as actions on a set of triples.
>
> C2/ Contract for listeners where
>
> add-with-listener => core add contract + listener called.
>
>  This does lead to the possibility that a graph implementation may need to
>> notify other components within the transaction that the add or delete was
>> completed -- I am not certain that this is needed but raise the point here
>> for further discussion if necessary.
>>
>> So the full process for an add is
>>
>>     1. begin add( triple )
>>     2. if adding is not allowed (Capabilities.addAllowed() returns false)
>>     throw AddDeniedException.
>>     3. add to the underlying storage system, may throw an exception.
>>        1. If a checked exception is thrown wrap it in an
>> AddDeniedException.
>>
>
> Any other kind of exception is presumably a system error and leaves the
> system in unknown state.
>
>         4. if not in a transaction notify listeners of add
>>     5. end add(triple)
>>
>
> "end add" means return to caller?
>
> So far, so good.
>
>      6. begin commit if in transaction
>>     7. commit the change so that it is visible to outside of the
>> transaction.
>>     8. notify listeners of add.
>>     9. end commit.
>>
>
> I don't understand this. Are you trying for JDBC autocommit effects?
>
> See overall comments on transactions.
>
> Illustration:
> W1, R1 R2 R3 -- transactions.
>
> Thread 1                         Thread 2
> begin W1
> add t1
> add t2
>                                  begin R1 -find-end R1 (sees no triples)
> add t3
> find (sees 3 triples)
> add t4
>                                  begin R2-find-end R2 (sees no triples)
> delete t2
> commit W1
>                                  begin R3-find-end R3 sees t1 t3 t4
>
> At no point is t2 visible outside thread 1.
>
> At no point are exactly triples t1 and t3 but not t4 visible outside
> thread 1.
>
> Strictly, R3 either sees t1, t3, t4 or sees no triples.  There is no
> guarantee on the exact time point.  A detail of transactions.
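
The illustration above can be replayed with a small Java sketch.  It rests on
assumptions: SnapshotGraphSketch is a hypothetical class (not Jena code),
there is a single writer at a time, triples are plain strings, and readers
simply read the last committed set.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical single-writer graph: readers only ever see committed state. */
class SnapshotGraphSketch {
    // Last committed state, replaced as a whole at commit.
    private volatile Set<String> committed = Collections.emptySet();
    // Private view of the one open write transaction.
    private Set<String> working;

    void beginWrite() { working = new HashSet<>(committed); }

    void add(String triple) { working.add(triple); }

    void delete(String triple) { working.remove(triple); }

    /** find() as seen from inside the write transaction. */
    Set<String> findInWrite() { return new HashSet<>(working); }

    /** find() as seen by a read transaction on another thread. */
    Set<String> findCommitted() { return committed; }

    /** All changes become visible in one step. */
    void commit() {
        committed = Collections.unmodifiableSet(new HashSet<>(working));
        working = null;
    }
}

class SnapshotGraphDemo {
    public static void main(String[] args) {
        SnapshotGraphSketch g = new SnapshotGraphSketch();
        g.beginWrite();                                    // begin W1
        g.add("t1");
        g.add("t2");
        System.out.println("R1 sees " + g.findCommitted());
        g.add("t3");
        System.out.println("W1 sees " + g.findInWrite().size() + " triples");
        g.add("t4");
        System.out.println("R2 sees " + g.findCommitted());
        g.delete("t2");
        g.commit();                                        // commit W1
        System.out.println("R3 sees " + g.findCommitted());
    }
}
```

The whole committed set is swapped in one assignment, which is why a reader
can never observe t2, or t1 and t3 without t4.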
>
> Autocommit where an implicit begin-commit goes round any add call that is
> not made from a thread in a transaction is a possibility.
>
> i.e.
> operation X
>
> if not in a transaction
>   =>
> begin
>  operation X
> commit
>
> BUT this is very, very expensive when it's persistent storage that has to
> provide D (durability).
>
> To get D you need a disk write so ~5-10ms of rotational disk (disk seek
> time), 0.1ms if an SSD but it is also a system call (virtual memory costs)
> and still has to contend for the SSD controller.  Adding a commit on every
> triple add reduces the maximum update rate to 10K triples per second in
> ideal circumstances without any OS costs.  That's dire. Batching wins!
>
> c.f. JDBC where it is usually default "on" (safety) and leads to other
> issues of dire performance at this granularity.
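
The commit-cost figures above work out as follows.  This is a
back-of-envelope Java sketch: the class and method names are made up for the
example, and it assumes the commit is the only cost, so the batched numbers
are upper bounds rather than measurements.

```java
/** Back-of-envelope arithmetic for commit-bound update rates. */
class CommitThroughput {
    /** Commits per second if each commit costs commitMicros microseconds and nothing else. */
    static long commitsPerSecond(long commitMicros) {
        return 1_000_000L / commitMicros;
    }

    /** Upper bound on triples per second when batchSize triples share one commit. */
    static long triplesPerSecond(long commitMicros, long batchSize) {
        return batchSize * commitsPerSecond(commitMicros);
    }

    public static void main(String[] args) {
        // One commit per triple (autocommit on every add).
        System.out.println("SSD, 0.1 ms commit: " + triplesPerSecond(100, 1));
        System.out.println("disk, 5 ms commit:  " + triplesPerSecond(5_000, 1));
        System.out.println("disk, 10 ms commit: " + triplesPerSecond(10_000, 1));
        // The same 10 ms disk with 1000 triples per commit.
        System.out.println("disk, 10 ms, batch of 1000: " + triplesPerSecond(10_000, 1_000));
    }
}
```

With one commit per triple, 0.1 ms per commit caps the rate at 10,000 triples
per second and 5-10 ms caps it at 100-200.  Sharing one commit across a batch
raises the cap in proportion to the batch size.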
>
>  If that is the case then the full process for a delete is
>>
>>     1. begin delete( triple )
>>     2. if deleting is not allowed (Capabilities.deleteAllowed() returns
>>     false) throw DeleteDeniedException.
>>     3. delete from the underlying storage system, may throw an exception.
>>        1. If a checked exception is thrown wrap it in a
>>        DeleteDeniedException.
>>        4. if not in a transaction notify listeners of delete
>>     5. end delete(triple)
>>     6. begin commit if in transaction
>>     7. commit the change so that it is visible to outside of the
>> transaction.
>>     8. notify listeners of delete.
>>     9. end commit.
>>
>>
>> As for the find process
>>
>>     1. returns an ExtendedIterator of triples that match the specified
>>     triple.
>>     2. If inside a transaction all uncommitted triples are candidates for
>>
>>     matching.
>>
>> The iterator may throw a ConcurrentModificationException in conditions
>> outlined by
>> http://docs.oracle.com/javase/7/docs/api/java/util/
>> ConcurrentModificationException.html
>> with the following caveat:
>>
>>     - If the find is taking place within a transaction and the current
>>
>>     thread has not modified the underlying data the
>>     ConcurrentModificationException may not be thrown.
>>
>
> We can treat ConcurrentModificationException as an independent concept
> from transactions.
>
>         Andy
>
>
>
>>
>> Thoughts?
>> Claude
>>
>>
>>
>>
>>
>> On Mon, Aug 11, 2014 at 6:19 PM, Andy Seaborne <a...@apache.org> wrote:
>>
>>  On 08/08/14 22:13, Claude Warren wrote:
>>>
>>>  This is a message stack for Graph SPI Contract testing.  It covers only
>>>> the
>>>> Jena 2 Graph Contract.  This is an attempt to document the current Graph
>>>> contract.  Any correction should specify the bullet point number.
>>>>
>>>>
>>> Overall:
>>>
>>> Getting the exact contract is hard and I'm assuming this is only for
>>> single-threaded code.
>>>
>>> Maybe start with a subset of Graph
>>>
>>> .add
>>> .delete
>>> .find
>>>
>>> then add listeners into the picture
>>> then define other operations in terms of the primitives:
>>>
>>> .contains
>>> .remove
>>> .clear
>>>
>>> Transactions:
>>>
>>> The text around transactions does not distinguish being inside or outside
>>> a transaction.
>>>
>>> There are 2 base kinds of graphs - ones in datasets (views) and
>>> standalone
>>> ones, then things like InfGraph and other added functionality.
>>> Transactions
>>> on view graphs need to be defined in the context of the dataset because
>>> transactions are connected.
>>>
>>>
>>>       1. add() -- technically from GraphAdd
>>>
>>>>
>>>>
>>> IMO The "GraphAdd" interface serves no purpose.
>>>
>>>          1. when a triple is added to a graph all registered listeners
>>> must
>>>
>>>>
>>>>         receive an (add graph triple) message
>>>>
>>>>
>>> It's hard to define listeners:
>>>
>>>    Does a listener see the graph before or after the triple is added?
>>>    Is a listener called if AddDeniedException is raised?
>>>    Can a listener cause AddDeniedException to be raised?
>>>    Is the listener guaranteed to have been called by the
>>>      time add() returns?
>>>
>>> hence the suggestion of starting with just the basic operations.
>>>
>>>          2. subsequent graph.contains( triple ) must return true.
>>>
>>>>         3. If add is performed within a transaction the listeners are
>>>> not
>>>>
>>>>         notified until after the commit.
>>>>         4. If graph is read only (Capabilities.addAllowed() returns
>>>> false)
>>>>         must throw AddDeniedException
>>>>
>>>>
>>> 1.1 and 1.2 have "must" text
>>>
>>> Surely it's:
>>>
>>> Either
>>>     the triple is added
>>> or
>>>     an AddDeniedException exception is thrown.
>>>
>>>       2. clear()
>>>
>>>>
>>>>
>>> This is like remove(Node.ANY, Node.ANY, Node.ANY) except for the listener
>>> contract?
>>>
>>>          1. If the graph can be empty (Capabilities.canBeEmpty()) there
>>>
>>>> should
>>>>
>>>>         be no triples returned from find( Triple.ANY )
>>>>
>>>>
>>> Nothing except tests uses Capabilities.canBeEmpty.
>>>
>>>          2. If the graph can not be empty there should only be the
>>> elements
>>>
>>>>
>>>>         that were present when the graph was created.
>>>>
>>>>
>>> This implies part of the contract for create in that create does not take
>>> initial contents.
>>>
>>> Graph g2 = view of g1
>>> g1 can not be empty
>>>
>>>          3. if delete is not allowed (Capabilities.canDelete() is
>>>
>>>>
>>>>         false) clear() must throw DeleteDeniedException
>>>>
>>>>
>>> An alternative is that if clear() causes a change, DeleteDeniedException
>>> is raised.
>>>
>>> Example - if the empty, read-only graph is cleared, why should
>>> DeleteDeniedException be raised?
>>>
>>> There is a relationship to remove(ANY,ANY,ANY)
>>>
>>>       3. close()
>>>
>>>>         1. after close isClosed() should return true
>>>>         2. calling close on closed graph should not throw an exception.
>>>>         3. calling any Graph method other than close() on a closed graph
>>>>         should throw a ClosedException
>>>>
>>>>
>>> Is there a need for close() long term?  If not, then the detailed contract
>>> is moot.
>>>
>>> This form of Graph.close() might work for a basic, storage graph but
>>> there
>>> are other cases.
>>>
>>> A graph may be a view of another - close is meaningless and is more
>>> usefully a no-op.
>>>
>>> If the graph is from a system wide cache, close() might be a no-op so as
>>> to protect the cache.
>>>
>>>       4. contains()
>>>
>>>>
>>>>
>>> Defined as "find(S,P,O).hasNext()"
>>>
>>>          1. returns true if the graph contains the specified triple.
>>>
>>>>            1. Node.ANY will match any node in the position.
>>>>         2. if the graph supports transactions and a transaction is in
>>>>
>>>>         progress the graph will not show any triples that exist only
>>>>         within the transaction.
>>>>
>>>>
>>> If an app goes:
>>>
>>>    begin
>>>    add(triple)
>>>    contains(triple) -> false
>>>
>>> it's going to be a bit confusing!
>>>
>>>       5. delete()
>>>
>>>>         1. if delete is not allowed (Capabilities.canDelete() is false)
>>>>         delete() must throw DeleteDeniedException
>>>>         2. when a triple is deleted from  a graph all registered
>>>> listeners
>>>>
>>>>         must receive an (delete graph triple) message
>>>>         3. subsequent graph.contains( triple ) must return false.
>>>>         4. If delete is performed within a transaction the listeners are
>>>> not
>>>>
>>>>         notified until after the commit.
>>>>
>>>>
>>> Same listener issues as add()
>>>
>>>       6. dependsOn()
>>>
>>>>
>>>>
>>> What is this used for nowadays?
>>>
>>>          1. true if this graph's content depends on the other graph. May
>>> be
>>>
>>>>
>>>>         pessimistic (ie return true if it's not sure). Typically true
>>>> when a  graph
>>>>         is a composition of other graphs, eg union.
>>>>      7. find()
>>>>         1. returns an iterator of triples that match the specified
>>>> triple.
>>>>
>>>>
>>> And the iterator?
>>>
>>> Specifically, there are ConcurrentModificationException issues even in
>>> single threaded code.
>>>
>>>       8. getBulkUpdateHandler() -- deprecated / removed -- no tests
>>>
>>>>      9. getCapabilities()
>>>>
>>>>
>>> Aside: Capabilities need clearing up.  It's too black-and-white. It can't
>>> express the totality of possibilities.
>>>
>>> Big question: what use does application code make of capabilities?  I
>>> suspect none, or none except to flag errors.  I can't envisage getting a
>>> graph that says "addAllowed=false" and doing anything but signalling the
>>> user that they can't do whatever the task is.   Yet it's going to have
>>> ("should have") error handling code anyway.
>>>
>>> Maybe it reduces to
>>>
>>>     Graph.isReadOnly
>>>
>>> I'm unconvinced the add/delete distinction matters.  I can think of a graph
>>> where there is a difference (append-only) but not of an application that
>>> adapts based on this other than to say "no, can't".
>>>
>>> e.g.
>>> addAllowed( boolean everyTriple );
>>>
>>> Capabilities.handlesLiteralTyping -- can't say "some, not others"
>>>
>>>          1. must not return null.
>>>
>>>>
>>>>
>>> If we retain the current Capabilities, then we need a way to say "don't
>>> know".  Some of the capabilities are definite yes/no.
>>>
>>> e.g addAllowed -- presumably "yes" on most graphs but what if there is a
>>> security wrapper?  Or system resources are
>>>
>>>          2. capabilities must match other results.
>>>
>>>>            1. if not addAllowed() , add must throw exception
>>>>            2. if not deleteAllowed(),
>>>>               1. delete must throw exception
>>>>               2. clear must throw exception
>>>>
>>>>
>>> clear() of an already empty graph?
>>>
>>>             3. if iteratorRemoveAllowed(), iterator from find must allow
>>>
>>>>            remove()
>>>>            4. if canBeEmpty()
>>>>               1. initial construction must be empty()
>>>>               2. clear() must be empty.
>>>>            3. must pass Capabilities contract tests.
>>>>      10. getEventManager()
>>>>         1. May not return null
>>>>         2. Listeners registered with event manager must be notified of
>>>>         changes.
>>>>         3. EventManager must pass GraphEventManager contract test.
>>>>      11. getPrefixMapping()
>>>>         1. May not be null
>>>>         2. changes to the prefixes managed by the PrefixMapping returned
>>>>
>>>>          getPrefixMapping() must be reflected in all other PrefixMapping
>>>> classes
>>>>         from the same graph.
>>>>
>>>>
>>> I disagree with the defined contract in javadoc! The "same object" is
>>> horrible!!
>>>
>>>          3. Changes made to a prefix mapping within a transaction are
>>>
>>>> visible
>>>>
>>>>         outside of the transaction and are not rolled back by the
>>>> transaction.
>>>>
>>>>
>>> !!
>>>
>>>          4. PrefixMapping  must pass the PrefixMapping contract test
>>>
>>>>      12. getStatisticsHandler()
>>>>
>>>>
>>> No longer used.
>>>
>>>          1. may be null
>>>
>>>>         2. if not null must pass the GraphStatisticsHandler contract
>>>> test.
>>>>         3. all GraphStatisticsHandlers returned must pass
>>>> handler.equals(
>>>>         handler2 )
>>>>      13. getTransactionHandler()
>>>>         1. may not be null
>>>>         2. must pass the TransactionHandler contract test.
>>>>      14. isClosed()
>>>>         1. must return false when the graph is created.
>>>>         2. must return true after the close() has been called.
>>>>      15. isEmpty()
>>>>         1. must return true when graph is created if
>>>>         Capabilities.canBeEmpty() is true
>>>>
>>>>
>>> I don't understand this - a graph may be a view of another so it's not
>>> empty at the start.
>>>
>>>          2. must not return true after triples are added
>>>
>>>>         3. must return true after all triples are deleted if
>>>>         Capabilities.canBeEmpty() is true.
>>>>         4. must return true after clear() if Capabilities.canBeEmpty()
>>>> is
>>>>         true.
>>>>      16. isIsomorphicWith() -- from (
>>>>
>>>>      http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#
>>>> section-graph-equality):
>>>>       Two RDF graphs G and G' are isomorphic (that is, they have an
>>>> identical
>>>>      form) if there is a bijection M between the sets of nodes of the
>>>> two
>>>>      graphs, such that:
>>>>         1. M maps blank nodes to blank nodes.
>>>>         2. M(lit)=lit for all RDF literals lit which are nodes of G.
>>>>         3. M(iri)=iri for all IRIs iri which are nodes of G.
>>>>         4. The triple ( s, p, o ) is in G if and only if the triple (
>>>> M(s),
>>>>
>>>>         p, M(o) ) is in G'
>>>>      17. remove()
>>>>         1. when a triple is removed from a graph all registered
>>>> listeners
>>>>
>>>>         must receive an (remove graph triple) message
>>>>
>>>>
>>> remove() removes by pattern
>>>
>>> After remove(S,P,O), contains(S,P,O) is false (S/P/O can be Node.ANY)
>>>
>>>          2. subsequent graph.contains( triple ) must return false, unless
>>>
>>>> the
>>>>
>>>>         triple was in the newly constructed graph and
>>>> Capabilities.canBeEmpty()
>>>>         is false.
>>>>         3. If remove is performed within a transaction the listeners
>>>> are
>>>> not
>>>>
>>>>         notified until after the commit.
>>>>         4. If delete is denied (Capabilities.deleteAllowed() returns
>>>> false)
>>>>         must throw DeleteDeniedException
>>>>      18. size()
>>>>         1. if Capabilities.sizeAccurate() is true
>>>>            1. if transactions are supported
>>>>            (TransactionHandler.transactionsSupported() is true)
>>>>               1. the size from within the transaction must function
>>>>                  1. adding a triple must increment the size of the
>>>> graph.
>>>>                  2. removing a triple must decrement the size of the
>>>> graph.
>>>>               2. the size from outside the transaction must not change
>>>>            2. if transactions are not
>>>>            supported (TransactionHandler.transactionsSupported() is
>>>> false)
>>>>               1.  adding a triple must increment the size of the graph.
>>>>               2. removing a triple must decrement the size of the graph.
>>>>            2. if Capabilities.sizeAccurate() is false
>>>>            1. if transactions are supported
>>>>            (TransactionHandler.transactionsSupported() is true)
>>>>               1. the size from within the transaction must function
>>>>                  1. adding a triple may increment the size of the graph.
>>>>                  2. adding a triple may not decrement the size of the
>>>> graph.
>>>>                  3. removing a triple may decrement the size of the
>>>> graph.
>>>>                  4. removing a triple may not increment the size of the
>>>> graph.
>>>>               2. the size from outside the transaction must not change
>>>>                  1. adding a triple may not decrement the size of the
>>>> graph.
>>>>                  2. removing a triple may not increment the size of the
>>>> graph.
>>>>                  2. if transactions are not
>>>>            supported (TransactionHandler.transactionsSupported() is
>>>> false)
>>>>               1. adding a triple may increment the size of the graph.
>>>>               2. adding a triple may not decrement the size of the
>>>> graph.
>>>>               3. removing a triple may decrement the size of the graph.
>>>>               4. removing a triple may not increment the size of the
>>>> graph.
>>>>
>>>>
>>>>
>>>> Please comment as appropriate.
>>>> Claude
>>>>
>>>>
>>>>
>>>
>>
>>
>


-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
