Re: Graph SPI Contract

Andy Seaborne Sat, 23 Aug 2014 08:23:16 -0700

Claude,

We seem to have different understandings about transactions.

I see a transaction (as in ACID) as defining a scope or view of the system (or valid state of the system). Within a transaction changes happen only from actions of the transaction, not outside.

A transaction sees a consistent state of the world - transactions are serialized onto the time line and appear to happen instantaneously as a single unit. From outside, nothing changes until all of a sudden all changes are made at once. This is "serializable" isolation, which is the ideal.

Weaker forms of isolation exist but they have unpredictable effects. For us, a find() is a range query so even isolation level "repeatable reads" can cause a find to see a state of the storage that never existed in any application view point.


http://en.wikipedia.org/wiki/Isolation_%28database_systems%29

We aren't considering nested transactions.

add() just needs to define what add() does inside a transaction. At some later time, all the changes become visible.


== add()

Pre condition:
  graph exists

action:
  add(t)

Post condition:
in the view of the transaction for the current thread:
  if no exception
    graph contains t
  else if AddDeniedException
    graph contains t if and only if the graph contained t before.
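A minimal sketch of how this post condition could be checked, assuming a toy in-memory graph. `SketchGraph`, `AddDenied` and `AddContractCheck` are hypothetical stand-ins for illustration, not the Jena API, and a triple is just a String here.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for AddDeniedException; not the Jena class.
class AddDenied extends RuntimeException {
    AddDenied(String message) { super(message); }
}

// Hypothetical toy graph: a set of Strings, optionally read-only.
class SketchGraph {
    private final Set<String> triples = new HashSet<>();
    private final boolean readOnly;

    SketchGraph(boolean readOnly) { this.readOnly = readOnly; }

    void add(String t) {
        if (readOnly) throw new AddDenied("graph is read-only");
        triples.add(t);
    }

    boolean contains(String t) { return triples.contains(t); }
}

class AddContractCheck {
    // Returns true if add(t) satisfied the post condition stated above.
    static boolean addHonoursContract(SketchGraph g, String t) {
        boolean before = g.contains(t);
        try {
            g.add(t);
            return g.contains(t);            // no exception: graph contains t
        } catch (AddDenied e) {
            return g.contains(t) == before;  // denied: membership of t is unchanged
        }
    }
}
```

The check is written against the single-thread view only, which matches the scope of the post condition.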

Listeners are same-thread callbacks so they are in the same transaction as the update. Complex systems on top of this are out of scope. Jena provides the building block.


On 17/08/14 11:43, Claude Warren wrote:

I think the contract has to cover multi-threaded possibilities.  However,
for the most part the document I originally proposed is the view from
within a single thread.

For non-transactional, multi-threaded systems, I don't think anything needs to be said except "don't!!" - or rather "single view" or else all bets are off. Failure modes are way too implementation specific - even across JVMs (see IBM vs Oracle JVMs for HashMap, as we know here).


Jena in-memory is read-concurrent safe.
http://jena.apache.org/documentation/notes/concurrency-howto.html

Even that is non-trivial to provide in the inference engine.

I agree that graphAdd serves no purpose and go as far as saying it should
be removed in Jena 3.


Yes.

Think that defining the add with the listener will clarify the contract,
but we need clarification of the Listener contract later.

I think that the current process is:

    1. triple added to or deleted from graph
    2. listeners notified

I think that this is correct but that we need to add that exceptions in the
listeners may not raise and add denied exception.


s/and/an/ ?  If so - yes.

How about:

1. listeners should not raise exceptions.

2. If they do (outside the contract), the exception should be (logged and) dropped.
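A minimal sketch of rule 2, assuming listeners are plain callbacks invoked after the triple is already stored. `ListenerNotifier` and its methods are hypothetical names, not the Jena GraphEventManager API, and a list of messages stands in for a real log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical notifier: a listener exception never reaches the caller of add().
class ListenerNotifier {
    private final List<Consumer<String>> listeners = new ArrayList<>();
    private final List<String> dropped = new ArrayList<>(); // stands in for a log

    void register(Consumer<String> listener) { listeners.add(listener); }

    // Called after the add has completed; the add is not undone on failure.
    void notifyAdd(String triple) {
        for (Consumer<String> listener : listeners) {
            try {
                listener.accept(triple);
            } catch (RuntimeException e) {
                dropped.add(e.getMessage()); // log and drop, then keep notifying
            }
        }
    }

    List<String> droppedMessages() { return dropped; }
}
```

With this shape a misbehaving listener cannot prevent later listeners from being called.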


It seems odd to me to have an exception and the triple be added.

 I believe that the
contract with listeners is:

    1. they are notified after the event they are listening for has been
    completed.  That they are not notified if an Exception is thrown in the add.
    2. if a listener throws an exception it will not undo the add or delete.


Yes.

    3. I believe that: #1 means that the listeners would be notified at the
    commit of a transaction, so listeners are guaranteed to have messages
    queued by the end of the commit (if present) or at the end of add (if no
    transaction is present).

A basic listener is inside the transaction where the add() is happening. They are on the same thread anyway. I don't know how to implement same-thread, different visibility.

I wonder if listeners can be described with a separate contract - makes the contract tests modular.


e.g.
C1/ Contract for add/delete/find/and others as actions of set of triples.

C2/ Contract for listeners where

add-with-listener => core add contract + listener called.

This does lead to the possibility that a graph implementation may need to
notify other components within the transaction that the add or delete was
completed -- I am not certain that this is needed but raise the point here
for further discussion if necessary.

So the full process for an add is

    1. begin add( triple )
    2. if adding is not allowed (Capabilities.addAllowed() returns false)
    throw AddDeniedException.
    3. add to the underlying storage system, may throw an exception.
       1. If a checked exception is thrown wrap it in an AddDeniedException.

Any other kind of exception is presumably a system error and leaves the system in an unknown state.

       4. if not in a transaction notify listeners of add
    5. end add(triple)


"end add" means return to caller?

So far, so good.

    6. begin commit if in transaction
    7. commit the change so that it is visible to outside of the transaction.
    8. notify listeners of add.
    9. end commit.


I don't understand this. Are you trying for JDBC autocommit effects?

See overall comments on transactions.

Illustration:
W1, R1 R2 R3 -- transactions.

Thread 1                         Thread 2
begin W1
add t1
add t2
                                 begin R1 -find-end R1 (sees no triples)
add t3
find (sees 3 triples)
add t4
                                 begin R2-find-end R2 (sees no triples)
delete t2
commit W1
                                 begin R3-find-end R3 sees t1 t3 t4

At no point is t2 visible outside thread 1.

At no point are exactly triples t1 and t3 but not t4 visible outside thread 1.

Strictly, R3 either sees t1, t3, t4 or sees no triples. There is no guarantee on the exact time point. A detail of transactions.
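The illustration can be sketched with a copy-on-commit store. This is only a sketch under stated assumptions: a single writer at a time, triples as Strings, and hypothetical names (`SnapshotStore`, `WriteTxn`) that are not Jena classes.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical store: readers see only the last committed state.
class SnapshotStore {
    private volatile Set<String> committed = Collections.emptySet();

    // A read transaction: a view of the committed state.
    Set<String> read() { return committed; }

    // Assumes at most one write transaction at a time.
    WriteTxn beginWrite() { return new WriteTxn(); }

    class WriteTxn {
        // Private working copy; changes here are invisible to readers.
        private final Set<String> working = new HashSet<>(committed);

        void add(String t) { working.add(t); }
        void delete(String t) { working.remove(t); }

        // find() inside the transaction sees the transaction's own changes.
        Set<String> find() { return Collections.unmodifiableSet(working); }

        // All changes become visible at once.
        void commit() { committed = Collections.unmodifiableSet(new HashSet<>(working)); }
    }
}
```

Replaying W1 against this store, a `read()` before the commit returns no triples and a `read()` after it returns t1, t3 and t4; t2 is never visible outside the writer.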

Autocommit, where an implicit begin-commit goes round any add call that is not made from a thread in a transaction, is a possibility.


i.e.
operation X
if not in a transaction
  =>
begin
 operation X
commit
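The autocommit rule above can be sketched as follows, assuming a single thread. `AutoCommitSketch` is a hypothetical class that only counts operations, to make the number of commits visible; it is not a Jena API.

```java
// Hypothetical autocommit wrapper: counts adds and commits, stores nothing.
class AutoCommitSketch {
    private boolean inTransaction = false;
    private int commits = 0;
    private int adds = 0;

    void begin() { inTransaction = true; }

    void commit() {
        inTransaction = false;
        commits++;
    }

    void add(String triple) {
        if (inTransaction) {
            adds++;       // explicit transaction: caller decides when to commit
            return;
        }
        begin();          // not in a transaction: implicit begin ...
        adds++;
        commit();         // ... and implicit commit around the single operation
    }

    int commits() { return commits; }
    int adds() { return adds; }
}
```

Three adds outside a transaction cost three commits; the same three adds inside one explicit begin/commit cost one.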

BUT this is very, very expensive when it's a persistent storage to get D (durability).

To get D you need a disk write so ~5-10ms of rotational disk (disk seek time), 0.1ms if an SSD but it is also a system call (virtual memory costs) and still has to contend for the SSD controller. Adding a commit on every triple add reduces the maximum update rate to 10K triples per second in ideal circumstances without any OS costs. That's dire. Batching wins!
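The arithmetic behind those figures, as a small sketch. It assumes one triple per commit and that each commit costs exactly one durable write of the given latency; `CommitRate` is a hypothetical helper.

```java
// Hypothetical helper: upper bound on commits per second for a given sync latency.
class CommitRate {
    static long maxCommitsPerSecond(long syncLatencyMicros) {
        return 1_000_000L / syncLatencyMicros;
    }
}
```

With 0.1ms (100 microseconds) per write this gives 10,000 commits per second, the 10K figure above; with 5ms and 10ms of rotational disk it gives 200 and 100 per second.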

c.f. JDBC where it is usually default "on" (safety) and leads to other issues of dire performance at this granularity.

If that is the case then the full process for a delete is

    1. begin delete( triple )
    2. if deleting is not allowed (Capabilities.deleteAllowed() returns
    false) throw DeleteDeniedException.
    3. delete from the underlying storage system, may throw an exception.
       1. If a checked exception is thrown wrap it in a
       DeleteDeniedException.
       4. if not in a transaction notify listeners of delete
    5. end delete(triple)
    6. begin commit if in transaction
    7. commit the change so that it is visible to outside of the transaction.
    8. notify listeners of delete.
    9. end commit.

As for the find process

    1. returns an ExtendedIterator of triples that match the specified
    triple.
    2. If inside a transaction all uncommitted triples are candidates for
    matching.

The iterator may throw a ConcurrentModificationException in conditions
outlined by
http://docs.oracle.com/javase/7/docs/api/java/util/ConcurrentModificationException.html
with the following caveat:

    - If the find is taking place within a transaction and the current
    thread has not modified the underlying data the
    ConcurrentModificationException may not be thrown.

We can treat ConcurrentModificationException as an independent concept from transactions.


        Andy



Thoughts?
Claude





On Mon, Aug 11, 2014 at 6:19 PM, Andy Seaborne <a...@apache.org> wrote:

On 08/08/14 22:13, Claude Warren wrote:

This is a message stack for Graph SPI Contract testing.  It covers only
the
Jena 2 Graph Contract.  This is an attempt to document the current Graph
contract.  Any correction should specify the bullet point number.


Overall:

Getting the exact contract is hard and I'm assuming this is only for
single-threaded code.

Maybe start with a subset of Graph

.add
.delete
.find

then add listeners into the picture
then define other operations in terms of the primitives:

.contains
.remove
.clear

Transactions:

The text around transactions does not distinguish being inside or outside
a transaction.

There are 2 base kinds of graphs - ones in datasets (views) and standalone
ones, then things like InfGraph and other added functionality. Transactions
on view graphs need to be defined in the context of the dataset because
transactions are connected.


      1. add() -- technically from GraphAdd


IMO The "GraphAdd" interface serves no purpose.

         1. when a triple is added to a graph all registered listeners must


        receive an (add graph triple) message


It's hard to define listeners:

   Does a listener see the graph before or after the triple is added?
   Is a listener called if AddDeniedException is raised?
   Can a listener cause AddDeniedException to be raised?
   Is the listener guaranteed to have been called by the
     time add() returns?

hence the suggestion of starting with just the basic operations.

         2. subsequent graph.contains( triple ) must return true.

        3. If add is performed within a transaction the listeners are not

        notified until after the commit.
        4. If graph is read only (Capabilities.addAllowed() returns false)
        must throw AddDeniedException


1.1 and 1.2 have "must" text

Surely it's:

Either
    the triple is added
or
    an AddDeniedException exception is thrown.

      2. clear()


This is like remove(Node.ANY, Node.ANY, Node.ANY) except for the listener
contract?

         1. If the graph can be empty (Capabilities.canBeEmpty()) there

should

        be no triples returned from find( Triple.ANY )


Nothing except tests uses Capabilities.canBeEmpty.

         2. If the graph can not be empty there should only be the elements


        that were present when the graph was created.


This implies part of the contract for create in that create does not take
initial contents.

Graph g2 = view of g1
g1 can not be empty

         3. if delete is not allowed (Capabilities.deleteAllowed() is


        false) clear() must throw DeleteDeniedException


An alternative is that if clear() causes a change, DeleteDeniedException
is raised.

Example - if the empty, read-only graph is cleared, why should
DeleteDeniedException be raised?

There is a relationship to remove(ANY,ANY,ANY)
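A sketch of the alternative, where clear() only raises the exception if it would actually change the graph. `ChangeAwareGraph` and `DeleteDenied` are hypothetical stand-ins, not Jena classes, and triples are Strings.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for DeleteDeniedException.
class DeleteDenied extends RuntimeException {
    DeleteDenied(String message) { super(message); }
}

// Hypothetical graph whose clear() is denied only when it would cause a change.
class ChangeAwareGraph {
    private final Set<String> triples = new HashSet<>();
    private final boolean deleteAllowed;

    ChangeAwareGraph(boolean deleteAllowed, String... initial) {
        this.deleteAllowed = deleteAllowed;
        for (String t : initial) {
            triples.add(t);
        }
    }

    void clear() {
        if (triples.isEmpty()) {
            return; // nothing would change, so there is nothing to deny
        }
        if (!deleteAllowed) {
            throw new DeleteDenied("clear would remove triples");
        }
        triples.clear();
    }

    int size() { return triples.size(); }
}
```

Under this reading, clearing an empty read-only graph is a quiet no-op, and clearing a non-empty one still fails.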

      3. close()

        1. after close isClosed() should return true
        2. calling close on closed graph should not throw an exception.
        3. calling any Graph method other than close() on a closed graph
        should throw a ClosedException


Is there a need for close() long term? If not, then the detailed contract
is moot.

This form of Graph.close() might work for a basic, storage graph but there
are other cases.

A graph may be a view of another - close is meaningless and is more
usefully a no-op.

If the graph is from a system wide cache, close() might be a no-op so as
to protect the cache.

      4. contains()


Defined as "find(S,P,O).hasNext()"
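A minimal sketch of that definition, assuming a toy pattern matcher. `PatternGraph` and its `ANY` marker are hypothetical stand-ins for Graph and Node.ANY; triples are String arrays and duplicates are not filtered.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical toy graph where contains() is defined purely in terms of find().
class PatternGraph {
    static final String ANY = "*"; // hypothetical wildcard, standing in for Node.ANY

    private final List<String[]> triples = new ArrayList<>();

    void add(String s, String p, String o) {
        triples.add(new String[] {s, p, o});
    }

    private static boolean matches(String pattern, String value) {
        return ANY.equals(pattern) || pattern.equals(value);
    }

    // Returns an iterator over a copy of the matches, so later changes do not affect it.
    Iterator<String[]> find(String s, String p, String o) {
        List<String[]> hits = new ArrayList<>();
        for (String[] t : triples) {
            if (matches(s, t[0]) && matches(p, t[1]) && matches(o, t[2])) {
                hits.add(t);
            }
        }
        return hits.iterator();
    }

    boolean contains(String s, String p, String o) {
        return find(s, p, o).hasNext();
    }
}
```

Defining contains() this way means its contract needs no text of its own beyond the contract for find().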

         1. returns true if the graph contains the specified triple.

           1. Node.ANY will match any node in the position.
        2. if the graph supports transactions and a transaction is in

        progress the graph will not show any triples that only exist
within
        the transaction.


If an app goes:

   begin
   add(triple)
   contains(triple) -> false

it's going to be a bit confusing!

      5. delete()

        1. if delete is not allowed (Capabilities.deleteAllowed() is false)
        delete() must throw DeleteDeniedException
        2. when a triple is deleted from  a graph all registered listeners

        must receive an (delete graph triple) message
        3. subsequent graph.contains( triple ) must return false.
        4. If delete is performed within a transaction the listeners are not

        notified until after the commit.


Same listener issues as add()

      6. dependsOn()


What is this used for nowadays?

         1. true if this graph's content depends on the other graph. May be


        pessimistic (ie return true if it's not sure). Typically true
when a  graph
        is a composition of other graphs, eg union.
     7. find()
        1. returns an iterator of triples that match the specified triple.


And the iterator?

Specifically, there are ConcurrentModificationException issues even in
single threaded code.
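A small single-threaded illustration using a plain java.util.HashSet rather than a Jena graph. A graph implementation may behave differently, and the fail-fast check is documented as best effort only. `SingleThreadCme` is a hypothetical name.

```java
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

class SingleThreadCme {
    // Returns true if modifying the set during iteration raised the exception.
    static boolean modifyWhileIterating() {
        Set<String> triples = new HashSet<>();
        triples.add("t1");
        triples.add("t2");

        Iterator<String> it = triples.iterator();
        it.next();
        triples.add("t3"); // structural change made outside the iterator, same thread

        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }
}
```

No second thread is involved: the "concurrent" modification is simply an add made while an iterator from find() is still in use.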

      8. getBulkUpdateHandler() -- deprecated / removed -- no tests

     9. getCapabilities()


Aside: Capabilities need clearing up.  It's too black-and-white; it can't
express the totality of possibilities.

Big question: what use does application code make of capabilities?  I
suspect none, or none except to flag errors.  I can't envisage getting a
graph that says "addAllowed=false" and doing anything but signalling the
user that they can't do whatever the task is.   Yet it's going to have
("should have") error handling code anyway.

Maybe it reduces to

    Graph.isReadOnly

I'm unconvinced the add/delete distinction matters.  I can think of a graph
where there is a difference (append-only) but not of an application that
adapts based on this other than to say "no, can't".

e.g.
addAllowed( boolean everyTriple );

Capabilities.handlesLiteralTyping -- can't say "some, not others"

         1. must not return null.


If we retain the current Capabilities, then we need a way to say "don't
know".  Some of the capabilities are definite yes/no.

e.g addAllowed -- presumably "yes" on most graphs but what if there is a
security wrapper?  Or system resources are

         2. capabilities must match other results.

           1. if not addAllowed() , add must throw exception
           2. if not deleteAllowed(),
              1. delete must throw exception
              2. clear must throw exception


clear() of an already empty graph?

            3. if iteratorRemoveAllowed(), iterator from find must allow

           remove()
           4. if canBeEmpty()
              1. initial construction must be empty()
              2. clear() must be empty.
           3. must pass Capabilities contract tests.
     10. getEventManager()
        1. May not return null
        2. Listeners registered with event manager must be notified of
        changes.
        3. EventManager must pass GraphEventManager contract test.
     11. getPrefixMapping()
        1. May not be null
        2. changes to the prefixes managed by the PrefixMapping returned

         getPrefixMapping() must be reflected in all other PrefixMapping
classes
        from the same graph.


I disagree with the defined contract in javadoc! The "same object" is
horrible!!

         3. Changes made to a prefix mapping within a transaction are

visible

        outside of the transaction and are not rolled back by the
transaction.


!!

         4. PrefixMapping  must pass the PrefixMapping contract test

     12. getStatisticsHandler()


No longer used.

         1. may be null

        2. if not null must pass the GraphStatisticsHandler contract test.
        3. all GraphStatisticsHandlers returned must pass handler.equals(
        handler2 )
     13. getTransactionHandler()
        1. may not be null
        2. must pass the TransactionHandler contract test.
     14. isClosed()
        1. must return false when the graph is created.
        2. must return true after the close() has been called.
     15. isEmpty()
        1. must return true when graph is created if
        Capabilities.canBeEmpty() is true


I don't understand this - a graph may be a view of another so it's not
empty at the start.

         2. must not return true after triples are added

        3. must return true after all triples are deleted if
        Capabilities.canBeEmpty() is true.
        4. must return true after clear() if Capabilities.canBeEmpty() is
        true.
     16. isIsomorphicWith() -- from
     (http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#section-graph-equality):
      Two RDF graphs G and G' are isomorphic (that is, they have an
identical
     form) if there is a bijection M between the sets of nodes of the two
     graphs, such that:
        1. M maps blank nodes to blank nodes.
        2. M(lit)=lit for all RDF literals lit which are nodes of G.
        3. M(iri)=iri for all IRIs iri which are nodes of G.
        4. The triple ( s, p, o ) is in G if and only if the triple ( M(s),

        p, M(o) ) is in G'
     17. remove()
        1. when a triple is removed from a graph all registered listeners

        must receive an (remove graph triple) message


remove() removes by pattern

After remove(S,P,O), contains(S,P,O) is false (S/P/O can be Node.ANY)

         2. subsequent graph.contains( triple ) must return false, unless

the

        triple was in the newly constructed  graph and
Capabilities.canBeEmpty()
        is false.
        3. If remove is performed within a transaction the listeners are
not

        notified until after the commit.
        4. If delete is denied (Capabilities.deleteAllowed() returns false)
        must throw DeleteDeniedException
     18. size()
        1. if Capabilities.sizeAccurate() is true
           1. if transactions are supported
           (TransactionHandler.transactionsSupported() is true)
              1. the size from within the transaction must function
                 1. adding a triple must increment the size of the graph.
                 2. removing a triple must decrement the size of the graph.
              2. the size from outside the transaction must not change
           2. if transactions are not supported
           (TransactionHandler.transactionsSupported() is false)
              1.  adding a triple must increment the size of the graph.
              2. removing a triple must decrement the size of the graph.
           2. if Capabilities.sizeAccurate() is false
           1. if transactions are supported
           (TransactionHandler.transactionsSupported() is true)
              1. the size from within the transaction must function
                 1. adding a triple may increment the size of the graph.
                 2. adding a triple may not decrement the size of the
graph.
                 3. removing a triple may decrement the size of the graph.
                 4. removing a triple may not increment the size of the
graph.
              2. the size from outside the transaction must not change
                 1. adding a triple may not decrement the size of the
graph.
                 2. removing a triple may not increment the size of the
graph.
           2. if transactions are not supported
           (TransactionHandler.transactionsSupported() is false)
              1. adding a triple may increment the size of the graph.
              2. adding a triple may not decrement the size of the graph.
              3. removing a triple may decrement the size of the graph.
              4. removing a triple may not increment the size of the graph.



Please comment as appropriate.
Claude
