Re: Graph SPI Contract

Andy Seaborne Sun, 24 Aug 2014 06:20:08 -0700

On 23/08/14 18:07, Claude Warren wrote:

Andy,


  I think we agree on transactions.

I think the difference is in the understanding of when listeners are
triggered.

I realize that all current implementations of listeners appear to be on a
single thread.  But I did not realize that was a requirement of the
listener interface.  (yes that would be part of the listener contract
test).  I would think that a listener could place messages on a queue and
that would be sufficient to meet the listener interface -- but that would
mean the state within a transaction would be visible outside of the
transaction.

Fine - that is building a higher-level mechanism on top of the basic callback system.

The fact that you can leak information from inside a transaction is nothing special here. You can do it normally just by passing bytes across threads. Don't! Or at least, do it with care.

You can in JDBC as well: use multiple connections, or store a value in a variable, abort the transaction and start again.

I have wanted the ability for one thread to be notified when another thread
completed a transaction -- basically when the changes became visible.  But
for now that appears to be outside the scope of a listener.

It's certainly outside the scope of an add() - transactions are groups of add/delete.


You want transaction monitoring - callbacks on transaction operations.

As listeners are same thread callbacks, does this mean that when a
transaction is rolled back the listeners must be notified to undo the
previous notifications -- for example

begin Tx
add T1
listener notified of add T1
rollback Tx
listener notified of delete T1


I'd say "no".  T1 is not deleted (there is no delete() call).

You have passed information out of a transaction.

If not then we need to document this case.

To be honest it is this issue that made me think that listeners should be
notified after the transaction committed.  If listeners are notified after
commit then they can also be on different threads.

That is a higher-level abstraction - we should be able to support writing that, but not provide it as the one design. There are many different designs, which is why listeners were made the basic building block on which applications can do what they want.


        Andy


Claude





On Sat, Aug 23, 2014 at 4:22 PM, Andy Seaborne <a...@apache.org> wrote:

Claude,

We seem to have different understandings about transactions.

I see a transaction (as in ACID) as defining a scope or view of the system
(or valid state of the system). Within a transaction changes happen only
from actions of the transaction, not outside.

A transaction sees a consistent state of the world - transactions are
serialized onto the time line and appear to happen instantaneously as a
single unit.  From outside, nothing changes until all of a sudden all
changes are made at once.  This is "serializable" isolation, which is the
ideal.

Weaker forms of isolation exist but they have unpredictable effects. For
us, a find() is a range query so even isolation level "repeatable reads"
can cause a find to see a state of the storage that never existed in any
application view point.

http://en.wikipedia.org/wiki/Isolation_%28database_systems%29

We aren't considering nested transactions.

add() just needs to define what add() does inside a transaction.  At some
later time, all the changes become visible.

== add()

Pre condition:
   graph exists

action:
   add(t)

Post condition:
in the view of the transaction for the current thread:
   if no exception
     graph contains t
   else if AddDeniedException
     graph contains t if and only if the graph contained t before.
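The post-condition above could be checked in a contract test along these lines. This is a minimal sketch, not Jena code: SimpleGraph, AddContract, SetGraph and ReadOnlyGraph are hypothetical, triples are plain strings, and AddDeniedException is a local stand-in for Jena's exception of the same name.

```java
import java.util.HashSet;
import java.util.Set;

// Local stand-in for the real AddDeniedException.
class AddDeniedException extends RuntimeException {
    AddDeniedException(String msg) { super(msg); }
}

// Hypothetical cut-down graph: triples are opaque strings for brevity.
interface SimpleGraph {
    void add(String triple);
    boolean contains(String triple);
}

final class AddContract {
    // Checks the add() post-condition in the view of the calling thread.
    static void check(SimpleGraph g, String t) {
        boolean before = g.contains(t);
        try {
            g.add(t);
            // No exception: the graph must now contain t.
            if (!g.contains(t)) throw new AssertionError("add returned but graph lacks " + t);
        } catch (AddDeniedException e) {
            // Denied: contains(t) must be exactly what it was before.
            if (g.contains(t) != before) throw new AssertionError("denied add changed the graph");
        }
    }
}

// Two toy graphs that satisfy the contract.
final class SetGraph implements SimpleGraph {
    private final Set<String> triples = new HashSet<>();
    public void add(String t) { triples.add(t); }
    public boolean contains(String t) { return triples.contains(t); }
}

final class ReadOnlyGraph implements SimpleGraph {
    public void add(String t) { throw new AddDeniedException("read-only"); }
    public boolean contains(String t) { return false; }
}
```

Both AddContract.check(new SetGraph(), "s p o") and AddContract.check(new ReadOnlyGraph(), "s p o") pass; a graph whose add() silently does nothing fails the check.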

Listeners are same-thread callbacks so they are in the same transaction as
the update.  Complex systems on top of this are out of scope. Jena provides
the building block.


On 17/08/14 11:43, Claude Warren wrote:

I think the contract has to cover multi-threaded possibilities.  However,
for the most part the document I originally proposed is the view from
within a single thread.



For non-transactional, multi-threaded systems, I don't think anything
needs to be said except "don't!!" - or rather "single view" or else all bets
are off.  Failure modes are way too implementation specific - even across
JVMs (see IBM vs Oracle JVMs for HashMap, as we know here).

Jena in-memory is read-concurrent safe.
http://jena.apache.org/documentation/notes/concurrency-howto.html

Even that is non-trivial to provide in the inference engine.


I agree that GraphAdd serves no purpose and would go as far as saying it
should be removed in Jena 3.


Yes.

I think that defining add() with the listener will clarify the contract,
but we need clarification of the Listener contract later.

I think that the current process is:

     1. triple added to or deleted from graph
     2. listeners notified


I think that this is correct but that we need to add that exceptions in
the listeners may not raise and add denied exception.


s/and/an/ ?  If so - yes.

How about:

1. listeners should not raise exceptions.
2. If they do (outside the contract), the exception should be (logged and)
dropped.

It seems odd to me to have an exception and the triple be added.

I believe that the contract with listeners is:

     1. they are notified after the event they are listening for has been
     completed, and they are not notified if an Exception is thrown in
     the add.
     2. if a listener throws an exception it will not undo the add or
     delete.


Yes.

     3. I believe that #1 means that the listeners would be notified at
     the commit of a transaction, so listeners are guaranteed to have
     messages queued by the end of the commit (if present) or at the end of
     add (if no transaction is present).


A basic listener is inside the transaction where the add() is happening.
They are on the same thread anyway.  I don't know how to implement
same-thread, different visibility.

I wonder if listeners can be described with a separate contract - that
makes the contract tests modular.

e.g.
C1/ Contract for add/delete/find/and others as actions of set of triples.

C2/ Contract for listeners where

add-with-listener => core add contract + listener called.
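C2 could be layered on C1 as a decorator, so the core add contract is tested once and the listener contract only adds notification. A minimal sketch, assuming hypothetical CoreStore, ListeningStore and TripleListener types (Jena's real listener interface is GraphListener) and following the rules proposed in this thread: listeners run on the calling thread after a successful add, are not called if the add throws, and a listener exception is logged and dropped without undoing the add.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.logging.Logger;

// Local stand-in for the real AddDeniedException.
class AddDeniedException extends RuntimeException {
    AddDeniedException(String msg) { super(msg); }
}

// Hypothetical listener type.
interface TripleListener {
    void notifyAdd(String triple);
}

// C1: set-of-triples behaviour only, no listener logic.
final class CoreStore {
    private final Set<String> triples = new HashSet<>();
    private final boolean addAllowed;
    CoreStore(boolean addAllowed) { this.addAllowed = addAllowed; }
    void add(String t) {
        if (!addAllowed) throw new AddDeniedException("add denied");
        triples.add(t);
    }
    boolean contains(String t) { return triples.contains(t); }
}

// C2: add-with-listener = core add contract + listeners called.
final class ListeningStore {
    private static final Logger LOG = Logger.getLogger(ListeningStore.class.getName());
    private final CoreStore core;
    private final List<TripleListener> listeners = new ArrayList<>();
    ListeningStore(CoreStore core) { this.core = core; }
    void register(TripleListener l) { listeners.add(l); }
    boolean contains(String t) { return core.contains(t); }

    void add(String t) {
        core.add(t);  // if this throws, no listener is notified
        for (TripleListener l : listeners) {
            try {
                l.notifyAdd(t);  // same thread, so inside the same transaction
            } catch (RuntimeException e) {
                // Outside the contract: log and drop; the add is not undone.
                LOG.warning("listener failed: " + e);
            }
        }
    }
}
```

A failing listener does not stop later listeners from being called, and a denied add reaches no listener at all.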

This does lead to the possibility that a graph implementation may need to
notify other components within the transaction that the add or delete was
completed -- I am not certain that this is needed but raise the point here
for further discussion if necessary.

So the full process for an add is

     1. begin add( triple )
     2. if adding is not allowed (Capabilities.addAllowed() returns false)
     throw AddDeniedException.
     3. add to the underlying storage system, may throw an exception.
        1. If a checked exception is thrown wrap it in an
        AddDeniedException.


Any other kind of exception is presumably a system error and leaves the
system in unknown state.

     4. if not in a transaction notify listeners of add
     5. end add(triple)


"end add" means return to caller?

So far, so good.

     6. begin commit if in transaction
     7. commit the change so that it is visible outside of the
     transaction.
     8. notify listeners of add.
     9. end commit.


I don't understand this. Are you trying for JDBC autocommit effects?

See overall comments on transactions.

Illustration:
W1, R1 R2 R3 -- transactions.

Thread 1                         Thread 2
begin W1
add t1
add t2
                                  begin R1-find-end R1 (sees no triples)
add t3
find (sees 3 triples)
add t4
                                  begin R2-find-end R2 (sees no triples)
delete t2
commit W1
                                  begin R3-find-end R3 (sees t1 t3 t4)

At no point is t2 visible outside thread 1.

At no point are exactly triples t1 and t3 but not t4 visible outside
thread 1.

Strictly, R3 either sees t1, t3, t4 or sees no triples.  There is no
guarantee on the exact time point; that is a detail of transactions.
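The illustration can be reproduced with a deliberately naive scheme: the single writer works on a private copy, and commit publishes the new state in one atomic step, so a reader sees either the state before W1 or the state after it. A minimal single-threaded sketch; SnapshotGraph and its methods are hypothetical, and copying the whole set per write transaction is chosen for clarity, not efficiency.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Assumes a single writer; readers only ever see a committed snapshot.
final class SnapshotGraph {
    // The committed state, replaced wholesale on commit.
    private final AtomicReference<Set<String>> committed = new AtomicReference<>(Set.of());
    private Set<String> writeCopy;  // non-null only while the write transaction is open

    void beginWrite() { writeCopy = new HashSet<>(committed.get()); }
    void add(String t) { writeCopy.add(t); }
    void delete(String t) { writeCopy.remove(t); }

    // find() inside the write transaction sees its own changes.
    Set<String> findInWrite() { return Set.copyOf(writeCopy); }

    void commit() {
        committed.set(Set.copyOf(writeCopy));  // all changes appear at once
        writeCopy = null;
    }

    // A read transaction (begin-find-end) sees one committed snapshot.
    Set<String> readTxnFind() { return committed.get(); }
}
```

Replaying the timeline: after beginWrite() and two adds, readTxnFind() still returns an empty set (R1); after commit it returns t1, t3, t4 (R3), and t2 is never visible to a reader.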

Autocommit where an implicit begin-commit goes round any add call that is
not made from a thread in a transaction is a possibility.

i.e.
operation X

if not in a transaction
   =>
begin
  operation X
commit
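The autocommit rule could be sketched as a wrapper that adds an implicit begin/commit only when the calling thread is not already in a transaction. A minimal sketch; the Txn interface and the AutoCommit and RecordingTxn classes are hypothetical and do not match the shape of Jena's TransactionHandler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical transaction interface, for illustration only.
interface Txn {
    boolean isInTransaction();
    void begin();
    void commit();
    void abort();
}

final class AutoCommit {
    // Runs op, wrapping it in begin/commit only if no transaction is active.
    static void run(Txn txn, Runnable op) {
        if (txn.isInTransaction()) {
            op.run();  // caller's transaction: the caller decides when to commit
            return;
        }
        txn.begin();
        try {
            op.run();
            txn.commit();
        } catch (RuntimeException e) {
            txn.abort();  // implicit transaction: discard the partial work
            throw e;
        }
    }
}

// Records the call sequence so the behaviour can be inspected.
final class RecordingTxn implements Txn {
    final List<String> log = new ArrayList<>();
    private boolean active;
    public boolean isInTransaction() { return active; }
    public void begin() { active = true; log.add("begin"); }
    public void commit() { active = false; log.add("commit"); }
    public void abort() { active = false; log.add("abort"); }
}
```

Outside a transaction the log reads begin, X, commit; inside one, only X is added and no extra commit happens.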

BUT this is very, very expensive when it's persistent storage, to get D
(durability).

To get D you need a disk write, so ~5-10ms on a rotational disk (disk seek
time), or 0.1ms on an SSD, but it is also a system call (virtual memory
costs) and still has to contend for the SSD controller.  Adding a commit on
every triple add reduces the maximum update rate to 10K triples per second,
even on an SSD in ideal circumstances without any OS costs.  That's dire.
Batching wins!

cf. JDBC, where autocommit is usually "on" by default (for safety) and leads
to other issues of dire performance at this granularity.
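The "batching wins" arithmetic, as a rough sketch: if each commit costs c milliseconds and carries B triples, commit latency alone caps throughput at 1000 * B / c triples per second. CommitRate is a hypothetical helper, and the figures are upper bounds because per-triple and OS costs are ignored.

```java
final class CommitRate {
    // Upper bound on triples/second when every commit costs latencyMs and
    // carries batchSize triples; all other costs are ignored.
    static double maxTriplesPerSecond(double latencyMs, int batchSize) {
        double commitsPerSecond = 1000.0 / latencyMs;
        return commitsPerSecond * batchSize;
    }
}
```

With c = 5 ms (rotational disk) and B = 1 the ceiling is 200 triples per second; with c = 0.1 ms (SSD) and B = 1 it is 10,000; batching 1,000 triples per commit on the same SSD lifts the commit-bound ceiling to 10,000,000.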

If that is the case then the full process for a delete is


     1. begin delete( triple )
     2. if deleting is not allowed (Capabilities.deleteAllowed() returns
     false) throw DeleteDeniedException.
     3. delete from the underlying storage system, may throw an exception.
        1. If a checked exception is thrown wrap it in a
        DeleteDeniedException.
     4. if not in a transaction notify listeners of delete
     5. end delete(triple)
     6. begin commit if in transaction
     7. commit the change so that it is visible outside of the
     transaction.
     8. notify listeners of delete.
     9. end commit.


As for the find process

     1. returns an ExtendedIterator of triples that match the specified
     triple.
     2. If inside a transaction all uncommitted triples are candidates for
     matching.

The iterator may throw a ConcurrentModificationException in conditions
outlined by
http://docs.oracle.com/javase/7/docs/api/java/util/ConcurrentModificationException.html
with the following caveat:

     - If the find is taking place within a transaction and the current
     thread has not modified the underlying data, the
     ConcurrentModificationException may not be thrown.


We can treat ConcurrentModificationException as an independent concept
from transactions.

         Andy


Thoughts?
Claude





On Mon, Aug 11, 2014 at 6:19 PM, Andy Seaborne <a...@apache.org> wrote:

  On 08/08/14 22:13, Claude Warren wrote:


This is a message stack for Graph SPI Contract testing.  It covers only the
Jena 2 Graph Contract.  This is an attempt to document the current Graph
contract.  Any correction should specify the bullet point number.

Overall:

Getting the exact contract is hard and I'm assuming this is only for
single-threaded code.

Maybe start with a subset of Graph

.add
.delete
.find

then add listeners into the picture
then define other operations in terms of the primitives:

.contains
.remove
.clear
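Defining the derived operations in terms of the primitives might look like this: contains() is find(...).hasNext(), remove() deletes every match of a pattern, and clear() is remove(ANY, ANY, ANY). A minimal sketch; SimpleTriple and PrimitiveGraph are hypothetical, and a null pattern position stands in for Node.ANY.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical triple type; null in a pattern position means "any".
record SimpleTriple(String s, String p, String o) {}

final class PrimitiveGraph {
    private final Set<SimpleTriple> triples = new LinkedHashSet<>();

    // --- primitives ---
    void add(SimpleTriple t) { triples.add(t); }
    void delete(SimpleTriple t) { triples.remove(t); }

    Iterator<SimpleTriple> find(String s, String p, String o) {
        List<SimpleTriple> out = new ArrayList<>();
        for (SimpleTriple t : triples) {
            if (matches(s, t.s()) && matches(p, t.p()) && matches(o, t.o())) out.add(t);
        }
        return out.iterator();  // a snapshot, so callers may delete while iterating
    }

    // --- derived operations ---
    boolean contains(String s, String p, String o) { return find(s, p, o).hasNext(); }

    void remove(String s, String p, String o) {
        Iterator<SimpleTriple> it = find(s, p, o);
        while (it.hasNext()) delete(it.next());
    }

    void clear() { remove(null, null, null); }

    int size() { return triples.size(); }

    private static boolean matches(String pattern, String value) {
        return pattern == null || Objects.equals(pattern, value);
    }
}
```

Here find() returns a snapshot so that remove() can delete while iterating; a live iterator would run into the ConcurrentModificationException question raised under find().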

Transactions:

The text around transactions does not distinguish being inside or outside
a transaction.

There are 2 base kinds of graphs - ones in datasets (views) and
standalone
ones, then things like InfGraph and other added functionality.
Transactions
on view graphs need to be defined in the context of the dataset because
transactions are connected.


       1. add() -- technically from GraphAdd

IMO The "GraphAdd" interface serves no purpose.

          1. when a triple is added to a graph all registered listeners
          must receive an (add graph triple) message

It's hard to define listeners:

    Does a listener see the graph before or after the triple is added?
    Is a listener called if AddDeniedException is raised?
    Can a listener cause AddDeniedException to be raised?
    Is the listener guaranteed to have been called by the
      time add() returns?

hence the suggestion of starting with just the basic operations.

          2. subsequent graph.contains( triple ) must return true.

          3. If add is performed within a transaction the listeners are
          not notified until after the commit.
          4. If graph is read only (Capabilities.addAllowed() returns
          false) must throw AddDeniedException

1.1 and 1.2 have "must" text

Surely it's:

Either
     the triple is added
or
     an AddDeniedException exception is thrown.

       2. clear()

This is like remove(Node.ANY, Node.ANY, Node.ANY) except for the listener
contract?

          1. If the graph can be empty (Capabilities.canBeEmpty()) there
          should be no triples returned from find( Triple.ANY )

Nothing except tests uses Capabilities.canBeEmpty.

          2. If the graph can not be empty there should only be the
          elements that were present when the graph was created.

This implies part of the contract for create in that create does not take
initial contents.

Graph g2 = view of g1
g1 can not be empty

          3. if delete is not allowed (Capabilities.canDelete() is
          false) clear() must throw DeleteDeniedException

An alternative is that if clear() causes a change, DeleteDeniedException
is raised.

Example - if the empty, read-only graph is cleared, why should
DeleteDeniedException be raised?

There is a relationship to remove(ANY,ANY,ANY)
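Under the alternative rule, clear() would raise DeleteDeniedException only when it would actually change the graph. A minimal sketch; ClearableGraph is hypothetical, DeleteDeniedException is a local stand-in for Jena's exception, and the constructor's initial contents exist only to set up the example.

```java
import java.util.HashSet;
import java.util.Set;

// Local stand-in for the real DeleteDeniedException.
class DeleteDeniedException extends RuntimeException {
    DeleteDeniedException(String msg) { super(msg); }
}

final class ClearableGraph {
    private final Set<String> triples = new HashSet<>();
    private final boolean deleteAllowed;

    ClearableGraph(boolean deleteAllowed, String... initial) {
        this.deleteAllowed = deleteAllowed;
        for (String t : initial) triples.add(t);
    }

    // Alternative contract: only a clear() that would remove something is denied.
    void clear() {
        if (triples.isEmpty()) return;  // no change, so nothing to deny
        if (!deleteAllowed) throw new DeleteDeniedException("graph is read-only");
        triples.clear();
    }

    boolean isEmpty() { return triples.isEmpty(); }
}
```

With this rule, clearing the empty, read-only graph from the example above succeeds silently, while clearing a non-empty read-only graph is still denied.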

       3. close()

         1. after close isClosed() should return true
         2. calling close on closed graph should not throw an exception.
         3. calling any Graph method other than close() on a closed graph
         should throw a ClosedException

Is there a need for close() long term? If not, then the detailed contract
is moot.

This form of Graph.close() might work for a basic, storage graph but
there
are other cases.

A graph may be a view of another - close is meaningless and is more
usefully a no-op.

If the graph is from a system wide cache, close() might be a no-op so as
to protect the cache.
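The close() behaviours can be contrasted directly: a storage graph whose close() is idempotent and makes later calls fail, and a view (or cache-owned) graph whose close() does nothing. A minimal sketch with hypothetical StorageGraph and ViewGraph classes and a local ClosedException stand-in.

```java
import java.util.HashSet;
import java.util.Set;

// Local stand-in for the real ClosedException.
class ClosedException extends RuntimeException {
    ClosedException(String msg) { super(msg); }
}

// A basic storage graph: close() is idempotent and later calls fail.
final class StorageGraph {
    private final Set<String> triples = new HashSet<>();
    private boolean closed;

    void close() { closed = true; }  // calling it twice is harmless
    boolean isClosed() { return closed; }

    void add(String t) { checkOpen(); triples.add(t); }
    boolean contains(String t) { checkOpen(); return triples.contains(t); }

    private void checkOpen() {
        if (closed) throw new ClosedException("graph is closed");
    }
}

// A view over another graph (or one handed out from a shared cache):
// close() is a no-op so the underlying graph stays usable for other holders.
final class ViewGraph {
    private final StorageGraph base;
    ViewGraph(StorageGraph base) { this.base = base; }

    void close() { /* deliberately does nothing */ }
    boolean contains(String t) { return base.contains(t); }
}
```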

       4. contains()

Defined as "find(S,P,O).hasNext()"

          1. returns true if the graph contains the specified triple.
            1. Node.ANY will match any node in the position.
          2. if the graph supports transactions and a transaction is in
          progress the graph will not show any triples that exist only
          within the transaction.

If an app goes:

    begin
    add(triple)
    contains(triple) -> false

it's going to be a bit confusing!

       5. delete()

         1. if delete is not allowed (Capabilities.canDelete() is false)
         delete() must throw DeleteDeniedException
         2. when a triple is deleted from a graph all registered listeners
         must receive a (delete graph triple) message
         3. subsequent graph.contains( triple ) must return false.
         4. If delete is performed within a transaction the listeners are
         not notified until after the commit.

Same listener issues as add()

       6. dependsOn()

What is this used for nowadays?

          1. true if this graph's content depends on the other graph. May be
          pessimistic (ie return true if it's not sure). Typically true when
          a graph is a composition of other graphs, eg union.
       7. find()
          1. returns an iterator of triples that match the specified triple.

And the iterator?

Specifically, there are ConcurrentModificationException issues even in
single threaded code.
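The single-threaded case is easy to reproduce with the standard fail-fast collections: one thread changing a collection while iterating over it makes the iterator throw. A minimal sketch using java.util.ArrayList rather than a Graph (SingleThreadCme is a hypothetical name); the graph analogue would be calling delete() while walking a find() iterator.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

final class SingleThreadCme {
    // Returns true if a ConcurrentModificationException was raised.
    static boolean modifyWhileIterating() {
        List<String> triples = new ArrayList<>(List.of("t1", "t2", "t3"));
        try {
            for (String t : triples) {
                if (t.equals("t1")) {
                    triples.add("t4");  // structural change mid-iteration, same thread
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;  // the next call to next() detected the change
        }
    }

    // The usual workaround: iterate over a snapshot copy.
    static List<String> modifyViaSnapshot() {
        List<String> triples = new ArrayList<>(List.of("t1", "t2", "t3"));
        for (String t : new ArrayList<>(triples)) {
            if (t.equals("t1")) triples.add("t4");
        }
        return triples;
    }
}
```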

       8. getBulkUpdateHandler() -- deprecated / removed -- no tests

      9. getCapabilities()

Aside: Capabilities need clearing up.  It's too black-and-white. It can't
express the totality of possibilities.

Big question: what use does application code make of capabilities?  I
suspect none, or none except to flag errors.  I can't envisage getting a
graph that says "addAllowed=false" and doing anything but signalling the
user that they can't do whatever the task is.   Yet it's going to have
("should have") error handling code anyway.

Maybe it reduces to

     Graph.isReadOnly

I'm unconvinced the add/delete distinction matters.  I can think of graph
where there is a difference (append-only) but not of an application that
adapts based on this other than to say "no, can't".

e.g.
addAllowed( boolean everyTriple );

Capabilities.handlesLiteralTyping -- can't say "some, not others"

          1. must not return null.

If we retain the current Capabilities, then we need a way to say "don't
know".  Some of the capabilities are definite yes/no.

e.g. addAllowed -- presumably "yes" on most graphs, but what if there is a
security wrapper?  Or system resources are
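One way to say "don't know" is a three-valued answer instead of a boolean. A minimal sketch; Answer, Caps, MemCaps and SecuredGraphCaps are hypothetical and not part of Jena's Capabilities.

```java
// Hypothetical three-valued capability answer.
enum Answer { YES, NO, UNKNOWN }

interface Caps {
    Answer addAllowed();
}

// A plain in-memory graph knows it accepts adds.
final class MemCaps implements Caps {
    public Answer addAllowed() { return Answer.YES; }
}

// A security wrapper may only decide per triple or per user,
// so the most it can promise is "no" or "don't know".
final class SecuredGraphCaps implements Caps {
    private final Caps base;
    SecuredGraphCaps(Caps base) { this.base = base; }
    public Answer addAllowed() {
        return base.addAllowed() == Answer.NO ? Answer.NO : Answer.UNKNOWN;
    }
}
```

An application can only act in advance on NO; on UNKNOWN it still has to attempt the update and handle the exception, which is the same error handling code it needs anyway.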

          2. capabilities must match other results.
            1. if not addAllowed(), add must throw exception
            2. if not deleteAllowed(),
               1. delete must throw exception
               2. clear must throw exception

clear() of an already empty graph?

            3. if iteratorRemoveAllowed(), iterator from find must allow
            remove()
            4. if canBeEmpty()
               1. initial construction must be empty()
               2. clear() must leave the graph empty.
          3. must pass Capabilities contract tests.
      10. getEventManager()
         1. Must not return null
         2. Listeners registered with event manager must be notified of
         changes.
         3. EventManager must pass GraphEventManager contract test.
      11. getPrefixMapping()
         1. Must not be null
         2. changes to the prefixes managed by the PrefixMapping returned by
         getPrefixMapping() must be reflected in all other PrefixMapping
         instances from the same graph.

I disagree with the defined contract in javadoc! The "same object" is
horrible!!

          3. Changes made to a prefix mapping within a transaction are
          visible outside of the transaction and are not rolled back by the
          transaction.

!!

          4. PrefixMapping  must pass the PrefixMapping contract test

      12. getStatisticsHandler()

No longer used.

          1. may be null
          2. if not null must pass the GraphStatisticsHandler contract test.
          3. all GraphStatisticsHandlers returned must pass
          handler.equals( handler2 )
      13. getTransactionHandler()
         1. must not be null
         2. must pass the TransactionHandler contract test.
      14. isClosed()
         1. must return false when the graph is created.
         2. must return true after the close() has been called.
      15. isEmpty()
         1. must return true when graph is created if
         Capabilities.canBeEmpty() is true

I don't understand this - a graph may be a view of another so it's not
empty at the start.

          2. must not return true after triples are added
          3. must return true after all triples are deleted if
          Capabilities.canBeEmpty() is true.
          4. must return true after clear() if Capabilities.canBeEmpty() is
          true.
      16. isIsomorphicWith() -- from
      (http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#section-graph-equality):
      Two RDF graphs G and G' are isomorphic (that is, they have an identical
      form) if there is a bijection M between the sets of nodes of the two
      graphs, such that:
         1. M maps blank nodes to blank nodes.
         2. M(lit)=lit for all RDF literals lit which are nodes of G.
         3. M(iri)=iri for all IRIs iri which are nodes of G.
         4. The triple ( s, p, o ) is in G if and only if the triple
         ( M(s), p, M(o) ) is in G'
      17. remove()
         1. when a triple is removed from a graph all registered listeners
         must receive a (remove graph triple) message

remove() removes by pattern

After remove(S,P,O), contains(S,P,O) is false (S/P/O can be Node.ANY)

          2. subsequent graph.contains( triple ) must return false, unless
          the triple was in the newly constructed graph and
          Capabilities.canBeEmpty() is false.
          3. If remove is performed within a transaction the listeners are
          not notified until after the commit.
          4. If delete is denied (Capabilities.deleteAllowed() returns false)
          must throw DeleteDeniedException
      18. size()
         1. if Capabilities.sizeAccurate() is true
            1. if transactions are supported
            (TransactionHandler.transactionsSupported() is true)
               1. the size from within the transaction must function
                  1. adding a triple must increment the size of the graph.
                  2. removing a triple must decrement the size of the graph.
               2. the size from outside the transaction must not change
            2. if transactions are not supported
            (TransactionHandler.transactionsSupported() is false)
               1. adding a triple must increment the size of the graph.
               2. removing a triple must decrement the size of the graph.
         2. if Capabilities.sizeAccurate() is false
            1. if transactions are supported
            (TransactionHandler.transactionsSupported() is true)
               1. the size from within the transaction must function
                  1. adding a triple may increment the size of the graph.
                  2. adding a triple must not decrement the size of the graph.
                  3. removing a triple may decrement the size of the graph.
                  4. removing a triple must not increment the size of the graph.
               2. the size from outside the transaction must not change
                  1. adding a triple must not decrement the size of the graph.
                  2. removing a triple must not increment the size of the graph.
            2. if transactions are not supported
            (TransactionHandler.transactionsSupported() is false)
               1. adding a triple may increment the size of the graph.
               2. adding a triple must not decrement the size of the graph.
               3. removing a triple may decrement the size of the graph.
               4. removing a triple must not increment the size of the graph.
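For the non-transactional branches, the size rules for a single add and delete reduce to one check that switches on sizeAccurate. A minimal sketch, assuming a hypothetical SizedGraph interface, that the triple is absent before the call, and that there are no other writers; SizeContract and ExactSizeGraph are hypothetical too.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical cut-down graph exposing size and the sizeAccurate capability.
interface SizedGraph {
    void add(String t);
    void delete(String t);
    int size();
    boolean sizeAccurate();
}

final class SizeContract {
    // t must be absent from g before the call.
    static void checkAddThenDelete(SizedGraph g, String t) {
        int before = g.size();
        g.add(t);
        int afterAdd = g.size();
        // Accurate: exactly +1. Inaccurate: must not decrease.
        if (g.sizeAccurate() ? afterAdd != before + 1 : afterAdd < before)
            throw new AssertionError("add broke the size contract");
        g.delete(t);
        int afterDelete = g.size();
        // Accurate: exactly -1. Inaccurate: must not increase.
        if (g.sizeAccurate() ? afterDelete != afterAdd - 1 : afterDelete > afterAdd)
            throw new AssertionError("delete broke the size contract");
    }
}

final class ExactSizeGraph implements SizedGraph {
    private final Set<String> triples = new HashSet<>();
    public void add(String t) { triples.add(t); }
    public void delete(String t) { triples.remove(t); }
    public int size() { return triples.size(); }
    public boolean sizeAccurate() { return true; }
}
```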



Please comment as appropriate.
Claude
