QueryHandler

Andy Seaborne Tue, 04 Sep 2012 01:07:15 -0700

On 04/09/12 07:29, Claude Warren wrote:

+1


I have to agree that this is a nice simplification of the jena complexity.
  It would be nice to know why they were created in the first place, just to
ensure that those issues are accounted for.  However, I don't see any
reason not to do this and several reasons to proceed.

Claude


Good question.

What I want to do is simplify and reduce the Graph layer. Graph/Triple/Node is a key abstraction for extension both downwards (storage, inference) and upwards (Model, query, client).

I can give my personal, looking-back perspective, remembering that I wasn't there right at the beginning of the Model API.

And we learn - sometimes things looked to be the right thing at the time but don't always turn out as expected, either because a design didn't work out (internal) or the world has gone in a different direction (external).

These features here aren't used or are used so little that they create complexity for an extension and for maintenance with very little benefit.

BulkUpdateHandler falls into the internal category. Batching changes was obviously important right from the very first database-backed storage layer (before even RDB) because doing changes in a batch can be cheaper than doing them one at a time (e.g. a JDBC commit around a batch is much cheaper than a commit for every triple).


BulkUpdateHandler does not meet the needs for that:

1/ The batch size is driven from the client but the correct size is a matter for the storage, if batching matters at all.

2/ It complicates each application to manage the batching when it could be done once in the graph implementation, if it matters. For a library function, like a parser, to know the right batching is hard and probably messes up its API.


Streaming + storage-side internal batching is better.
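To illustrate the point, here is a minimal sketch of storage-side batching. BatchingStore and all of its methods are hypothetical names made up for this example, not Jena classes. The caller streams triples one at a time and the store alone decides when to commit a batch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical store, not part of Jena: callers stream triples one at a
// time and the store decides the batch boundaries itself.
final class BatchingStore {
    private final int batchSize;                        // chosen by the storage layer
    private final List<String[]> pending = new ArrayList<>();
    private int commits = 0;                            // number of simulated commits
    private int stored = 0;                             // triples committed so far

    BatchingStore(int batchSize) {
        this.batchSize = batchSize;
    }

    // The client-facing call is a plain streaming add; no batch size is exposed.
    void add(String s, String p, String o) {
        pending.add(new String[] { s, p, o });
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Called once at the end of a stream to commit any remainder.
    void finish() {
        if (!pending.isEmpty()) {
            flush();
        }
    }

    // Stands in for one commit around a batch (e.g. one JDBC commit).
    private void flush() {
        stored += pending.size();
        pending.clear();
        commits++;
    }

    int commits() { return commits; }
    int stored() { return stored; }
}
```

A parser would just call add for every triple and finish at the end; changing the batch size touches only the store.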

So keep the operations that have some practical use, for example, adding Graph.removeAll, and don't put it off to one side. It can still be overridden.



Reification:

Semweb has moved on and reification is not important - quoting one triple leaves the issue of grouping quoted triples together, and often fact-units come in the form of more than one triple. Named graphs are playing the role for quoted facts - named graphs postdate reification.

The number of uses of it outside "standard" is very low. "standard" can be done in code over a store of triples; the other modes "minimal" and "convenient" need some state to be kept.


http://jena.apache.org/documentation/notes/reification.html#reification-styles

(most of the rest of the documentation remains - the Model API is only affected in that there is only one style).
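As a rough sketch of "standard" done in code with no retained state: the example below assumes the usual four-triple RDF reification vocabulary (rdf:type rdf:Statement, rdf:subject, rdf:predicate, rdf:object) and represents triples as plain string arrays. StandardReification and its methods are hypothetical names for illustration, not Jena's implementation.

```java
import java.util.List;

// Hypothetical helper, not Jena code: recovers a reified triple purely by
// looking up the reification triples in the store, keeping no extra state.
final class StandardReification {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // True if the exact triple (s, p, o) is present.
    static boolean contains(List<String[]> triples, String s, String p, String o) {
        for (String[] t : triples) {
            if (t[0].equals(s) && t[1].equals(p) && t[2].equals(o)) {
                return true;
            }
        }
        return false;
    }

    // Object of the first (s, p, ?o) triple found, or null if there is none.
    // This sketch does not detect ill-formed data with several values.
    static String objectOf(List<String[]> triples, String s, String p) {
        for (String[] t : triples) {
            if (t[0].equals(s) && t[1].equals(p)) {
                return t[2];
            }
        }
        return null;
    }

    // The triple quoted by 'node', or null when the reification is partial.
    static String[] reifiedTriple(List<String[]> triples, String node) {
        if (!contains(triples, node, RDF + "type", RDF + "Statement")) {
            return null;
        }
        String s = objectOf(triples, node, RDF + "subject");
        String p = objectOf(triples, node, RDF + "predicate");
        String o = objectOf(triples, node, RDF + "object");
        if (s == null || p == null || o == null) {
            return null;
        }
        return new String[] { s, p, o };
    }
}
```

Everything here is a lookup over ordinary triples, which is why this style needs nothing special from a storage layer.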

Keeping the state is an implementation cost and complexity, especially for persistent storage layers. Quite a lot of effort for the RDB layer went into reification.


So maintain the interface at the Model level - make Graph simpler.


graph.QueryHandler (qQH):

Once upon a time there was RDQL, and an RDQL query is, in SPARQL terms, a basic graph pattern, a filter and a projection and nothing else. qQH does that. SPARQL is a bit more complicated. qQH isn't the right building block for SPARQL - its execution API doesn't extend well into a larger framework so we have ended up with some duplication.
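For readers unfamiliar with that shape of query, here is a small self-contained sketch of "basic graph pattern, then filter, then projection" over an in-memory list of triples. It is only an illustration with made-up names (TinyPatternQuery, select, unify); it is not the QueryHandler API and makes no attempt at efficiency. Variables are strings starting with "?".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration, not Jena's QueryHandler.
final class TinyPatternQuery {
    static boolean isVar(String term) {
        return term.startsWith("?");
    }

    // Extend 'binding' so that 'pattern' matches 'triple'; null on mismatch.
    static Map<String, String> unify(String[] pattern, String[] triple,
                                     Map<String, String> binding) {
        Map<String, String> out = new LinkedHashMap<>(binding);
        for (int i = 0; i < 3; i++) {
            String term = pattern[i];
            if (isVar(term)) {
                String bound = out.get(term);
                if (bound == null) {
                    out.put(term, triple[i]);
                } else if (!bound.equals(triple[i])) {
                    return null;
                }
            } else if (!term.equals(triple[i])) {
                return null;
            }
        }
        return out;
    }

    static List<Map<String, String>> select(List<String[]> data,
                                            List<String[]> bgp,
                                            Predicate<Map<String, String>> filter,
                                            List<String> projectVars) {
        // 1. Basic graph pattern: join the triple patterns one by one.
        List<Map<String, String>> solutions = new ArrayList<>();
        solutions.add(new LinkedHashMap<>());
        for (String[] pattern : bgp) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> b : solutions) {
                for (String[] triple : data) {
                    Map<String, String> extended = unify(pattern, triple, b);
                    if (extended != null) {
                        next.add(extended);
                    }
                }
            }
            solutions = next;
        }
        // 2. Filter, then 3. projection onto the requested variables.
        List<Map<String, String>> rows = new ArrayList<>();
        for (Map<String, String> b : solutions) {
            if (!filter.test(b)) {
                continue;
            }
            Map<String, String> row = new LinkedHashMap<>();
            for (String v : projectVars) {
                row.put(v, b.get(v));
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Anything beyond this (OPTIONAL, UNION, named graphs, aggregation) has no place in such a fixed three-step pipeline, which is the sense in which it is the wrong building block for SPARQL.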

So remove it. It all goes to making Graph simpler - and Graph is a key abstraction for extension.


        Andy


On Mon, Sep 3, 2012 at 6:33 PM, Andy Seaborne wrote:

As part of wanting to tidy up and reduce the "core" of Jena, I'd like to
propose we

   Remove BulkUpdateHandler interface
     Migrate its few useful operations to Graph.

   Start to provide reification with "standard" only.
     graph.QueryHandler only used to support reification.


== BulkUpdateHandler

The two implementations I know of are

  SimpleBulkUpdateHandler
  UpdateHandlerSDB

A few of its operations are useful but most turn into nothing but loops
to call add(Triple)/delete(Triple).

Event handling distinguishes each kind of operation but, as far as I can see,
this becomes individual calls to "addedStatement"/"removedStatement" at
the Model level, i.e. the difference between adding by array or list or
iterator gets lost.

The useful operations are:
   add(Graph)
   delete(Graph)
   removeAll()
   remove(s,p,o)

and the slightly bizarre:

   add(Graph, withReifications)
   delete(Graph, withReifications)

(see below about reification)

and the less useful (because they don't relate to the way the storage
might properly batch changes - the provider shouldn't decide the batch
boundaries) which turn into add(Triple)/delete(Triple)

   add(Triple [])
   add( List<Triple>)
   add( Iterator<Triple>)
   delete(Triple [])
   delete( List<Triple>)
   delete( Iterator<Triple>)

The only calls to these "add" operations are from ARP, which batches its
changes into units of 1000, but not a whole parser run. As the
SimpleBulkUpdateHandler turns these into single calls, nothing is gained.

My proposal is that the useful operations are moved to Graph, and the code for
the withReifications forms migrates to the only callers in ModelCom.
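A minimal sketch of what "useful operations on the graph itself" could look like, assuming a simplified graph with three primitives. MiniGraph and MemGraph are hypothetical names for this example and the method names are illustrative, not Jena's Graph interface. The bulk operations are written once over the primitives and a storage layer can still override them.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified interface; not Jena's actual Graph.
interface MiniGraph {
    void add(List<String> triple);
    void delete(List<String> triple);

    // null in any position is a wildcard; returns a snapshot list.
    List<List<String>> find(String s, String p, String o);

    // Bulk operations as overridable defaults built on the primitives.
    default void addAll(MiniGraph other) {
        for (List<String> t : other.find(null, null, null)) {
            add(t);
        }
    }

    default void remove(String s, String p, String o) {
        for (List<String> t : find(s, p, o)) {
            delete(t);
        }
    }

    default void clear() {
        remove(null, null, null);
    }
}

// Simple in-memory implementation that relies on the defaults.
final class MemGraph implements MiniGraph {
    private final Set<List<String>> triples = new HashSet<>();

    public void add(List<String> t) { triples.add(t); }

    public void delete(List<String> t) { triples.remove(t); }

    public List<List<String>> find(String s, String p, String o) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> t : triples) {
            if ((s == null || s.equals(t.get(0)))
                    && (p == null || p.equals(t.get(1)))
                    && (o == null || o.equals(t.get(2)))) {
                out.add(t);
            }
        }
        return out;
    }

    int size() { return triples.size(); }
}
```

A database-backed graph could override clear or remove with a single statement against its store instead of looping.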

UpdateHandlerSDB:

This only uses the UpdateHandler interface to wrap the calls in
start/finish bulk update to implicitly increase the scope of bulk updates.
  But it isn't

== Reification

The intent is to only support the default standard eventually.

Standard can be provided by code, with no retained state (partial
reifications).  TDB and SDB do not support anything except "standard".

This leads to ....

(graph.)QueryHandler:
Its main use is with reification.  I think we can remove it when
reification is replaced by a straight code implementation.

         Andy

See also JENA-189

Re: Evolution: BulkUpdateHandler / Reification / QueryHandler
