Re: Streaming SPARQL Updates

Andy Seaborne Thu, 31 Jan 2013 06:46:33 -0800

Great stuff and getting into 2.10 is good timing.

I'm working through the changes so comments may come in a bit of a dripfeed, and hence not necessarily in importance order.



** UpdateEngineFactory
(default methods interfaces would be nice!)

++++
It would be good to hear from from all storage layer implementer on this.
++++

If we are making changes, then it would be better to rework this. Ifind the two independent update engine concepts confusing and I think itdoes not help storage implementers.

When streaming, the ability to look at the request before decidingwhether to accept it is lost. No big deal - I don't think this is everused. Looking at the request can be done by an indirection engine thatdecides at the point of execution which real engine to pass calls to.


So if we go for one UpdateEngine interface, UpdateEngineFactory becomes:

  public boolean accept(GraphStore graphStore, Context context)
  public UpdateEngine create(GraphStore graphStore, Context context)

Initial binding looks like a mistake because an update is multipleoperations and historical at best. Remove this (via a deprecationcycle?). Use BIND or VALUES nowadays.

Require an update engine impl to handle any request - it can always bumpto the general engine.

The core interface API is streaming and UpdateEngineBase provides stream-> request that can be overridden.

(have now seen Rob's comment suggesting interfaces that can beinstanceof'ed so a can just extend UpdateEngineBase as before).



** UsingList:

When applied to a whole request, this is a protocol feature. I thinkit's better to keep it as such. It seems a small feature for API usethet could be handled via the context (it is the same as rewriting eachUpdateModify with no set UsingList to have the global one).

The use case is forming a dataset for the WHERE part of an update from alarge collection of graphs. The storage engine may have opinions here(e.g. security, or only certain fixed shapes) so the selection processis, to my mind, impl specific.


(but UpdateWithUsing - could do with the UsingList locally internally!)


Alternative proposal:
  have two datasets for update - the one to update and the one to query
  GraphStore has a getDatasetForWHERE() operation.
    We need to track down the use of GraphStore for WHERE
    but only in the general purpose engine.
    Normally it's "this" - can be reset.
  It moves responsibility to GraphStore creation.
  No UsingList in the UpdateExecutionFactory API is needed

In ARQ, you create an appropriately structures GraphStore with thegraphs of choice. In Fuseki, it uses DynamicDatasets.dynamicDataset

(c.f. UpdateEngineWorker.processUsing) for building the view dataset.


** DataSinks

I didn't understand what's happening with the sinks, sorry.

An update request can be more than one operations, including more thanone INSERT DATA or DELETE DATA so multiple sinks of both kinds per request?


Shouldn't the sinks be per operation?  In the writer for example:

    @Override
    public void visit(UpdateDataInsert update)
    {
        Iter.sendToSink(update.getQuads(),
                        getInsertDataSink());
    }

maybe called for several UpdateDataInsert's

Re: Streaming SPARQL Updates

Reply via email to