Great stuff and getting into 2.10 is good timing.

I'm working through the changes so comments may come in a bit of a drip feed, and hence not necessarily in importance order.


** UpdateEngineFactory
(default methods interfaces would be nice!)

++++
It would be good to hear from from all storage layer implementer on this.
++++

If we are making changes, then it would be better to rework this. I find the two independent update engine concepts confusing and I think it does not help storage implementers.

When streaming, the ability to look at the request before deciding whether to accept it is lost. No big deal - I don't think this is ever used. Looking at the request can be done by an indirection engine that decides at the point of execution which real engine to pass calls to.

So if we go for one UpdateEngine interface, UpdateEngineFactory becomes:

  public boolean accept(GraphStore graphStore, Context context)
  public UpdateEngine create(GraphStore graphStore, Context context)

Initial binding looks like a mistake because an update is multiple operations and historical at best. Remove this (via a deprecation cycle?). Use BIND or VALUES nowadays.

Require an update engine impl to handle any request - it can always bump to the general engine.

The core interface API is streaming and UpdateEngineBase provides stream -> request that can be overridden.

(have now seen Rob's comment suggesting interfaces that can be instanceof'ed so a can just extend UpdateEngineBase as before).


** UsingList:

When applied to a whole request, this is a protocol feature. I think it's better to keep it as such. It seems a small feature for API use thet could be handled via the context (it is the same as rewriting each UpdateModify with no set UsingList to have the global one).

The use case is forming a dataset for the WHERE part of an update from a large collection of graphs. The storage engine may have opinions here (e.g. security, or only certain fixed shapes) so the selection process is, to my mind, impl specific.

(but UpdateWithUsing - could do with the UsingList locally internally!)


Alternative proposal:
  have two datasets for update - the one to update and the one to query
  GraphStore has a getDatasetForWHERE() operation.
    We need to track down the use of GraphStore for WHERE
    but only in the general purpose engine.
    Normally it's "this" - can be reset.
  It moves responsibility to GraphStore creation.
  No UsingList in the UpdateExecutionFactory API is needed

In ARQ, you create an appropriately structures GraphStore with the graphs of choice. In Fuseki, it uses DynamicDatasets.dynamicDataset
(c.f. UpdateEngineWorker.processUsing) for building the view dataset.


** DataSinks

I didn't understand what's happening with the sinks, sorry.

An update request can be more than one operations, including more than one INSERT DATA or DELETE DATA so multiple sinks of both kinds per request?

Shouldn't the sinks be per operation?  In the writer for example:

    @Override
    public void visit(UpdateDataInsert update)
    {
        Iter.sendToSink(update.getQuads(),
                        getInsertDataSink());
    }

maybe called for several UpdateDataInsert's

Reply via email to