Revised notes on Talis Changesets:

==== RDF Patch compared to Talis Changesets.

Talis Changesets (TCS) are defined by:

http://docs.api.talis.com/getting-started/changesets
http://docs.api.talis.com/getting-started/changeset-protocol
http://vocab.org/changeset/schema.html

== Brief Description

A Changeset is a set of triples to remove and a set of triples to add, recorded as a single RDF graph. There is a fixed "subject of change" - a changeset is a change to a single resource. The triples of the change must all have the same subject and this must be the subject of change.

The triples of the change are recorded as reified statements. This is necessary so that triples can be grouped into removal and addition sets. The changeset can also carry descriptive information about the change: because the changeset is an RDF graph, the graph can say who the creator was, record the reason for the change, and the date the modification was created (not executed). This is another reason the change triples must be reified.
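A minimal changeset might look like the following (Turtle sketch; the cs: namespace is the changeset schema vocabulary, while the ex: resources, names and date are illustrative):

```turtle
@prefix cs:  <http://purl.org/vocab/changeset/schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex:  <http://example.org/> .

# Hypothetical changeset: rename ex:bob from "Robert" to "Bob".
ex:change1 a cs:ChangeSet ;
    cs:subjectOfChange ex:bob ;
    cs:createdDate "2011-01-01T00:00:00Z" ;
    cs:creatorName "Alice" ;
    cs:changeReason "Correcting the name" ;
    cs:removal [
        a rdf:Statement ;
        rdf:subject ex:bob ;
        rdf:predicate ex:name ;
        rdf:object "Robert"
    ] ;
    cs:addition [
        a rdf:Statement ;
        rdf:subject ex:bob ;
        rdf:predicate ex:name ;
        rdf:object "Bob"
    ] .
```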

ChangeSets can be linked together to produce a sequence of changes. This is how to express changes to several resources - as a list of changesets.
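The linking is done by pointing a changeset at its predecessor (Turtle sketch; resource names are illustrative):

```turtle
@prefix cs: <http://purl.org/vocab/changeset/schema#> .
@prefix ex: <http://example.org/> .

# ex:change2 follows ex:change1; applying the chain in order
# yields changes to two different resources.
ex:change2 a cs:ChangeSet ;
    cs:subjectOfChange ex:alice ;
    cs:precedingChangeSet ex:change1 .
```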

== Pros and Cons

This approach has some advantages and some disadvantages
(some of the disadvantages can be overcome by fairly obvious changes to the definitions).

1/ Changes relate only to one resource.

You can't make a coordinated set of changes, such as adding a batch of several new resources in a single HTTP request.

2/ Blank nodes can't be handled.

There is no way to give the subject of change if it is a blank node, nor to link to existing blank nodes in the data. (The Talis platform didn't support blank nodes.)

3/ Scale issues : streaming and latency.

Streaming is valuable at scale because changes can start to be applied as the data arrives, rather than buffering the changes until the end of the change is seen and only then applying them.

For large changes, this also impacts latency. Doing some or all of the changes as data arrives overlaps processing between sender and receiver.

However, the use of an RDF graph as a changeset blocks streaming.

The whole changeset graph must be available before any changes are made. The whole graph is needed to validate the changeset (e.g. that all reified triples have the same subject), and the order of triples in a serialization of a graph is arbitrary (especially if produced by a generic RDF serializer), so, for example, the "subject of change" triple could come last, or the additions and removals could be mixed in any order. To get stable results, a rule is needed such as: all removals are done before any additions.

This is a limitation at scale. In practice, a changeset must be parsed into memory (standard parser), validated (changeset-specific code) and applied (changeset-specific code). The design can't support streaming, nor changes which may be larger than available RAM (e.g. millions of triples).
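The parse-validate-apply pipeline can be sketched with triples as plain tuples (illustrative code, not the Talis implementation; the point is that validation and the removals-before-additions rule both need the fully parsed changeset):

```python
def apply_changeset(store, subject_of_change, removals, additions):
    """Apply a fully-parsed changeset to a store of triples.

    'store' is a set of (s, p, o) tuples; 'removals' and 'additions'
    are lists of (s, p, o) tuples.  Everything must already be in
    memory: validation has to see every change triple.
    """
    # Validate: every change triple must have the subject of change
    # as its subject.  This can't be checked until the whole
    # changeset graph has been parsed.
    for s, p, o in removals + additions:
        if s != subject_of_change:
            raise ValueError("change triple has wrong subject: %s" % (s,))
    # Apply all removals before any additions, so the result does
    # not depend on the arbitrary triple order in the serialization.
    for triple in removals:
        store.discard(triple)
    for triple in additions:
        store.add(triple)
    return store

store = {("bob", "name", "Robert"), ("bob", "age", "42")}
apply_changeset(store, "bob",
                removals=[("bob", "name", "Robert")],
                additions=[("bob", "name", "Bob")])
# store is now {("bob", "age", "42"), ("bob", "name", "Bob")}
```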

It does mean that a standard RDF toolkit can be used to produce the changeset (with suitable application code to build the graph structure) and to parse it at the receiver, together with some application code for producing, validating and executing a changeset.

4/ Metadata per change is a useful feature.

5/ Changesets only work for changes to a resource in a single graph.

== Other

Graph literals:

Some other proposals have been made (like Delta, or variants based on TriG) where named graphs are used instead of reified triples. The scaling issue remains - processing can't start until the whole change has been seen.

Delta:

{ ?x bank:accountNo "1234578" }
  diff:deletion { ?x bank:balance 4000 };
  diff:insertion { ?x bank:balance 3575 } .

Restricted SPARQL Update:

That leaves restricted SPARQL Update: e.g. DELETE DATA, INSERT DATA, and maybe DELETE WHERE.
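Such a restricted update might look like this (SPARQL 1.1 Update sketch; the data is illustrative):

```sparql
PREFIX ex: <http://example.org/>

DELETE DATA { ex:bob ex:name "Robert" } ;
INSERT DATA { ex:bob ex:name "Bob" }
```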

As soon as additional restrictions apply, then, to validate, you need a special parser, so the primary advantage (reuse of existing tools) is only partial.
