Binary RDF

Andy Seaborne Thu, 19 Jun 2014 09:07:36 -0700

Lizard needs to do network transfer of RDF data. Rather than just doing something specific to Lizard, I've started on a general binary RDF module using Apache Thrift.


== RDF-Thrift
Work in Progress :: https://github.com/afs/rdf-thrift/


Discussion welcome.


The current plan is to have three supported abstractions:

1. StreamRDF
2. SPARQL Result Sets
3. RDF patch (which is very like StreamRDF but with A and D markers).
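
To make the "A and D markers" idea concrete, here is a minimal sketch of applying patch rows to a set of triples. It is an illustration only: the apply_patch function and the (marker, triple) tuple layout are hypothetical names chosen for this example, not the RDF patch syntax or anything in RDF-Thrift.

```python
from typing import Iterable, Set, Tuple

# A triple is modelled as three plain strings (assumption for this sketch).
Triple = Tuple[str, str, str]


def apply_patch(graph: Set[Triple],
                patch: Iterable[Tuple[str, Triple]]) -> Set[Triple]:
    """Apply (marker, triple) rows in order to a copy of graph.

    "A" adds the triple, "D" deletes it; anything else is rejected.
    """
    result = set(graph)
    for marker, triple in patch:
        if marker == "A":
            result.add(triple)
        elif marker == "D":
            result.discard(triple)  # deleting an absent triple is a no-op here
        else:
            raise ValueError(f"unknown marker: {marker!r}")
    return result


graph = {("ex:s", "ex:p", "ex:o1")}
patch = [
    ("D", ("ex:s", "ex:p", "ex:o1")),
    ("A", ("ex:s", "ex:p", "ex:o2")),
]
patched = apply_patch(graph, patch)
```

Without the markers, the same rows are just a stream of triples, which is why the two abstractions are so close.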

A first pass for StreamRDF is done, including some attempts to reduce object churn when crossing the abstraction boundaries. Abstraction is all very well, but repeated conversion of data structures can slow things down.


Using StreamRDF means that prefix compression can be done.

See
  https://github.com/afs/rdf-thrift/blob/master/RDF.thrift
for the encoding at the moment for just RDF.
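
As an illustration of why a stream abstraction makes prefix compression possible, here is a minimal sketch. It is not the encoding in RDF.thrift: the event tuples ("prefix", ...) and ("triple", ...), the ("pn", ...) and ("iri", ...) term forms, and all function names are assumptions made up for this example. The point is only that prefix declarations travel in the stream, so later IRIs can be sent as (prefix, local name) pairs and expanded again by the receiver.

```python
def compress_term(iri, prefixes):
    """Return ("pn", prefix, local) using the longest matching namespace,
    or ("iri", iri) when no declared prefix applies."""
    best = None
    for name, ns in prefixes.items():
        if iri.startswith(ns) and (best is None or len(ns) > len(prefixes[best])):
            best = name
    if best is None:
        return ("iri", iri)
    return ("pn", best, iri[len(prefixes[best]):])


def expand_term(term, prefixes):
    """Inverse of compress_term."""
    if term[0] == "pn":
        return prefixes[term[1]] + term[2]
    return term[1]


def compress(events):
    """Pass prefix events through; shorten the terms of triple events."""
    prefixes = {}
    for ev in events:
        if ev[0] == "prefix":
            prefixes[ev[1]] = ev[2]
            yield ev
        else:
            yield ("triple",) + tuple(compress_term(t, prefixes) for t in ev[1:])


def expand(events):
    """Rebuild full IRIs using the prefix events seen so far in the stream."""
    prefixes = {}
    for ev in events:
        if ev[0] == "prefix":
            prefixes[ev[1]] = ev[2]
            yield ev
        else:
            yield ("triple",) + tuple(expand_term(t, prefixes) for t in ev[1:])


stream = [
    ("prefix", "ex", "http://example/"),
    ("triple", "http://example/s", "http://example/p", "http://other/o"),
]
compressed = list(compress(stream))
restored = list(expand(compressed))
```

Only IRIs are handled here; literals and blank nodes would need their own term forms.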

== In Jena

There are a number of places this might be useful:

1/ Fuseki and "application/sparql-results+thrift", "application/x-thrift"

(oh dear, "application/x-thrift": "x-" is not encouraged any more due to the transition problem, cf. "application/x-www-form-urlencoded")


2/ Hadoop-RDF

This is currently using N-Triples/N-Quads. Rob - presumably this would be useful eventually. AbstractNodeTupleWritable / AbstractNLineFileInputFormat look about right to me, but that's from code-reading not code-doing.


(I know you/Cray have some internal binary RDF)

3/ Data bags and spill to disk

4/ RDF patch

5/ TDB (v2 - it would be a disk change) could usefully use the RDF term encoding for the node table.

6/ Files. Add to RIOT as a new syntax (a fairly direct access to StreamRDF+Thrift) which then helps TDB loading.

7/ Caching result sets of queries in Fuseki.

In an ideal world, the Thrift format could be shared across toolkits. There is nothing Jena specific about the wire encoding.


== Thrift vs Protocol Buffer(+netty)

The Lizard prototype currently uses Protocol Buffer + netty. Doing RDF Thrift is a way to learn about Thrift.


All the reviews and comparisons on the interweb seem to be borne out.
There isn't a huge difference between the two.

Thrift's initial entry costs are higher: the documentation is still weak, and the Maven artifact does not have a Maven compatible source artifact (!!!), so you have to mangle one yourself, which isn't hard; the source is there but in a non-standard form.

Thrift has its own networking; I'm unlikely to use the service (RPC) layer from Thrift in Lizard itself as it is not fully streaming, but driving the next layer down directly is quite easy (as it is in PB+N).

Protocol Buffers does not have a network layer, it's just the byte encoding, but Netty comes with built-in protocol buffer handling (PB+N). That works fine as well and I have gone back and found the equivalent of the functionality I have used in RDF Thrift.
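
To show what "just the byte encoding" leaves for the network layer to do, here is a minimal sketch of one common approach: prefixing each encoded message with its length as a base-128 varint, so a receiver can cut a byte stream back into messages. This is an assumption for illustration, not a description of what Lizard or Netty does internally; frame and read_frames are hypothetical helper names, and the payloads are opaque bytes standing in for encoded messages.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a base-128 varint (low 7 bits first)."""
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)  # more bytes follow
        else:
            out.append(b)
            return bytes(out)


def frame(payload: bytes) -> bytes:
    """Prefix a payload with its varint-encoded length."""
    return encode_varint(len(payload)) + payload


def read_frames(buf: bytes):
    """Split buf into complete payloads.

    Returns (payloads, leftover), where leftover is the unconsumed tail
    holding an incomplete frame, to be retried when more bytes arrive.
    """
    frames = []
    pos = 0
    while True:
        length, shift, i = 0, 0, pos
        while True:
            if i >= len(buf):            # ran out of bytes inside the length
                return frames, buf[pos:]
            b = buf[i]
            i += 1
            length |= (b & 0x7F) << shift
            shift += 7
            if not (b & 0x80):
                break
        if i + length > len(buf):        # payload not fully received yet
            return frames, buf[pos:]
        frames.append(buf[i:i + length])
        pos = i + length


wire = frame(b"abc") + frame(b"x" * 200)
messages, leftover = read_frames(wire)
```

Calling read_frames on a truncated buffer returns the complete messages so far and keeps the partial frame as leftover.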

For binary RDF and its general use, Thrift's wider language coverage is a plus point.


        Andy
