Lizard needs to do network transfer of RDF data. Rather than just doing something specific to Lizard, I've started on a general binary RDF module using Apache Thrift.

== RDF-Thrift
Work in Progress :: https://github.com/afs/rdf-thrift/

Discussion welcome.


The current plan is to have three supported abstractions:

1. StreamRDF
2. SPARQL Result Sets
3. RDF patch (which is very like StreamRDF but with A and D markers).
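To make the patch/StreamRDF relationship concrete, here is a minimal sketch in which each row is one triple tagged with an `A` (add) or `D` (delete) marker, applied in order to an in-memory set. The `PatchRow` record and `PatchApplier` class are hypothetical illustrations, not the RDF patch format or a Jena API. Terms are plain strings for brevity.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical patch row: an A/D marker plus the terms of one triple.
// Terms are plain strings here; a real encoding would carry typed RDF terms.
record PatchRow(char marker, String s, String p, String o) {
    String tripleKey() {
        return s + " " + p + " " + o;
    }
}

class PatchApplier {
    // Apply rows strictly in order: 'A' adds the triple, 'D' deletes it.
    // The input graph is copied, not modified.
    static Set<String> applyPatch(Set<String> graph, List<PatchRow> rows) {
        Set<String> result = new LinkedHashSet<>(graph);
        for (PatchRow row : rows) {
            switch (row.marker()) {
                case 'A' -> result.add(row.tripleKey());
                case 'D' -> result.remove(row.tripleKey());
                default -> throw new IllegalArgumentException("Unknown marker: " + row.marker());
            }
        }
        return result;
    }
}
```

Apart from the marker, each row carries what a StreamRDF triple event carries, which is why the two abstractions can share most of an encoding.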

A first pass for StreamRDF is done, including some attempts to reduce object churn when crossing the abstraction boundaries. Abstraction is all very well, but repeated conversion of data structures can slow things down.
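One common way to cut object churn at such a boundary is to decode each row into a single reusable, mutable holder instead of allocating fresh objects per row. The sketch below assumes a hypothetical `RowHolder` class and a flat `int[]` of term ids as input; it illustrates the technique only and is not how RDF-Thrift itself is written.

```java
import java.util.function.Consumer;

// Hypothetical mutable holder, reused for every row to avoid per-row allocation.
class RowHolder {
    int s, p, o;

    void set(int s, int p, int o) {
        this.s = s;
        this.p = p;
        this.o = o;
    }
}

class ReuseDecoder {
    // Decode a flat array of (s, p, o) term ids, handing the SAME holder to the
    // consumer for each row. Consumers must copy out anything they want to keep.
    static void decode(int[] ids, Consumer<RowHolder> sink) {
        RowHolder holder = new RowHolder();
        for (int i = 0; i + 2 < ids.length; i += 3) {
            holder.set(ids[i], ids[i + 1], ids[i + 2]);
            sink.accept(holder);
        }
    }
}
```

The trade-off is that the holder is only valid during the callback, so the contract has to be documented for every consumer.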

Using StreamRDF means that prefix compression can be done.
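Because StreamRDF delivers prefix declarations as events in the stream, a writer can replace each IRI that starts with a known namespace by a (prefix, local name) pair and send the namespace itself only once. A minimal sketch, assuming a hypothetical `PrefixCompressor` class and a longest-match rule; the pair is rendered as `prefix:local` text here for readability, and the actual RDF-Thrift wire form may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical prefix compressor: rewrites full IRIs as "prefix:local" when a
// registered namespace matches, otherwise emits the IRI as "<iri>".
class PrefixCompressor {
    private final Map<String, String> prefixToNs = new LinkedHashMap<>();

    // Corresponds to a prefix event arriving on the stream.
    void prefix(String prefix, String namespace) {
        prefixToNs.put(prefix, namespace);
    }

    // Choose the longest matching namespace so that more specific namespaces win.
    String compress(String iri) {
        String bestPrefix = null;
        String bestNs = "";
        for (Map.Entry<String, String> e : prefixToNs.entrySet()) {
            String ns = e.getValue();
            if (iri.startsWith(ns) && ns.length() > bestNs.length()) {
                bestPrefix = e.getKey();
                bestNs = ns;
            }
        }
        return bestPrefix == null ? "<" + iri + ">" : bestPrefix + ":" + iri.substring(bestNs.length());
    }

    // Inverse of compress, given the same prefix table on the receiving side.
    String expand(String term) {
        if (term.startsWith("<") && term.endsWith(">")) {
            return term.substring(1, term.length() - 1);
        }
        int colon = term.indexOf(':');
        if (colon < 0) throw new IllegalArgumentException("Not a prefixed name: " + term);
        String ns = prefixToNs.get(term.substring(0, colon));
        if (ns == null) throw new IllegalArgumentException("Unknown prefix in " + term);
        return ns + term.substring(colon + 1);
    }
}
```

The receiver only needs the same prefix events, in stream order, to reverse the mapping.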

See
  https://github.com/afs/rdf-thrift/blob/master/RDF.thrift
for the encoding as it currently stands (just RDF so far).
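The general shape of a binary RDF term encoding is a tagged union: a type tag followed by the fields for that kind of term. The sketch below uses plain `java.io` streams in place of Thrift, and its `TermCodec` class, tag values and field layout are hypothetical, not the layout in RDF.thrift.

```java
import java.io.*;

// Hypothetical tags for the kinds of RDF term; NOT the values used in RDF.thrift.
final class TermCodec {
    static final byte IRI = 1, BNODE = 2, LITERAL = 3;

    // Each field is written with writeUTF: a 2-byte length then modified UTF-8,
    // which limits each field to 65535 encoded bytes.
    static void writeIri(DataOutputStream out, String iri) throws IOException {
        out.writeByte(IRI);
        out.writeUTF(iri);
    }

    static void writeBlank(DataOutputStream out, String label) throws IOException {
        out.writeByte(BNODE);
        out.writeUTF(label);
    }

    // Lexical form, datatype IRI and language tag ("" when there is none).
    static void writeLiteral(DataOutputStream out, String lex, String datatype, String lang) throws IOException {
        out.writeByte(LITERAL);
        out.writeUTF(lex);
        out.writeUTF(datatype);
        out.writeUTF(lang);
    }

    // Read one term back, returning an N-Triples-like rendering for illustration.
    static String read(DataInputStream in) throws IOException {
        byte tag = in.readByte();
        switch (tag) {
            case IRI:
                return "<" + in.readUTF() + ">";
            case BNODE:
                return "_:" + in.readUTF();
            case LITERAL: {
                String lex = in.readUTF();
                String dt = in.readUTF();
                String lang = in.readUTF();
                return "\"" + lex + "\"" + (lang.isEmpty() ? "^^<" + dt + ">" : "@" + lang);
            }
            default:
                throw new IOException("Unknown term tag: " + tag);
        }
    }
}
```

The same tagged layout is what makes a term encoding reusable outside the wire format, for example as node table entries.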

== In Jena

There are a number of places this might be useful:

1/ Fuseki and "application/sparql-results+thrift", "application/x-thrift"

(oh dear, "application/x-thrift": "x-" is not encouraged any more because of the transition problem; cf. "application/x-www-form-urlencoded")

2/ Hadoop-RDF

This is currently using N-Triples/N-Quads. Rob, presumably this would be useful eventually. AbstractNodeTupleWritable / AbstractNLineFileInputFormat look about right to me, but that's from code-reading, not code-doing.

(I know you/Cray have some internal binary RDF)

3/ Data bags and spill to disk

4/ RDF patch

5/ TDB (v2, as it would be a disk change) could usefully use the RDF term encoding for the node table.

6/ Files. Add to RIOT as a new syntax (fairly direct access to StreamRDF+Thrift), which then helps TDB loading.

7/ Caching result sets in queries in Fuseki.

In an ideal world, the Thrift format could be shared across toolkits. There is nothing Jena-specific about the wire encoding.

== Thrift vs Protocol Buffer(+netty)

The Lizard prototype currently uses Protocol Buffers + netty. Doing RDF Thrift has been a way to learn about Thrift.

All the reviews and comparisons on the interweb seem to be borne out.
There isn't a huge difference between the two.

Thrift's initial entry costs are higher: the documentation is still weak, and the Maven artifact does not have a Maven-compatible source artifact (!!!), so you have to mangle one yourself. That isn't hard; the source is there, but in a non-standard form.

Thrift has its own networking. I'm unlikely to use Thrift's service (RPC) layer in Lizard itself, as it is not fully streaming, but driving the next layer down directly is quite easy (as it is with PB+N).
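Skipping the RPC layer means a sender can push an open-ended sequence of records down one stream, with no request/response cycle, and the receiver can handle each one as it arrives. As a stand-in for writing Thrift structs straight onto a protocol/transport, the sketch below uses length-prefixed frames on plain `java.io` streams; the `FrameStream` class and its framing are hypothetical, not Thrift's API.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical framing: each record is an int length followed by that many bytes.
// A length of -1 marks end of stream, so empty records remain representable.
final class FrameStream {
    static void writeAll(OutputStream raw, List<String> records) throws IOException {
        DataOutputStream out = new DataOutputStream(raw);
        for (String r : records) {
            byte[] b = r.getBytes(StandardCharsets.UTF_8);
            out.writeInt(b.length);
            out.write(b);
        }
        out.writeInt(-1); // end-of-stream marker
        out.flush();
    }

    // Deliver each record to the handler as soon as its frame has been read.
    static void readAll(InputStream raw, Consumer<String> handler) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        while (true) {
            int len = in.readInt();
            if (len < 0) return;
            byte[] b = new byte[len];
            in.readFully(b);
            handler.accept(new String(b, StandardCharsets.UTF_8));
        }
    }
}
```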

Protocol Buffers does not have a network layer (it's just the byte encoding), but Netty comes with built-in Protocol Buffers handling (PB+N). That works fine as well, and I have gone back and found the PB+N equivalent of the functionality I have used in RDF Thrift.

For binary RDF and its general use, Thrift's wider language coverage is a plus point.

        Andy
