Lizard needs to do network transfer of RDF data. Rather than just doing
something specific to Lizard, I've started on a general binary RDF
module using Apache Thrift.
== RDF-Thrift
Work in Progress :: https://github.com/afs/rdf-thrift/
Discussion welcome.
The current plan is to have three supported abstractions:
1. StreamRDF
2. SPARQL Result Sets
3. RDF patch (which is very like StreamRDF but with A and D markers).
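To make the "A and D markers" concrete, here is a minimal sketch of applying a patch to an in-memory set of triples. It is an illustration only: the tuple layout and the apply_patch name are assumptions made for this example, not the RDF patch syntax or any rdf-thrift API.

```python
def apply_patch(graph, patch):
    """Apply a sequence of (marker, triple) rows to a set of triples.

    Hypothetical layout: marker is "A" (add) or "D" (delete), and a triple
    is a plain (subject, predicate, object) tuple of strings.
    """
    for marker, triple in patch:
        if marker == "A":
            graph.add(triple)
        elif marker == "D":
            # discard() is a no-op if the triple is not present
            graph.discard(triple)
        else:
            raise ValueError("unknown patch marker: %r" % (marker,))
    return graph


graph = {("s", "p", "o1")}
apply_patch(graph, [("A", ("s", "p", "o2")), ("D", ("s", "p", "o1"))])
```

Without the markers, the same rows are just a stream of triples, which is why the two abstractions are so close.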
A first pass for StreamRDF is done, including some attempts to reduce
object churn when crossing the abstraction boundaries. Abstraction is all very
well but repeated conversion of data structures can slow things down.
Using StreamRDF means that prefix compression can be done.
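A rough sketch of why a stream abstraction makes prefix compression possible: prefix declarations travel in the same stream as the triples, so the writer can abbreviate IRIs against the prefixes seen so far and the reader can expand them again. The event tuples and function names below are hypothetical, every term is treated as an IRI string for simplicity, and this is not the encoding in RDF.thrift.

```python
def abbreviate(iri, prefixes):
    """Return (prefix, local) using the longest matching namespace, else the IRI."""
    best = None
    for name, ns in prefixes.items():
        if iri.startswith(ns) and (best is None or len(ns) > len(prefixes[best])):
            best = name
    if best is None:
        return iri
    return (best, iri[len(prefixes[best]):])


def compress(events):
    """Events are ("P", prefix, namespace) or ("T", s, p, o) tuples."""
    prefixes = {}
    for ev in events:
        if ev[0] == "P":
            prefixes[ev[1]] = ev[2]   # remember the declaration, pass it on
            yield ev
        else:
            yield ("T",) + tuple(abbreviate(t, prefixes) for t in ev[1:])


def expand(events):
    """Inverse of compress(): rebuild full IRIs from (prefix, local) pairs."""
    prefixes = {}
    for ev in events:
        if ev[0] == "P":
            prefixes[ev[1]] = ev[2]
            yield ev
        else:
            yield ("T",) + tuple(
                prefixes[t[0]] + t[1] if isinstance(t, tuple) else t
                for t in ev[1:]
            )


events = [
    ("P", "ex", "http://example/"),
    ("T", "http://example/s", "http://example/p", "http://other/o"),
]
packed = list(compress(events))
```

Both sides only need the prefix map built from what has already gone by, so nothing has to be buffered.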
See
https://github.com/afs/rdf-thrift/blob/master/RDF.thrift
for the encoding at the moment for just RDF.
== In Jena
There are a number of places this might be useful:
1/ Fuseki and "application/sparql-results+thrift", "application/x-thrift"
(oh dear, "application/x-thrift", "x-" is not encouraged any more due to
the transition problem c.f. "application/x-www-form-urlencoded")
2/ Hadoop-RDF
This is currently using N-Triples/N-Quads. Rob - presumably this would
be useful eventually. AbstractNodeTupleWritable /
AbstractNLineFileInputFormat look about right to me but that's from
code-reading not code-doing.
(I know you/Cray have some internal binary RDF)
3/ Data bags and spill to disk
4/ RDF patch
5/ TDB (v2 - it would be a disk change) could usefully use the RDF term
encoding for the node table.
6/ Files. Add to RIOT as a new syntax (a fairly direct access to
StreamRDF+Thrift) which then helps TDB loading.
7/ Caching result sets for queries in Fuseki.
In an ideal world, the Thrift format could be shared across toolkits.
There is nothing Jena specific about the wire encoding.
== Thrift vs Protocol Buffer(+netty)
The Lizard prototype currently uses Protocol Buffer + netty. Doing RDF
Thrift has been a way to learn about Thrift.
All the reviews and comparisons on the interweb seem to be borne out.
There isn't a huge difference between the two.
Thrift's initial entry costs are higher: the documentation is still weak, and
the maven artifact does not have a maven compatible source artifact (!!!), so
you have to mangle one yourself, which isn't hard; the source is there
but in a non-standard form.
Thrift has its own networking; I'm unlikely to use the service (RPC)
layer from Thrift in Lizard itself as it is not fully streaming but
driving the next layer down directly is quite easy (as it is in PB+N).
Protocol Buffers does not have a network layer, it's just the byte
encoding, but Netty comes with built in protocol buffer handling (PB+N).
That works fine as well and I have gone back and found the equivalent
functionality I have used in RDF Thrift.
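For readers unfamiliar with what "just the byte encoding" leaves to the network layer, here is a minimal sketch of length-delimited framing: each message is written as a varint length followed by its bytes, so a reader can pull messages off a stream one at a time. This is the general technique only; the function names are made up for the example and it is not the Thrift, Protocol Buffers or Netty code.

```python
import io


def write_varint(out, n):
    """Write a non-negative int as a base-128 varint, low 7 bits first."""
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.write(bytes([b | 0x80]))   # more bytes follow
        else:
            out.write(bytes([b]))
            return


def read_varint(inp):
    """Read a varint; return None on a clean end of stream."""
    shift = 0
    result = 0
    while True:
        c = inp.read(1)
        if not c:
            if shift == 0:
                return None
            raise EOFError("truncated varint")
        result |= (c[0] & 0x7F) << shift
        if not (c[0] & 0x80):
            return result
        shift += 7


def write_frame(out, payload):
    write_varint(out, len(payload))
    out.write(payload)


def read_frames(inp):
    """Yield each payload in turn without reading the whole stream first."""
    while True:
        n = read_varint(inp)
        if n is None:
            return
        data = inp.read(n)
        if len(data) != n:
            raise EOFError("truncated frame")
        yield data


buf = io.BytesIO()
write_frame(buf, b"abc")
write_frame(buf, b"x" * 300)
buf.seek(0)
frames = list(read_frames(buf))
```

The payloads here are opaque bytes; in practice each one would be a single encoded message, such as one row of a stream.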
For binary RDF and its general use, Thrift's wider language coverage is a
plus point.
Andy