On 31/08/14 19:03, Stian Soiland-Reyes wrote:
How have you tested this for IRIs and international characters in literals?
Sorry, I am out travelling and have not checked the code yet. :)
Yes.
Thrift encodes strings as UTF-8.
The wire form of an IRI is a tagged string:
http://afs.github.io/rdf-thrift/rdf-binary-thrift.html
struct RDF_IRI {
1: required string iri
}
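To make the UTF-8 point concrete, here is a hand-rolled Java sketch of how `RDF_IRI` might look on the wire. It assumes Thrift's binary protocol layout (a type byte, a big-endian i16 field id, an i32 byte length, the UTF-8 bytes, then a stop byte); the RDF Thrift page may specify a different protocol, such as the compact one, and real code would use Thrift's generated classes rather than this. The class name `RdfIriWire` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class RdfIriWire {
    // Thrift type codes (assumed binary-protocol values): STRING = 11, STOP = 0.
    static final byte T_STRING = 11;
    static final byte T_STOP = 0;

    /** Encodes struct RDF_IRI { 1: required string iri } in a binary-protocol layout. */
    static byte[] encode(String iri) {
        byte[] utf8 = iri.getBytes(StandardCharsets.UTF_8);
        // ByteBuffer is big-endian by default, matching the binary protocol.
        ByteBuffer buf = ByteBuffer.allocate(1 + 2 + 4 + utf8.length + 1);
        buf.put(T_STRING)          // field type
           .putShort((short) 1)    // field id 1
           .putInt(utf8.length)    // length in UTF-8 bytes, not chars
           .put(utf8)              // the UTF-8 payload
           .put(T_STOP);           // end of struct
        return buf.array();
    }

    /** Decodes the bytes written by encode(); this sketch rejects any other field. */
    static String decode(byte[] wire) {
        ByteBuffer buf = ByteBuffer.wrap(wire);
        String iri = null;
        for (byte type = buf.get(); type != T_STOP; type = buf.get()) {
            short id = buf.getShort();
            if (type != T_STRING || id != 1) {
                // A real Thrift reader would skip unknown fields instead.
                throw new IllegalArgumentException("unexpected field " + id);
            }
            byte[] utf8 = new byte[buf.getInt()];
            buf.get(utf8);
            iri = new String(utf8, StandardCharsets.UTF_8);
        }
        if (iri == null) {
            throw new IllegalArgumentException("required field 'iri' missing");
        }
        return iri;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only (n-tilde, u-acute, two CJK characters).
        String iri = "http://example.org/\u00f1and\u00fa/\u6771\u4eac";
        byte[] wire = encode(iri);
        System.out.println(iri.length() + " chars, " + wire.length + " bytes on the wire");
        System.out.println("round trip ok: " + iri.equals(decode(wire)));
    }
}
```

The length prefix counts UTF-8 bytes, not Java `char`s, which is why non-ASCII characters in IRIs (and, by the same mechanism, in literals) survive the round trip unchanged.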
The new dependency on Apache Thrift would be my main concern if this is not
in a separate module. How stable are Thrift APIs? E.g., do they follow
semantic versioning so that a Jena build will work with a newer Thrift
version (with the same major version)?
Stronger than that: Thrift cares a lot about wire/storage format
compatibility because of the large scale of deployments in which it's
used.
A system-wide, cross-language change of format simply isn't practical.
It would have to be a parallel evolution.
See their discussion of adding the union type: on the wire it's a struct
in which every element is 'optional' and exactly one is set, and union-ness
is provided by the encoder/decoder. Old implementations that are not aware
of unions still work.
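The union trick can be illustrated with a small Java sketch. The two-field union here (`1: string iri`, `2: string bnode`) is hypothetical, not the RDF Thrift schema, and the byte layout again assumes Thrift's binary protocol. The point is that the same bytes decode as an ordinary struct of optional fields, while a union-aware reader only adds an "exactly one field set" check on top.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class UnionAsStruct {
    // Assumed binary-protocol type codes: STRING = 11, STOP = 0.
    static final byte T_STRING = 11;
    static final byte T_STOP = 0;

    // Writes a struct in which only the given string field is present,
    // which is also how a union with that member set looks on the wire.
    static byte[] encodeOneField(short fieldId, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(1 + 2 + 4 + utf8.length + 1)
                .put(T_STRING).putShort(fieldId).putInt(utf8.length).put(utf8)
                .put(T_STOP).array();
    }

    // "Old" reader: treats the bytes as a plain struct of optional fields.
    static Map<Short, String> readStruct(byte[] wire) {
        ByteBuffer buf = ByteBuffer.wrap(wire);
        Map<Short, String> fields = new LinkedHashMap<>();
        for (byte type = buf.get(); type != T_STOP; type = buf.get()) {
            short id = buf.getShort();
            if (type != T_STRING) {
                throw new IllegalArgumentException("sketch handles string fields only");
            }
            byte[] utf8 = new byte[buf.getInt()];
            buf.get(utf8);
            fields.put(id, new String(utf8, StandardCharsets.UTF_8));
        }
        return fields;
    }

    // "New" reader: same bytes, plus the union rule that exactly one field is set.
    static Map.Entry<Short, String> readUnion(byte[] wire) {
        Map<Short, String> fields = readStruct(wire);
        if (fields.size() != 1) {
            throw new IllegalArgumentException(
                    "union must have exactly one field set, got " + fields.size());
        }
        return fields.entrySet().iterator().next();
    }

    public static void main(String[] args) {
        byte[] wire = encodeOneField((short) 2, "b0");
        System.out.println("struct view: " + readStruct(wire));
        Map.Entry<Short, String> member = readUnion(wire);
        System.out.println("union view: field " + member.getKey() + " = " + member.getValue());
    }
}
```

Because union-ness lives entirely in the reader, no new wire construct is needed, which is what lets older implementations keep working.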
What is open (but closing) is whether the RDF encoding is the right one.
Evidence from real use is always going to be valuable.
Andy
On 31 Aug 2014 15:37, "Andy Seaborne" wrote:
On 26/08/14 21:20, Andy Seaborne wrote:
I've been working on a binary format for RDF and SPARQL result sets:
http://afs.github.io/rdf-thrift/
This is now ready to go if everyone is OK with that.
I'm flagging this up for passive consensus because it adds a new
dependency (on Apache Thrift).
And of course any questions or comments.
Summary, as an RDF syntax:
+ 3x faster to parse than N-Triples
+ same size as N-Triples, and same compression effects with gzip (8-10x
compression).
+ Not much additional work to add because Thrift does most of the work.
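For anyone wanting to check the size and gzip comparison against their own data, a minimal Java helper along these lines would do. `GzipRatio` is a hypothetical name, and the synthetic triples in `main` are only a placeholder for real N-Triples and RDF Thrift files, so the ratio it prints says nothing about the figures above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipRatio {
    // Returns the gzip-compressed size of data, in bytes.
    static int gzippedSize(byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream finishes compression and writes the trailer.
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Original size divided by compressed size, e.g. 9.0 means "9x smaller".
    static double ratio(byte[] data) {
        return (double) data.length / gzippedSize(data);
    }

    public static void main(String[] args) {
        // Synthetic N-Triples-like input; replace with the bytes of a real file.
        StringBuilder nt = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            nt.append("<http://example.org/s").append(i)
              .append("> <http://example.org/p> \"value ").append(i).append("\" .\n");
        }
        byte[] data = nt.toString().getBytes(StandardCharsets.UTF_8);
        System.out.printf("%d bytes -> %d gzipped (%.1fx)%n",
                data.length, gzippedSize(data), ratio(data));
    }
}
```

Running it once on an N-Triples file and once on the RDF Thrift encoding of the same data gives a like-for-like comparison.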
Andy
Migration done (JENA-774). There is some cleaning up to do (mostly putting
classes in more logical places), but the tests are in and passing.
Andy