On 30/06/14 14:07, Rob Vesse wrote:
Setup and code?
https://github.com/afs/rdf-thrift
(Caution: I have since swapped the encoding scheme to see whether a
different one is better or worse, and I haven't rerun the timing tests.)
There are a couple of scripts: rdf2thrift (writes Thrift) and thrift2rdf.
In theory, calling LangThrift.init() now wires it into RIOT, but I ran
out of time to test that properly.
I don't know what the writing speed is yet. It should be much better
than the string-based formats such as N-Triples.
Andy
I'd be interested in seeing how the internal binary RDF stuff we have
compares.
Rob
On 21/06/2014 22:19, "Andy Seaborne" <[email protected]> wrote:
First-pass results for parsing from a file to a null sink, with no tuning
or profiling. Jena Java-level Triple objects and all nodes are created.
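As an illustration of what "parsing to a null sink" measures, here is a minimal Python sketch of a throughput harness. It is not the code used for these timings (that is Java, in the repository linked in the reply above); `measure_tps` and the stand-in generator are hypothetical names introduced only for this example.

```python
import time


def measure_tps(items):
    """Consume an iterable, discard every item (the "null sink"),
    and return (count, items_per_second)."""
    start = time.perf_counter()
    count = 0
    for _ in items:
        # Null sink: the item is fully produced by the iterator,
        # then dropped without any further work.
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very small inputs.
    tps = count / elapsed if elapsed > 0 else float("inf")
    return count, tps


if __name__ == "__main__":
    # Stand-in for a parser: a generator yielding fake (s, p, o) tuples.
    fake_triples = ((f"s{i}", "p", f"o{i}") for i in range(100_000))
    n, tps = measure_tps(fake_triples)
    print(f"{n:,} items, {tps:,.0f} per second")
```

Because the sink does nothing, the figure reflects only the cost of reading, decoding and object creation, which is what the numbers below compare.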
RIOT (128K IO buffer)
bsbm-25m.nt.gz : 127,082 Triples per second (TPS)
bsbm-25m.nt: 133,104 TPS
RDF Thrift (32K IO buffer)
bsbm-25m.rt: 357,101 TPS x2.8
bsbm-25m.rt.gz: 390,578 TPS x2.9
RDF Thrift (128K IO buffer)
bsbm-25m.rt: 409,788 TPS x3.2
bsbm-25m.rt.gz: 389,969 TPS x2.9
and best
gzip -d bsbm-25m.rt.gz | thrift2rdf (128K IO buffer)
490,138 TPS
File sizes:
bsbm-25m.nt: 6,505,289,318 bytes (6.1G)
bsbm-25m.nt.gz: 691,429,780 bytes (660M)
bsbm-25m.rt: 6,684,543,995 bytes (6.3G)
bsbm-25m.rt.gz: 700,639,242 bytes (669M)
Andy
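The multipliers and size comparisons above can be rechecked with a little arithmetic. The sketch below copies the figures from this message. The thread does not say which RIOT figure each "x" multiplier is relative to, so it computes the ratio against both baselines. All names in the code are illustrative.

```python
# Figures copied from the results in this message.
RIOT_TPS = {
    "bsbm-25m.nt.gz": 127_082,
    "bsbm-25m.nt": 133_104,
}
THRIFT_TPS = {
    "bsbm-25m.rt (32K)": 357_101,
    "bsbm-25m.rt.gz (32K)": 390_578,
    "bsbm-25m.rt (128K)": 409_788,
    "bsbm-25m.rt.gz (128K)": 389_969,
    "gzip -d | thrift2rdf (128K)": 490_138,
}
SIZES = {
    "bsbm-25m.nt": 6_505_289_318,
    "bsbm-25m.nt.gz": 691_429_780,
    "bsbm-25m.rt": 6_684_543_995,
    "bsbm-25m.rt.gz": 700_639_242,
}


def speedup(tps, baseline_tps):
    """Throughput ratio of a run relative to a baseline run."""
    return tps / baseline_tps


def size_ratio(name, baseline_name):
    """Size of one file relative to another, from the SIZES table."""
    return SIZES[name] / SIZES[baseline_name]


if __name__ == "__main__":
    for run, tps in THRIFT_TPS.items():
        ratios = ", ".join(
            f"x{speedup(tps, base):.2f} vs {name}"
            for name, base in RIOT_TPS.items()
        )
        print(f"{run}: {ratios}")
    print(f"rt / nt: {size_ratio('bsbm-25m.rt', 'bsbm-25m.nt'):.4f}")
    print(f"rt.gz / nt.gz: {size_ratio('bsbm-25m.rt.gz', 'bsbm-25m.nt.gz'):.4f}")
```

On these figures the Thrift files are slightly larger than their N-Triples counterparts, so the gain is in parse speed rather than size.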