Andy Seaborne wrote:
> (Paolo - what's the state of your parallel loader work?)
Both tdbloader3 [1] and tdbloader4 [2] are (or at least should be) correct. I have been testing them with datasets in the 500-700 million triple range, but I still consider them *experimental*. It would be good to have someone else give tdbloader3 a try, though.

We have been experiencing a few problems with tdbloader4, but I am not sure whether they are caused by tdbloader4 itself or by the Amazon EMR environment we are using.

I have some code to convert Freebase dumps to RDF; the result is ~600 million triples, and I'll use that to gather some numbers. Ideally, I would compare tdbloader, tdbloader2, tdbloader3 and tdbloader4, in terms of both time and cost.

Paolo

[1] http://svn.apache.org/repos/asf/incubator/jena/Scratch/PC/tdbloader3/trunk/
[2] https://github.com/castagna/tdbloader4
