Andy Seaborne wrote:
> The flow is:
>
> infer --rdfs=VOCAB DATA | tdbloader2 --loc DB
>
> on a 64bit system. Linux is faster than Windows.
>
> (tdbloader2 only runs on linux currently - Paolo has a pure java version
> on github)

tdbloader2 (pure Java version, experimental) is here:

  http://svn.apache.org/repos/asf/incubator/jena/Scratch/PC/tdbloader2/trunk/

If you want to discuss it further or help, see JENA-117:

  https://issues.apache.org/jira/browse/JENA-117

Inference à la the RIOT infer command line can be done using MapReduce as well (as a map-only job). I doubt you can beat that if you have a medium to large Hadoop cluster. ;-)

See, for example (... another experimental thing):

  https://github.com/castagna/tdbloader3/blob/master/src/main/java/org/apache/jena/tdbloader3/InferDriver.java
  https://github.com/castagna/tdbloader3/blob/master/src/main/java/org/apache/jena/tdbloader3/InferMapper.java

Using MapReduce to generate TDB indexes is possible, but not 'easy'. See, for example:

  https://github.com/castagna/tdbloader3/

I am planning to investigate the route of having hash node ids, which would simplify parallel generation of TDB indexes as well as merging existing indexes.

Mariano, do you have a Hadoop cluster @ unibz.it?

Cheers,
Paolo
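
To illustrate why streaming RDFS inference fits a map-only job, here is a minimal sketch in Python. It is not the RIOT or tdbloader3 code: the function names, the prefixed-string representation of terms, and the rule set (subClassOf, subPropertyOf, domain, range) are assumptions made for illustration. The point is only that, once the vocabulary closure is loaded, each data triple can be expanded on its own, so no reduce step is needed.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # terms are plain strings here (an illustrative simplification)


def transitive_closure(edges):
    """Map each node to every ancestor reachable through (child, parent) edges."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    closure = {}
    for node in list(parents):
        seen, stack = set(), list(parents[node])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(parents.get(current, ()))
        closure[node] = seen
    return closure


def infer_triple(triple, sub_class, sub_prop, domain, range_):
    """Expand one triple using only the preloaded vocabulary (hypothetical helper).

    sub_class / sub_prop: closures from transitive_closure().
    domain / range_: dicts mapping a property to a set of classes.
    No other data triple is consulted, which is what makes this a pure 'map' step.
    Literal objects are not treated specially in this sketch.
    """
    s, p, o = triple
    out = {triple}
    # The property itself plus all of its super-properties.
    for q in {p} | sub_prop.get(p, set()):
        out.add((s, q, o))
        for cls in domain.get(q, ()):
            out.add((s, RDF_TYPE, cls))
        for cls in range_.get(q, ()):
            out.add((o, RDF_TYPE, cls))
    # Add super-classes for every rdf:type statement produced so far.
    for s2, p2, o2 in list(out):
        if p2 == RDF_TYPE:
            for sup in sub_class.get(o2, ()):
                out.add((s2, RDF_TYPE, sup))
    return out


def infer_stream(triples, sub_class, sub_prop, domain, range_):
    """Apply infer_triple to each input triple independently, as a mapper would."""
    for triple in triples:
        yield from infer_triple(triple, sub_class, sub_prop, domain, range_)
```

In a Hadoop setting, the vocabulary closures would be loaded once per mapper and `infer_triple` would be the body of the map function. Duplicate output triples across inputs are possible and would have to be tolerated or removed by the loader.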
