Hi Paolo,

On Dec 5, 2011, at 11:26 AM, Paolo Castagna wrote:
> Andy Seaborne wrote:
>> The flow is:
>>
>> infer --rdfs=VOCAB DATA | tdbloader2 --loc DB
>>
>> on a 64bit system. Linux is faster than Windows.
>>
>> (tdbloader2 only runs on linux currently - Paolo has a pure java version
>> on github)
>
> tdbloader2 (pure Java version) is here (experimental):
> http://svn.apache.org/repos/asf/incubator/jena/Scratch/PC/tdbloader2/trunk/
>
> If you want to discuss further or help, see JENA-117:
> https://issues.apache.org/jira/browse/JENA-117

Excellent, we'll start reading the docs asap.

>
> Inference a la RIOT infer command line can be done using MapReduce as well
> (a map only job), I doubt you can beat that if you have a medium to large
> Hadoop cluster. ;-)

In this first round of benchmarks we want to avoid any Hadoop or map-reduce
approaches. The reason is that we want raw numbers for the core reasoning
techniques: forward chaining, backward chaining, and our own technique,
called semantic indexes, which is a bit like backward chaining but with a
tiny bit of extra work at loading time. We want to avoid measuring benefits
that come from the architecture of the system (map-reduce, for example),
because the technique we are testing can also be extended with map-reduce
and a parallel architecture. We do want to test and move to the Hadoop
map-reduce setting in the (mid-term) future, but first we want to get the
simple setting as optimal as possible.

By the way Paolo, does tdbloader2 have anything to do with Sesame's RIO?

> Mariano, do you have an Hadoop cluster @ unibz.it?

That's another reason not to do the map-reduce part yet :) we also don't
have a cluster at Bolzano yet :(

>
> Cheers,
> Paolo
>
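
To make the forward vs. backward chaining distinction above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the toy class hierarchy, the instance data, and the helper names (superclasses, forward_chain, backward_chain) are made up for this example and are not part of Jena, TDB, or our system. It handles only rdfs:subClassOf and does not attempt to show semantic indexes.

```python
# Illustrative only: toy data and hypothetical helper names, not Jena/TDB code.
SUBCLASS = {("Student", "Person"), ("Person", "Agent")}  # rdfs:subClassOf edges
TYPES = {("alice", "Student"), ("bob", "Person")}        # rdf:type assertions


def superclasses(cls, subclass=SUBCLASS):
    """Return every class reachable from cls via subClassOf, including cls."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for sub, sup in subclass:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def forward_chain(types=TYPES):
    """Load-time reasoning: materialize every entailed rdf:type triple."""
    return {(inst, sup) for inst, cls in types for sup in superclasses(cls)}


def backward_chain(query_cls, types=TYPES):
    """Query-time reasoning: leave the data as is, walk the hierarchy per query."""
    return {inst for inst, cls in types if query_cls in superclasses(cls)}


# Forward chaining pays at load time (more stored triples, plain lookup later).
materialized = forward_chain()
instances_fc = {inst for inst, cls in materialized if cls == "Agent"}

# Backward chaining stores only the asserted triples and pays at query time.
instances_bc = backward_chain("Agent")

print(sorted(instances_fc), sorted(instances_bc))
```

Both strategies return the same answers; what differs is where the cost lands (storage and load time vs. query time), which is the trade-off the benchmarks are meant to isolate.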
