Re: Fair Benchmarking of SDB, TDB and LUBM 100 > with inference support and limited memory

Mariano Rodriguez Mon, 05 Dec 2011 05:59:07 -0800

Hi Paolo, 

On Dec 5, 2011, at 11:26 AM, Paolo Castagna wrote:

> Andy Seaborne wrote:
>> The flow is:
>> 
>> infer --rdfs=VOCAB DATA | tdbloader2 --loc DB
>> 
>> on a 64bit system.  Linux is faster than Windows.
>> 
>> (tdbloader2 only runs on linux currently - Paolo has a pure java version
>> on github)
> 
> tdbloader2 (pure Java version) is here (experimental):
> http://svn.apache.org/repos/asf/incubator/jena/Scratch/PC/tdbloader2/trunk/
> 
> If you want to discuss further or help, see JENA-117:
> https://issues.apache.org/jira/browse/JENA-117

Excellent, we'll start reading the docs ASAP.

> 
> Inference a la RIOT infer command line can be done using MapReduce as well
> (a map only job), I doubt you can beat that if you have a medium to large
> Hadoop cluster. ;-)

For this first round of benchmarks we want to avoid any Hadoop or map-reduce
approaches. The reason is that we want raw numbers for the core reasoning
techniques: in this case forward chaining vs. backward chaining vs. our
technique, called semantic indexes, which is a bit like backward chaining but
with a tiny bit of extra work at loading time. We want to avoid measuring
benefits that come from the architecture of the system (map-reduce, for
example), because the technique that we are testing can also be extended with
map-reduce and a parallel architecture.
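
To make the forward vs. backward chaining distinction concrete, here is a
minimal sketch in Python for subclass reasoning only. The class names and
individuals are made up for illustration (they are not LUBM data), the helper
names are hypothetical, and the code does not model semantic indexes or how
any particular store implements either strategy.

```python
# Toy data (hypothetical): a single-parent class hierarchy and type assertions.
SUBCLASS_OF = {"GraduateStudent": "Student", "Student": "Person"}
TYPES = {("alice", "GraduateStudent"), ("bob", "Person")}


def superclasses(cls):
    """Yield cls itself and then each ancestor, walking up the hierarchy."""
    while cls is not None:
        yield cls
        cls = SUBCLASS_OF.get(cls)


def forward_chain(types):
    """Forward chaining: materialize every inferred type once, at load time."""
    return {(s, sup) for (s, c) in types for sup in superclasses(c)}


def instances_forward(materialized, cls):
    """Query over materialized data: a plain lookup, no reasoning needed."""
    return {s for (s, c) in materialized if c == cls}


def instances_backward(types, cls):
    """Backward chaining: store only asserted types, reason at query time."""
    return {s for (s, c) in types if cls in superclasses(c)}


materialized = forward_chain(TYPES)
print(sorted(instances_forward(materialized, "Person")))
print(sorted(instances_backward(TYPES, "Person")))
```

Both queries return the same individuals; the difference is where the cost is
paid. Forward chaining stores more triples and does the work at load time,
while backward chaining keeps the data small and does the work on every query.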

We do want to test and move to the Hadoop map-reduce setting in the (mid-term)
future, but first we want to have the simple setting as optimal as possible.

By the way, Paolo, does tdbloader2 have anything to do with Sesame's RIO?

> Mariano, do you have an Hadoop cluster @ unibz.it?

That's another reason not to do the map-reduce part yet :) we also don't have
a cluster at Bolzano yet :(

> 
> Cheers,
> Paolo
>
