Re: Fair Benchmarking of SDB, TDB and LUBM 100 > with inference support and limited memory

Paolo Castagna Mon, 05 Dec 2011 02:27:01 -0800

Andy Seaborne wrote:
> The flow is:
> 
> infer --rdfs=VOCAB DATA | tdbloader2 --loc DB
> 
> on a 64bit system.  Linux is faster than Windows.
> 
> (tdbloader2 only runs on linux currently - Paolo has a pure java version
> on github)


tdbloader2 (pure Java version) is here (experimental):
http://svn.apache.org/repos/asf/incubator/jena/Scratch/PC/tdbloader2/trunk/

If you want to discuss further or help, see JENA-117:
https://issues.apache.org/jira/browse/JENA-117

Inference a la the RIOT infer command line can also be done using MapReduce
(as a map-only job). I doubt you can beat that if you have a medium to large
Hadoop cluster. ;-)

See, for example (... another experimental thing):
https://github.com/castagna/tdbloader3/blob/master/src/main/java/org/apache/jena/tdbloader3/InferDriver.java
https://github.com/castagna/tdbloader3/blob/master/src/main/java/org/apache/jena/tdbloader3/InferMapper.java
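
To illustrate why this fits a map-only job, here is a minimal sketch in plain
Java (no Hadoop or Jena dependencies). It is not the code from the links above.
It assumes the vocabulary is small enough to be held in memory by every mapper,
and it only covers rdfs:subClassOf. The class name InferSketch, the method
names and the string-based triples are hypothetical, for illustration only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class InferSketch {
    static final String RDF_TYPE = "rdf:type";

    // Direct superclass links, as they would be read from the vocabulary file.
    private final Map<String, Set<String>> superClasses = new HashMap<>();

    void addSubClassOf(String sub, String sup) {
        superClasses.computeIfAbsent(sub, k -> new LinkedHashSet<>()).add(sup);
    }

    // Transitive superclasses of cls, found by walking the direct links.
    Set<String> allSuperClasses(String cls) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>(
                superClasses.getOrDefault(cls, Collections.<String>emptySet()));
        while (!todo.isEmpty()) {
            String c = todo.pop();
            if (seen.add(c)) {
                todo.addAll(superClasses.getOrDefault(c, Collections.<String>emptySet()));
            }
        }
        return seen;
    }

    // The "map" step: one triple in, that triple plus its inferred triples out.
    // Each call depends only on its input triple and the shared vocabulary,
    // so no shuffle or reduce phase is needed.
    List<String[]> map(String s, String p, String o) {
        List<String[]> out = new ArrayList<>();
        out.add(new String[] {s, p, o});
        if (RDF_TYPE.equals(p)) {
            for (String sup : allSuperClasses(o)) {
                out.add(new String[] {s, RDF_TYPE, sup});
            }
        }
        return out;
    }
}
```

Since every input triple is expanded independently, the input can be split
across as many mappers as the cluster offers.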

Using MapReduce to generate TDB indexes is possible, but not 'easy'.
See, for example: https://github.com/castagna/tdbloader3/

I am planning to investigate hash-based node ids, which
would simplify parallel generation of TDB indexes as well as merging
existing indexes.
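
A rough sketch of the idea, not a design: if the id is derived from the node
itself, any worker can compute it without consulting a shared node table. The
choice of MD5, the truncation to 64 bits and the names HashNodeId and idFor are
assumptions made for illustration. Collisions are not handled here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashNodeId {
    // Derive a 64-bit id from the node's string form alone (hypothetical scheme).
    static long idFor(String nodeString) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(nodeString.getBytes(StandardCharsets.UTF_8));
            // Keep the first 8 of the 16 digest bytes, read as a big-endian long.
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on the Java platform, so this is unexpected.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Two workers that see the same node in different files get the same id, so
indexes built separately use the same ids for the same nodes.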

Mariano, do you have a Hadoop cluster @ unibz.it?

Cheers,
Paolo
