Hi,

I've been trying to build a new custom Solr index for DBpedia 3.8, because it seems the one uploaded at

http://dev.iks-project.eu/downloads/stanbol-indices/dbpedia-3.8/

does not index the DBpedia ontology properties found in the en/mappingbased_properties_en.nt dump.

So I:
* initialized the EntityHub indexing tool
* downloaded 1.2 GB worth of bzipped dumps into indexing/resources/rdfdata
* added http://dbpedia.org/ontology/* to indexing/config/mappings.txt
* executed the indexing tool (rough commands sketched below)
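
In case it is relevant, the setup part was done roughly like this (the jar name is abbreviated and quoted from memory, so the exact file name may differ for your build):

    java -jar org.apache.stanbol.entityhub.indexing.dbpedia-*-jar-with-dependencies.jar init

and the line appended to indexing/config/mappings.txt was simply

    http://dbpedia.org/ontology/*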

It has been a very intensive process so far, and I had to restart it four times due to resource issues.

On an 8 GiB RAM Mac running off a hard disk it was taking about ten minutes per 10k indexed items, i.e. after 36 hours there was still quite a way to go. On top of that, the system started thrashing with page faults even with a 6 GiB Java heap.

I tried again on an external SSD with a 7 GiB heap. It went through all the triples in about 8 hours, but then hit several OutOfMemoryErrors in org.apache.lucene.index.IndexWriter.forceMerge().
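
For completeness, the indexing run itself was launched with something along these lines (only the 7 GiB heap size is exact, the rest is paraphrased from memory):

    java -Xmx7g -jar org.apache.stanbol.entityhub.indexing.dbpedia-*-jar-with-dependencies.jar index

The OutOfMemoryErrors only showed up at the very end, presumably when the tool asks Lucene to merge all segments while optimizing the finished index.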

So I'm asking: has anyone managed to build a full DBpedia index so far, and if so, on what hardware specs (especially heap size)?

Thanks

Alessandro


--
Alessandro Adamou, Ph.D.

Knowledge Media Institute
The Open University
Walton Hall, Milton Keynes MK7 6AA
United Kingdom


"I will give you everything, just don't demand anything."
(Ettore Petrolini, 1917)

Not sent from my iSnobTechDevice
