Just a follow-up. My success story, if you can call it that, with this
task came on my sixth attempt, with the following setup:
CPU : Intel Xeon E5-2640 @ 2.50GHz
RAM : 24 GiB
JVM heap size : 18g - an additional 4 GiB of swap was used in the process
No info on the disk, but likely an SSD
Time : 6 hours 30 mins
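For anyone reproducing this: the heap size above is just the value of -Xmx
passed to the JVM that runs the indexing tool. A minimal sketch (the exact
jar name depends on your build, so treat it as an assumption):

    java -Xmx18g -jar org.apache.stanbol.entityhub.indexing.dbpedia-*-jar-with-dependencies.jar index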
Once again, I used the same datasets as in
http://dev.iks-project.eu/downloads/stanbol-indices/dbpedia-3.8/ plus
en/mappingbased_properties_en.nt
In mappings.txt I also added all dbpedia-owl entities and removed the
mappings from dc:title and foaf:name to rdfs:label
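For clarity, the edited fragment of indexing/config/mappings.txt looks
roughly like this; I am assuming the tool's usual 'source > target'
field-mapping syntax and '#' comments:

    # index all DBpedia ontology properties
    http://dbpedia.org/ontology/*

    # default mappings removed for this build:
    # dc:title > rdfs:label
    # foaf:name > rdfs:label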
Cheers
Alessandro
On 29/07/2013 12:49, Alessandro Adamou wrote:
Hi,
I've been trying to build a new custom Solr index for DBpedia 3.8, because
it seems the one uploaded at
http://dev.iks-project.eu/downloads/stanbol-indices/dbpedia-3.8/
does not index the DBpedia ontology properties found in the
en/mappingbased_properties_en.nt dump.
So I:
* Initialized the EntityHub indexing tool
* Downloaded 1.2 GB worth of bzipped dumps to indexing/resources/rdfdata
* Added http://dbpedia.org/ontology/* to indexing/config/mappings.txt
* Executed the indexing tool
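For reference, the whole init/index cycle boils down to two invocations of
the tool's jar; the jar name below follows the standard Entityhub indexing
tool naming and may differ in your build:

    # creates the indexing/ tree (config, resources, destination, dist)
    java -jar org.apache.stanbol.entityhub.indexing.dbpedia-*-jar-with-dependencies.jar init

    # indexes everything found under indexing/resources/rdfdata
    java -Xmx7g -jar org.apache.stanbol.entityhub.indexing.dbpedia-*-jar-with-dependencies.jar index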
It has been a very intensive process so far, and I had to restart it
four times due to resource issues.
On an 8 GiB RAM Mac running off a hard disk, it was taking about ten
minutes per 10k indexed items, i.e. after 36 hours there was still
quite a way to go. Moreover, the system started thrashing from page
faults even with a 6 GiB Java heap.
I tried again on an external SSD with a 7 GiB heap. It went through all
the triples in about 8 hours, but hit several OutOfMemoryErrors in
org.apache.lucene.index.IndexWriter.forceMerge
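For what it's worth, on the next run I can at least capture where the
memory goes with the stock HotSpot flags (the dump path is just
illustrative):

    java -Xmx7g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/indexer.hprof \
         -jar org.apache.stanbol.entityhub.indexing.dbpedia-*-jar-with-dependencies.jar index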
So I'm asking: has anyone managed to build an entire DBpedia index so
far, and on what hardware (especially heap size)?
Thanks
Alessandro
--
Alessandro Adamou, Ph.D.
Knowledge Media Institute
The Open University
Walton Hall, Milton Keynes MK7 6AA
United Kingdom
"I will give you everything, just don't demand anything."
(Ettore Petrolini, 1917)
Not sent from my iSnobTechDevice