Hi all,

For those of you who want to run the DBpedia extraction locally, here
are some tips on how to import a Wikipedia dump:

1. Configure your MySQL server!
That's very, very important. If you have enough RAM, use it. Find my
MySQL config below.

2. Have two hard disks.
One for the MySQL data, one for the Wikipedia dumps, so reading the
dump and writing the database don't compete for the same disk.
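
For example, you can point MySQL's data directory at the second disk
in my.cnf (the path here is just an example, adjust it to your setup):

[mysqld]
datadir = /disk2/mysql/data    # MySQL data on disk 2; keep the dump files on disk 1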

3. Use a standalone machine.
The Wikipedia import puts a lot of load on the hard disk and CPU. I
used to use one of our application servers, which already had some load
on it, and the import took weeks.

4. Defragment your hard disks.
It can save some time.

5. Configure the DBpedia import script.
If your Java VM doesn't support the "-server" flag (typically the case
with client-only JREs on non-server Windows), remove it from the java
mwdumper call. (OK, this is not a performance tip, just a note on
getting it working; see the example call below.)
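
For reference, the mwdumper call looks roughly like this (file and
database names are placeholders, and the actual call in the import
script may differ a bit):

java -server -jar mwdumper.jar --format=sql:1.5 enwiki-pages-articles.xml.bz2 | mysql -u <user> -p <database>

Just drop the "-server" flag from that line if your JVM complains
about it.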


On my workstation (Intel quad-core 2.66 GHz, 8 GB RAM, Vista 64-bit,
two 10k rpm hard disks), the Wikipedia import took a very decent six
hours or so.


Cheers,
Georgi


MySQL config:
key_buffer = 1024M              # MyISAM index cache; the most important setting for the import
max_allowed_packet = 32M        # allow the large INSERT statements from the dump
table_cache = 256               # number of open tables to cache
sort_buffer_size = 512M         # per-connection buffer for sorts
net_buffer_length = 8M          # initial size of the connection buffer
read_buffer_size = 64M          # per-connection buffer for sequential scans
read_rnd_buffer_size = 64M      # per-connection buffer for reading rows after a sort
myisam_sort_buffer_size = 512M  # buffer for MyISAM index rebuilds
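
After restarting MySQL, you can check that a setting took effect, e.g.:

mysql> SHOW VARIABLES LIKE 'key_buffer_size';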
 

--
Georgi Kobilarov
Freie Universität Berlin
www.georgikobilarov.com

