Hi David,

Pablo is right - if you only download a few files, wget is great. :-)

The old downloader was broken. I recently rewrote it, but I haven't
integrated it with the extraction code yet (I'm not even sure that
would be a good idea), so downloading is a separate step. Try running

mvn scala:run download

in the directory extraction_framework/dump.

The configuration is in download.properties or directly in the
pom.xml. These settings should work for you. (I hope the line breaks
survive intact...)

# NOTE: format is not java.util.Properties, but
# org.dbpedia.extraction.dump.download.Config
dir=K:/Work/Eclipse Workspace/DBpedia_Dumps/to_update
base=http://dumps.wikimedia.org/
dump=commons,en:pages-articles.xml.bz2
unzip=true
retry-max=5
retry-millis=10000
#the following is only needed when you download
#Wikipedia language editions by their article count
#csv=http://s23.org/wikistats/wikipedias_csv
#the following are only needed if you want to run the
#AbstractExtractor, which uses a local MediaWiki
#installation and takes several days to run.
#dump=en:image.sql.gz,imagelinks.sql.gz,langlinks.sql.gz,templatelinks.sql.gz,categorylinks.sql.gz
#other=http://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3/maintenance/tables.sql
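
In case the dump and retry settings are unclear, here is a rough Python
sketch of how I would read them. It is not the downloader's actual code.
The helper names dump_url and with_retries are made up for illustration,
the URL pattern is simply copied from the enwiki URL quoted below, and I
am assuming that retry-max counts retries after the first attempt and
that retry-millis is the pause between attempts.

```python
import time


def dump_url(base, lang, date, suffix):
    """Build a dump URL (hypothetical helper, not part of the framework).

    Assumes the pattern of the enwiki URL quoted in this thread:
    <base>/<lang>wiki/<date>/<lang>wiki-<date>-<suffix>
    """
    wiki = lang + "wiki"  # e.g. 'en' -> 'enwiki', 'commons' -> 'commonswiki'
    return "{0}/{1}/{2}/{1}-{2}-{3}".format(base.rstrip("/"), wiki, date, suffix)


def with_retries(action, retry_max=5, retry_millis=10000, sleep=time.sleep):
    """Call action() until it succeeds or retry_max retries are used up.

    Assumption: retry_max is the number of retries after the first attempt,
    and retry_millis is the wait between two attempts in milliseconds.
    Only I/O errors (OSError, which includes urllib's URLError and socket
    timeouts) trigger a retry; anything else is raised immediately.
    """
    failures = 0
    while True:
        try:
            return action()
        except OSError:
            failures += 1
            if failures > retry_max:
                raise  # give up and let the caller see the last error
            sleep(retry_millis / 1000.0)
```

To use it for a real download you would pass something like
lambda: urllib.request.urlretrieve(url, target) as the action, where
target is a file path of your choice. Note that this sketch restarts the
file from the beginning on every retry; it does not resume a partial
download.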

Cheers,
Christopher

On Thu, Mar 29, 2012 at 17:42, Pablo Mendes <[email protected]> wrote:
> Hi David,
> What about downloading with wget?
>
> Cheers,
> Pablo
>
>
> On Thu, Mar 29, 2012 at 5:33 PM, David Gösenbauer
> <[email protected]> wrote:
>>
>> Hi dbpedia-community!
>>
>> I'm having serious problems getting the extraction framework to run.
>> The step I'm stuck at is downloading the dumps. My config file seems
>> to be correct, since the framework starts the download when I run
>> "mvn scala:run". Nevertheless, the download times out after a random
>> amount of data has been downloaded.
>>
>> Downloading this file
>>
>> http://dumps.wikimedia.org/enwiki/20120307/enwiki-20120307-pages-articles.xml.bz2
>> with my browser is 10x slower than downloading it with the framework.
>> The browser shows the download as completed, but the resulting archive
>> is corrupted every time, since the download times out or fails in some
>> other way.
>>
>> At the moment it's impossible for me to get the dumps. I hope someone
>> can please help me out since I need the most recent data at hand!
>>
>> Regards,
>> David
>>
>> My config-file:
>>
>> dumpDir=K:/Work/Eclipse Workspace/DBpedia_Dumps/to_update
>> outputDir=K:/Work/Eclipse Workspace/DBpedia_Dumps/updated
>> updateDumps=true
>>
>> extractors=org.dbpedia.extraction.mappings.LabelExtractor \
>>            org.dbpedia.extraction.mappings.WikiPageExtractor \
>>            org.dbpedia.extraction.mappings.InfoboxExtractor \
>>            org.dbpedia.extraction.mappings.PageLinksExtractor \
>>            org.dbpedia.extraction.mappings.GeoExtractor
>>
>> extractors.en=org.dbpedia.extraction.mappings.CategoryLabelExtractor \
>>               org.dbpedia.extraction.mappings.ArticleCategoriesExtractor \
>>               org.dbpedia.extraction.mappings.ExternalLinksExtractor \
>>               org.dbpedia.extraction.mappings.HomepageExtractor \
>>               org.dbpedia.extraction.mappings.DisambiguationExtractor \
>>               org.dbpedia.extraction.mappings.PersondataExtractor \
>>               org.dbpedia.extraction.mappings.PndExtractor \
>>               org.dbpedia.extraction.mappings.SkosCategoriesExtractor \
>>               org.dbpedia.extraction.mappings.RedirectExtractor \
>>               org.dbpedia.extraction.mappings.MappingExtractor \
>>               org.dbpedia.extraction.mappings.PageIdExtractor \
>>               org.dbpedia.extraction.mappings.AbstractExtractor \
>>               org.dbpedia.extraction.mappings.RevisionIdExtractor
>>
>> languages=en
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> This SF email is sponsosred by:
>> Try Windows Azure free for 90 days Click Here
>> http://p.sf.net/sfu/sfd2d-msazure
>> _______________________________________________
>> Dbpedia-discussion mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
