Hi David,

Pablo is right: if you only need a few files, wget is great. :-)

The old downloader was broken. I recently rewrote it but haven't integrated it with the extraction code yet (I'm not even sure that's a good idea), so downloading is a separate step. Try running "mvn scala:run download" in the directory extraction_framework/dump. The configuration is in download.properties or directly in the pom.xml. These settings should work for you. (I hope the line breaks survive intact...)

# NOTE: format is not java.util.Properties, but
# org.dbpedia.extraction.dump.download.Config

dir=K:/Work/Eclipse Workspace/DBpedia_Dumps/to_update
base=http://dumps.wikimedia.org/
dump=commons,en:pages-articles.xml.bz2
unzip=true
retry-max=5
retry-millis=10000

#the following is only needed when you download
#wikipedia language editions by their article count
#csv=http://s23.org/wikistats/wikipedias_csv

#the following are only needed if you want to run the
#AbstractExtractor, which uses a local MediaWiki
#installation and takes several days to run.
#dump=en:image.sql.gz,imagelinks.sql.gz,langlinks.sql.gz,templatelinks.sql.gz,categorylinks.sql.gz
#other=http://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3/maintenance/tables.sql

Cheers,
Christopher

On Thu, Mar 29, 2012 at 17:42, Pablo Mendes <[email protected]> wrote:
> Hi David,
> What about downloading with wget?
>
> Cheers,
> Pablo
>
>
> On Thu, Mar 29, 2012 at 5:33 PM, David Gösenbauer
> <[email protected]> wrote:
>>
>> Hi dbpedia-community!
>>
>> I'm experiencing serious problems trying to get the extraction framework
>> to run. The step I'm stuck at is downloading the dumps. My config file
>> seems to be correct, as the download is started by the framework when
>> running "mvn scala:run". Nevertheless, the download times out at a
>> random point.
>>
>> Downloading this file
>>
>> http://dumps.wikimedia.org/enwiki/20120307/enwiki-20120307-pages-articles.xml.bz2
>>
>> with my browser is 10x slower than downloading it with the framework.
>> Downloading it with the browser produces a supposedly complete archive
>> that is corrupted every time, because the download times out or fails
>> in some other way (the browser shows the download as completed, though).
>>
>> At the moment it's impossible for me to get the dumps. I hope someone
>> can help me out, since I need the most recent data at hand!
>>
>> Regards,
>> David
>>
>> My config file:
>>
>> dumpDir=K:/Work/Eclipse Workspace/DBpedia_Dumps/to_update
>> outputDir=K:/Work/Eclipse Workspace/DBpedia_Dumps/updated
>> updateDumps=true
>>
>> extractors=org.dbpedia.extraction.mappings.LabelExtractor \
>> org.dbpedia.extraction.mappings.WikiPageExtractor \
>> org.dbpedia.extraction.mappings.InfoboxExtractor \
>> org.dbpedia.extraction.mappings.PageLinksExtractor \
>> org.dbpedia.extraction.mappings.GeoExtractor
>>
>> extractors.en=org.dbpedia.extraction.mappings.CategoryLabelExtractor \
>> org.dbpedia.extraction.mappings.ArticleCategoriesExtractor \
>> org.dbpedia.extraction.mappings.ExternalLinksExtractor \
>> org.dbpedia.extraction.mappings.HomepageExtractor \
>> org.dbpedia.extraction.mappings.DisambiguationExtractor \
>> org.dbpedia.extraction.mappings.PersondataExtractor \
>> org.dbpedia.extraction.mappings.PndExtractor \
>> org.dbpedia.extraction.mappings.SkosCategoriesExtractor \
>> org.dbpedia.extraction.mappings.RedirectExtractor \
>> org.dbpedia.extraction.mappings.MappingExtractor \
>> org.dbpedia.extraction.mappings.PageIdExtractor \
>> org.dbpedia.extraction.mappings.AbstractExtractor \
>> org.dbpedia.extraction.mappings.RevisionIdExtractor
>>
>> languages=en
>>
>> ------------------------------------------------------------------------------
>> This SF email is sponsored by:
>> Try Windows Azure free for 90 days Click Here
>> http://p.sf.net/sfu/sfd2d-msazure
>> _______________________________________________
>> Dbpedia-discussion mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
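David's archive came out corrupted even though the browser reported the download as finished. Before starting a long extraction run, it can be worth checking the downloaded .bz2 file itself; on the command line, "bzip2 -t FILE" does this. The Python sketch below does the same with the standard bz2 module by streaming through the whole archive: a truncated file raises EOFError, and non-bzip2 or damaged data raises OSError. The function name bz2_is_complete and the script name check_dump.py are hypothetical; neither is part of the DBpedia extraction framework or its downloader.

```python
import bz2
import sys


def bz2_is_complete(path, chunk_size=1 << 20):
    """Return True if `path` decompresses cleanly as a complete bzip2 archive.

    Reads the whole archive in chunks, so memory stays around `chunk_size`
    bytes, but run time grows with the archive size.
    """
    try:
        with bz2.open(path, "rb") as f:
            # Decompress and discard everything; only errors matter here.
            while f.read(chunk_size):
                pass
    except EOFError:
        # The data stopped before the bzip2 end-of-stream marker:
        # the download was cut off.
        return False
    except OSError:
        # Not bzip2 data, a damaged block (CRC mismatch), or the file
        # could not be opened at all (e.g. it does not exist).
        return False
    return True


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_dump.py FILE.bz2 [...]
    for name in sys.argv[1:]:
        status = "OK" if bz2_is_complete(name) else "TRUNCATED OR CORRUPT"
        print(f"{name}: {status}")
```

Decompressing a full enwiki pages-articles dump this way takes a while, but unlike comparing file sizes it also catches damaged bytes in the middle of the file, because bzip2 checks a CRC for every block.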
