Sorry, that should have been

mvn scala:run -Dlauncher=download

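For the wget route Pablo suggests in the quoted thread below, here is a minimal sketch of a resumable download with an integrity check. It is not part of the extraction framework. The function names dump_url and fetch_dump are made up for illustration, the URL layout is inferred from the single dump URL David quotes, and the retry values simply mirror retry-max=5 and retry-millis=10000 from the config.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the DBpedia extraction framework.

# Build the URL of a Wikimedia dump file.
# Layout assumed from the one URL quoted in the thread:
#   http://dumps.wikimedia.org/<lang>wiki/<date>/<lang>wiki-<date>-<file>
dump_url() {
    lang=$1; date=$2; file=$3
    printf 'http://dumps.wikimedia.org/%swiki/%s/%swiki-%s-%s\n' \
        "$lang" "$date" "$lang" "$date" "$file"
}

# Download one dump file into a target directory, then verify the archive.
# Arguments: <lang> <date> <file> <target_dir>
fetch_dump() {
    url=$(dump_url "$1" "$2" "$3")
    target_dir=$4
    # -c resumes a partial file instead of starting over after a timeout;
    # --tries and --waitretry (seconds) mirror retry-max=5 / retry-millis=10000.
    wget -c --tries=5 --waitretry=10 --timeout=60 -P "$target_dir" "$url" || return 1
    # bzip2 -t only tests the archive; it exits non-zero if the file is corrupted.
    bzip2 -t "$target_dir/${url##*/}"
}
```

Usage, after pasting or sourcing the functions: fetch_dump en 20120307 pages-articles.xml.bz2 ./dumps (the date is the one from the thread, and ./dumps is a placeholder directory). The bzip2 -t step is there because a download can be reported as complete while the archive is truncated, which is the symptom described below.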
On Fri, Mar 30, 2012 at 00:24, Jona Christopher Sahnwaldt <[email protected]> wrote:
> Hi David,
>
> Pablo is right - if you only download a few files, wget is great. :-)
>
> The old downloader was broken. I recently rewrote it, but didn't
> integrate it with the extraction code yet (I'm not even sure that's a
> good idea), so it's a separate step. Try using
>
> mvn scala:run download
>
> in the directory extraction_framework/dump.
>
> The configuration is in download.properties or directly in the
> pom.xml. These settings should work for you. (I hope the line breaks
> survive intact...)
>
> # NOTE: format is not java.util.Properties, but
> # org.dbpedia.extraction.dump.download.Config
> dir=K:/Work/Eclipse Workspace/DBpedia_Dumps/to_update
> base=http://dumps.wikimedia.org/
> dump=commons,en:pages-articles.xml.bz2
> unzip=true
> retry-max=5
> retry-millis=10000
> #the following is only needed when you download
> #wikipedia language editions by their article count
> #csv=http://s23.org/wikistats/wikipedias_csv
> #the following are only needed if you want to run the
> #AbstractExtractor, which uses a local MediaWiki
> #installation and takes several days to run.
> #dump=en:image.sql.gz,imagelinks.sql.gz,langlinks.sql.gz,templatelinks.sql.gz,categorylinks.sql.gz
> #other=http://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3/maintenance/tables.sql
>
> Cheers,
> Christopher
>
> On Thu, Mar 29, 2012 at 17:42, Pablo Mendes <[email protected]> wrote:
>> Hi David,
>> What about downloading with wget?
>>
>> Cheers,
>> Pablo
>>
>>
>> On Thu, Mar 29, 2012 at 5:33 PM, David Gösenbauer
>> <[email protected]> wrote:
>>>
>>> Hi dbpedia-community!
>>>
>>> I'm experiencing heavy problems trying to get the extraction framework
>>> to run. The step I'm stuck at is downloading the dumps. My config-file
>>> seems to be correct as the download is started by the framework when
>>> running "mvn scala:run". Nevertheless the download times-out at a random
>>> state of data downloaded.
>>>
>>> Downloading this file
>>>
>>> http://dumps.wikimedia.org/enwiki/20120307/enwiki-20120307-pages-articles.xml.bz2
>>> with my browser is 10x slower than by downloading it with the framework.
>>> Downloading it with the browser results in the supposedly completely
>>> downloaded archive which is corrupted every time since the download times
>>> out or else (The browser shows the download as completed though).
>>>
>>> At the moment it's impossible for me to get the dumps. I hope someone
>>> can please help me out since I need the most recent data at hand!
>>>
>>> Regards,
>>> David
>>>
>>> My config-file:
>>>
>>> dumpDir=K:/Work/Eclipse Workspace/DBpedia_Dumps/to_update
>>> outputDir=K:/Work/Eclipse Workspace/DBpedia_Dumps/updated
>>> updateDumps=true
>>>
>>> extractors=org.dbpedia.extraction.mappings.LabelExtractor \
>>> org.dbpedia.extraction.mappings.WikiPageExtractor \
>>> org.dbpedia.extraction.mappings.InfoboxExtractor \
>>> org.dbpedia.extraction.mappings.PageLinksExtractor \
>>> org.dbpedia.extraction.mappings.GeoExtractor
>>>
>>> extractors.en=org.dbpedia.extraction.mappings.CategoryLabelExtractor \
>>> org.dbpedia.extraction.mappings.ArticleCategoriesExtractor \
>>> org.dbpedia.extraction.mappings.ExternalLinksExtractor \
>>> org.dbpedia.extraction.mappings.HomepageExtractor \
>>> org.dbpedia.extraction.mappings.DisambiguationExtractor \
>>> org.dbpedia.extraction.mappings.PersondataExtractor \
>>> org.dbpedia.extraction.mappings.PndExtractor \
>>> org.dbpedia.extraction.mappings.SkosCategoriesExtractor \
>>> org.dbpedia.extraction.mappings.RedirectExtractor \
>>> org.dbpedia.extraction.mappings.MappingExtractor \
>>> org.dbpedia.extraction.mappings.PageIdExtractor \
>>> org.dbpedia.extraction.mappings.AbstractExtractor \
>>> org.dbpedia.extraction.mappings.RevisionIdExtractor
>>>
>>> languages=en
>>>
>>> ------------------------------------------------------------------------------
>>> This SF email is sponsored by:
>>> Try Windows Azure free for 90 days Click Here
>>> http://p.sf.net/sfu/sfd2d-msazure
>>> _______________________________________________
>>> Dbpedia-discussion mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
