Re: [Dbpedia-gsoc] An Exception when running the extraction

Jona Christopher Sahnwaldt Mon, 17 Mar 2014 04:29:29 -0700

The exception occurs because ImageExtractor needs the commons dump, but you
didn't download it. The following should be in your download.properties:


# Only needed for the ImageExtractor
download=commons:pages-articles.xml.bz2

(Copied from download.10000.properties.)

Just run the download again; it won't re-download the other files if
they are up to date.
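
To check up front which dumps are missing before starting a multi-hour
extraction, something like the following could help. It is only a rough
sketch, not part of the framework: the function names are made up, and it
assumes the layout base-dir/<wiki>/<YYYYMMDD>/<wiki>-<YYYYMMDD>-<file>, which
is inferred from the stack trace and the output file name in this thread.

```python
import re
import tempfile
from pathlib import Path

# Assumption: dump directories are named by date, e.g. 20140304.
DATE_DIR = re.compile(r"^\d{8}$")


def dump_dates(base_dir, wiki):
    """Return the sorted dated subdirectory names under base_dir/<wiki>.

    Returns an empty list when the wiki directory does not exist, which is
    the situation that made the extraction fail for commonswiki.
    """
    wiki_dir = Path(base_dir) / wiki
    if not wiki_dir.is_dir():
        return []
    return sorted(
        p.name for p in wiki_dir.iterdir()
        if p.is_dir() and DATE_DIR.match(p.name)
    )


def missing_dumps(base_dir, wikis, suffix="pages-articles.xml.bz2"):
    """Return the wikis that have no <wiki>-<date>-<suffix> file in any dated directory."""
    missing = []
    for wiki in wikis:
        found = any(
            (Path(base_dir) / wiki / date / f"{wiki}-{date}-{suffix}").is_file()
            for date in dump_dates(base_dir, wiki)
        )
        if not found:
            missing.append(wiki)
    return missing


if __name__ == "__main__":
    # Demo on a throwaway tree: enwiki was downloaded, commonswiki was not.
    with tempfile.TemporaryDirectory() as base:
        dated = Path(base) / "enwiki" / "20140304"
        dated.mkdir(parents=True)
        (dated / "enwiki-20140304-pages-articles.xml.bz2").touch()
        print(missing_dumps(base, ["enwiki", "commonswiki"]))
```

With a real setup you would pass your own base-dir and the wikis your
extractors need (commonswiki as well as enwiki when ImageExtractor is enabled).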

HTH

JC
On Mar 15, 2014 11:45 PM, "wencan luo" <[email protected]> wrote:

> I have successfully compiled the extraction-framework and run the download
> for the English Wikipedia.
>
> However, when I run the extraction, I have the following error:
> ################################################################
> ....
> Caused by: java.io.IOException: failed to list files in [E:\project\gsoc2014\wikipedia\commonswiki]
>         at org.dbpedia.extraction.util.RichFile.names(RichFile.scala:44)
>         at org.dbpedia.extraction.util.RichFile.names(RichFile.scala:39)
>         at org.dbpedia.extraction.util.Finder.dates(Finder.scala:52)
>         at org.dbpedia.extraction.dump.extract.ConfigLoader.latestDate(ConfigLoader.scala:196)
> ....
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 03:46 h
> [INFO] Finished at: 2014-03-15T06:31:41-05:00
> [INFO] Final Memory: 10M/231M
> [INFO] ------------------------------------------------------------------------
> [ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.1.6:run (default-cli) on project dump: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: -10000 (Exit value: -10000) -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please read the following articles:
> [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> ###########################################################
>
> After that, there is only one output file under the dataset folder:
> enwiki-20140304-template-redirects.obj
>
>
> In addition, I used the following config parameters for the extraction:
> base-dir=E:/project/gsoc2014/wikipedia
> source=pages-articles.xml.bz2
> languages=en
>
>
> extractors.en=.MappingExtractor,.DisambiguationExtractor,.HomepageExtractor,.ImageExtractor,\
> .PersondataExtractor,.PndExtractor,.TopicalConceptsExtractor,.FlickrWrapprLinkExtractor
>
> Here are my questions:
> 1. Do different languages have different extractors?
> 2. Is the default source parameter "pages-articles.xml.bz2"? When I didn't
> include this line, I got an exception saying **pages-articles.xml not
> found.
> 3. How many hours does it take to run the extractor for English only,
> and for all the languages?
> 4. How much disk space do I need to store all the data?
> 5. How can I debug an extractor? Testing on the whole Wikipedia dump is
> impossible when debugging. It is too slow.
>
>
> --
> Wencan Luo
> CS Department- Univ. of Pittsburgh
> 210 S. Bouquet Street
> 6501 Sennott Square
> Pittsburgh, PA 15260
> E-mail: [email protected] or [email protected]
>
>
> _______________________________________________
> Dbpedia-gsoc mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
>
>

