[Dbpedia-discussion] extraction problem

gaurav pant Mon, 04 Mar 2013 21:12:04 -0800

Hi All,

Greetings for the day.


I want to extract infobox properties and abstracts from the dump file
(pages-articles.xml.bz2). I am able to download this file using the command
"../run download config=download.de.properties".

Here I have configured the file download.de.properties to download only
the German pages-articles file.

Now when I try to extract information from it using "../run
extraction extraction.de.property", it gives me the error below. In
extraction.de.property I have set dir to the same directory that I
specified in the download.de.properties file.

Please let me know what is going wrong. Does anything need to be changed
in the pom.xml of the dump dir?

"
[INFO] --- maven-scala-plugin:2.15.2:testCompile (test-compile) @ dump ---
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.scala,**/*.java,]
[INFO] excludes = []
[WARNING] No source files found.
[INFO]
[INFO] <<< maven-scala-plugin:2.15.2:run (default-cli) @ dump <<<
[INFO]
[INFO] --- maven-scala-plugin:2.15.2:run (default-cli) @ dump ---
[INFO] Checking for multiple versions of scala
[INFO] launcher 'extraction' selected => org.dbpedia.extraction.dump.extract.Extraction
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org_scala_tools_maven_executions.MainHelper.runMain(MainHelper.java:161)
    at org_scala_tools_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)
Caused by: java.lang.IllegalArgumentException: property 'base-dir' not defined.
    at org.dbpedia.extraction.dump.extract.ConfigParser.error(ConfigParser.scala:18)
    at org.dbpedia.extraction.dump.extract.Config.<init>(Config.scala:26)
    at org.dbpedia.extraction.dump.extract.Extraction$.main(Extraction.scala:26)
    at org.dbpedia.extraction.dump.extract.Extraction.main(Extraction.scala)
    ... 6 more
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3.356s
[INFO] Finished at: Tue Mar 05 04:52:35 UTC 2013
[INFO] Final Memory: 8M/140M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.scala-tools:maven-scala-plugin:2.15.2:run (default-cli) on project dump: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 240(Exit value: 240) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
"

Contents of extraction.de.property:

"# download and extraction target dir
dir=/mnt/ebs/perl/framework/extraction-framework/dump/wiki_dump

# Source file. If source file name ends with .gz or .bz2, it is unzipped on the fly.
# Must exist in the directory xxwiki/20121231 and have the prefix xxwiki-20121231-.

# default:
# source=pages-articles.xml

# alternatives:
source=pages-articles.xml.bz2
# source=pages-articles.xml.gz

# use only directories that contain a 'download-complete' file? Default is false.
require-download-complete=true

# unqualified extractor class names are prefixed by org.dbpedia.extraction.mappings.

# All 111 languages that as of 2012-05-25 have 10000 articles or more.
# TODO: parse wikipedias.csv and figure out from there which languages to extract.
# If no languages are given, the ones having a mapping namespace on mappings.dbpedia.org are used
languages=de

extractors=InfoboxExtractor
#ArticleCategoriesExtractor,CategoryLabelExtractor,ExternalLinksExtractor,\
#GeoExtractor,InfoboxExtractor,LabelExtractor,PageIdExtractor,PageLinksExtractor,\
#RedirectExtractor,RevisionIdExtractor,SkosCategoriesExtractor,WikiPageExtractor

extractors.de=InfoboxExtractor
#extractors.de=MappingExtractor,DisambiguationExtractor,InterLanguageLinksExtractor,RedirectExtractor,LabelExtractor
#extractors.en=MappingExtractor,DisambiguationExtractor,InterLanguageLinksExtractor,RedirectExtractor,LabelExtractor

# if ontology and mapping files are not given or do not exist, download info from mappings.dbpedia.org
ontology=../ontology.xml
mappings=../mappings

# URI policies. Allowed flags: uri, generic, xml-safe. Each flag may have on of the suffixes
# -subjects, -predicates, -objects, -datatype, -context to match only URIs in a certain position.
# Without a suffix, a flag matches all URI positions.

uri-policy.uri=uri:en; generic:en; xml-safe-predicates:*
uri-policy.iri=generic:en; xml-safe-predicates:*


# File formats. Allowed flags: n-triples, n-quads, turtle-triples, turtle-quads, trix-triples, trix-quads
# May be followed by a semicolon and a URI policy name. If format name ends with .gz or .bz2, files
# are zipped on the fly.

# NT is unreadable anyway - might as well use URIs
format.nt=n-triples;uri-policy.uri
#format.nq.gz=n-quads;uri-policy.uri

# Turtle is much more readable - use nice IRIs
format.ttl=turtle-triples;uri-policy.iri
#format.tql.gz=turtle-quads;uri-policy.iri
"
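
For reference, here is a minimal Python sketch for checking which keys a properties file like the one above defines. It assumes plain key=value lines with # comments and does not handle backslash line continuations; the function names parse_properties and missing_keys are hypothetical and not part of the extraction framework. The sample is a shortened copy of the file above. It shows that the file defines 'dir' but not 'base-dir', which is the key named in the exception, so the key name may be where the mismatch lies.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def missing_keys(props, required):
    """Return the required keys that are not defined in props."""
    return [key for key in required if key not in props]


# Shortened copy of extraction.de.property as posted above.
SAMPLE = """\
# download and extraction target dir
dir=/mnt/ebs/perl/framework/extraction-framework/dump/wiki_dump
source=pages-articles.xml.bz2
require-download-complete=true
languages=de
extractors=InfoboxExtractor
"""

props = parse_properties(SAMPLE)
# 'base-dir' is the property name taken from the exception message.
print(missing_keys(props, ["base-dir"]))
```

Running the same check against the full file on disk would only need the file's text passed to parse_properties instead of SAMPLE.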

-- 
Regards
Gaurav Pant
+91-7709196607,+91-9405757794


