Hello,
I'm interested in creating a Dutch port of DBpedia Spotlight. In order to do
this, I need a disambiguation data set for Dutch. This data set is currently
not available for download. However, based on some messages posted here [1],
I suspect that the latest version of the extraction framework supports this.
Is this correct?
I have already imported the extraction framework, and it builds
successfully. I only included the core, dump and scripts modules, as [2]
states that the other modules are not necessary for running the extraction.
The messages posted at [3] indicate that it is only necessary to run
download and extract. However, when executing download (using the command
../run download config=download.properties), the following is displayed:
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Building DBpedia Dump Extraction
[INFO] task-segment: [scala:run]
[INFO] ------------------------------------------------------------------------
[INFO] Preparing scala:run
[INFO] [resources:resources {execution: default-resources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/mmlab/DBpedia_Extraction_Framework/extraction_framework/dump/src/main/resources
[INFO] [scala:compile {execution: process-resources}]
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.scala,**/*.java,]
[INFO] excludes = []
[INFO] Nothing to compile - all classes are up to date
[INFO] [compiler:compile {execution: default-compile}]
[INFO] Nothing to compile - all classes are up to date
[INFO] [scala:compile {execution: compile}]
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.scala,**/*.java,]
[INFO] excludes = []
[INFO] Nothing to compile - all classes are up to date
[INFO] [resources:testResources {execution: default-testResources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/mmlab/DBpedia_Extraction_Framework/extraction_framework/dump/src/test/resources
[INFO] [compiler:testCompile {execution: default-testCompile}]
[INFO] No sources to compile
[INFO] [scala:testCompile {execution: test-compile}]
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.scala,**/*.java,]
[INFO] excludes = []
[WARNING] No source files found.
[INFO] [scala:run {execution: default-cli}]
[INFO] Checking for multiple versions of scala
[INFO] launcher 'download' selected => org.dbpedia.extraction.dump.download.Download
done: 0 -
todo: 1 - wiki=nl,locale=nl
downloading 'http://dumps.wikimedia.org/nlwiki/' to '/home/mmlab/wikipedia/nlwiki/index.html'
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org_scala_tools_maven_executions.MainHelper.runMain(MainHelper.java:161)
    at org_scala_tools_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)
Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
    at scala.runtime.BoxesRunTime.unboxToLong(Unknown Source)
    at org.dbpedia.extraction.dump.download.Counter$.getContentLength(Counter.scala:38)
    at org.dbpedia.extraction.dump.download.Counter$class.inputStream(Counter.scala:22)
    at org.dbpedia.extraction.dump.download.Download$Downloader$1.inputStream(Download.scala:29)
    at org.dbpedia.extraction.dump.download.FileDownloader$class.downloadFile(FileDownloader.scala:49)
    at org.dbpedia.extraction.dump.download.Download$Downloader$1.org$dbpedia$extraction$dump$download$LastModified$$super$downloadFile(Download.scala:29)
    at org.dbpedia.extraction.dump.download.LastModified$class.downloadFile(LastModified.scala:21)
    at org.dbpedia.extraction.dump.download.Download$Downloader$1.downloadFile(Download.scala:29)
    at org.dbpedia.extraction.dump.download.FileDownloader$class.downloadFile(FileDownloader.scala:36)
    at org.dbpedia.extraction.dump.download.Download$Downloader$1.org$dbpedia$extraction$dump$download$Retry$$super$downloadFile(Download.scala:29)
    at org.dbpedia.extraction.dump.download.Retry$class.downloadFile(Retry.scala:28)
    at org.dbpedia.extraction.dump.download.Download$Downloader$1.downloadFile(Download.scala:29)
    at org.dbpedia.extraction.dump.download.FileDownloader$class.downloadTo(FileDownloader.scala:26)
    at org.dbpedia.extraction.dump.download.Download$Downloader$1.downloadTo(Download.scala:29)
    at org.dbpedia.extraction.dump.download.LanguageDownloader.downloadDates(LanguageDownloader.scala:35)
    at org.dbpedia.extraction.dump.download.Download$$anonfun$main$3.apply(Download.scala:67)
    at org.dbpedia.extraction.dump.download.Download$$anonfun$main$3.apply(Download.scala:62)
    at scala.collection.immutable.TreeSet$$anonfun$foreach$1.apply(TreeSet.scala:114)
    at scala.collection.immutable.TreeSet$$anonfun$foreach$1.apply(TreeSet.scala:114)
    at scala.collection.immutable.RedBlack$NonEmpty.foreach(RedBlack.scala:164)
    at scala.collection.immutable.TreeSet.foreach(TreeSet.scala:114)
    at org.dbpedia.extraction.dump.download.Download$.main(Download.scala:62)
    at org.dbpedia.extraction.dump.download.Download.main(Download.scala)
    ... 6 more
The download.properties file contains the following:
-------------------------------------------------------------------------
base-url=http://dumps.wikimedia.org/
base-dir=/home/mmlab/wikipedia
download=nl:pages-articles.xml.bz2
unzip=false
retry-max=5
retry-millis=10000
---------------------------------------------------------------------------
As a workaround, I downloaded and unpacked the nl-pages-articles.xml file
myself and configured the extraction.properties file as follows:
----------------------------------------------------------------------------
base-dir=/home/mmlab/wikipedia
require-download-complete=false
languages=nl
extractors=DisambiguationExtractor
extractors.nl=DisambiguationExtractor
ontology=../ontology.xml
mappings=../mappings
uri-policy.uri=uri:en; generic:en; xml-safe-predicates:*
uri-policy.iri=generic:en; xml-safe-predicates:*
format.nt.gz=n-triples;uri-policy.uri
format.nq.gz=n-quads;uri-policy.uri
format.ttl.gz=turtle-triples;uri-policy.iri
format.tql.gz=turtle-quads;uri-policy.iri
------------------------------------------------------------------------------
However, when executing the command mvn scala:run "-Dlauncher=extraction"
"-DaddArgs extraction.properties", the following is displayed:
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Building DBpedia Dump Extraction
[INFO] task-segment: [scala:run]
[INFO] ------------------------------------------------------------------------
[INFO] Preparing scala:run
[INFO] [resources:resources {execution: default-resources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/mmlab/DBpedia_Extraction_Framework/extraction_framework/dump/src/main/resources
[INFO] [scala:compile {execution: process-resources}]
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.scala,**/*.java,]
[INFO] excludes = []
[INFO] Nothing to compile - all classes are up to date
[INFO] [compiler:compile {execution: default-compile}]
[INFO] Nothing to compile - all classes are up to date
[INFO] [scala:compile {execution: compile}]
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.scala,**/*.java,]
[INFO] excludes = []
[INFO] Nothing to compile - all classes are up to date
[INFO] [resources:testResources {execution: default-testResources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/mmlab/DBpedia_Extraction_Framework/extraction_framework/dump/src/test/resources
[INFO] [compiler:testCompile {execution: default-testCompile}]
[INFO] No sources to compile
[INFO] [scala:testCompile {execution: test-compile}]
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.scala,**/*.java,]
[INFO] excludes = []
[WARNING] No source files found.
[INFO] [scala:run {execution: default-cli}]
[INFO] Checking for multiple versions of scala
[INFO] launcher 'extraction' selected => org.dbpedia.extraction.dump.extract.Extraction
Sep 19, 2012 3:38:12 PM org.dbpedia.extraction.mappings.Redirects$ loadFromCache
INFO: Loading redirects from cache file /home/mmlab/wikipedia/nlwiki/20120824/nlwiki-20120824-template-redirects.obj
Sep 19, 2012 3:38:12 PM org.dbpedia.extraction.mappings.Redirects$ load
INFO: Will extract redirects from source for nl wiki, could not load cache file '/home/mmlab/wikipedia/nlwiki/20120824/nlwiki-20120824-template-redirects.obj': java.io.FileNotFoundException: /home/mmlab/wikipedia/nlwiki/20120824/nlwiki-20120824-template-redirects.obj (No such file or directory)
Sep 19, 2012 3:38:12 PM org.dbpedia.extraction.mappings.Redirects$ loadFromSource
INFO: Loading redirects from source (nl)
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org_scala_tools_maven_executions.MainHelper.runMain(MainHelper.java:161)
    at org_scala_tools_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,249]
Message: expected <mediawiki> with namespace [http://www.mediawiki.org/xml/export-0.6/], found [http://www.mediawiki.org/xml/export-0.7/]
    at org.dbpedia.util.text.xml.XMLStreamUtils.requireElement(XMLStreamUtils.java:120)
    at org.dbpedia.util.text.xml.XMLStreamUtils.requireStartElement(XMLStreamUtils.java:81)
    at org.dbpedia.extraction.sources.WikipediaDumpParser.requireStartElement(WikipediaDumpParser.java:411)
    at org.dbpedia.extraction.sources.WikipediaDumpParser.readDump(WikipediaDumpParser.java:130)
    at org.dbpedia.extraction.sources.WikipediaDumpParser.run(WikipediaDumpParser.java:114)
    at org.dbpedia.extraction.sources.XMLReaderSource.foreach(XMLSource.scala:64)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:239)
    at org.dbpedia.extraction.sources.XMLReaderSource.flatMap(XMLSource.scala:60)
    at org.dbpedia.extraction.mappings.Redirects$.loadFromSource(Redirects.scala:165)
    at org.dbpedia.extraction.mappings.Redirects$.load(Redirects.scala:116)
    at org.dbpedia.extraction.dump.extract.ConfigLoader$$anon$1.<init>(ConfigLoader.scala:96)
    at org.dbpedia.extraction.dump.extract.ConfigLoader.org$dbpedia$extraction$dump$extract$ConfigLoader$$createExtractionJob(ConfigLoader.scala:51)
    at org.dbpedia.extraction.dump.extract.ConfigLoader$$anonfun$getExtractionJobs$1.apply(ConfigLoader.scala:36)
    at org.dbpedia.extraction.dump.extract.ConfigLoader$$anonfun$getExtractionJobs$1.apply(ConfigLoader.scala:36)
    at scala.collection.Iterator$$anon$19.next(Iterator.scala:401)
    at scala.collection.Iterator$class.foreach(Iterator.scala:772)
    at scala.collection.Iterator$$anon$19.foreach(Iterator.scala:399)
    at scala.collection.IterableViewLike$Transformed$class.foreach(IterableViewLike.scala:41)
    at scala.collection.IterableViewLike$$anon$3.foreach(IterableViewLike.scala:80)
    at org.dbpedia.extraction.dump.extract.Extraction$.main(Extraction.scala:29)
    at org.dbpedia.extraction.dump.extract.Extraction.main(Extraction.scala)
    ... 6 more
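To double-check which export schema version a dump declares, something like
the following could be used. This is only a rough Python sketch and not part
of the extraction framework; the function name dump_namespace is my own
invention. It reads just the root element of the file and returns its XML
namespace, which is the value the parser compares in the error above.

```python
import bz2
import xml.etree.ElementTree as ET


def dump_namespace(path):
    """Return the XML namespace of the root element of a MediaWiki dump.

    Hypothetical helper for illustration; not part of the DBpedia framework.
    """
    # Open .bz2 dumps transparently; anything else is read as plain XML.
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rb") as stream:
        # iterparse reads the file in chunks, and we return on the first
        # "start" event, so the bulk of a large dump is never parsed.
        for _event, element in ET.iterparse(stream, events=("start",)):
            tag = element.tag  # namespaced tags look like "{namespace}name"
            if tag.startswith("{"):
                return tag[1:].split("}", 1)[0]
            return ""  # root element carries no namespace
```

Given the message above, calling dump_namespace on the unpacked nl dump
should presumably return the export-0.7 namespace rather than export-0.6.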
Any help is appreciated.
[1] http://www.mail-archive.com/[email protected]/msg03654.html
[2] http://dbpedia.org/documentation
[3] http://answers.semanticweb.com/questions/18153/dbpedia-38-extraction-missing-wikipediascsv
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion