Hello,

 

I'm interested in creating a Dutch port of DBpedia Spotlight. To do this, I
need a disambiguation data set for Dutch, which is currently not available
for download. However, based on some messages posted here [1], I suspect
that the latest version of the extraction framework can generate it.
Is this correct?

 

I have already imported the extraction framework, and it builds successfully
(I only included the core, dump and scripts modules, as [2] states that the
other modules are not necessary for running the extraction). The messages
posted at [3] indicate that it is only necessary to run download and
extract. However, when I execute download (using the command
../run download config=download.properties), the following output is displayed:

 

[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Building DBpedia Dump Extraction
[INFO]    task-segment: [scala:run]
[INFO] ------------------------------------------------------------------------
[INFO] Preparing scala:run
[INFO] [resources:resources {execution: default-resources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/mmlab/DBpedia_Extraction_Framework/extraction_framework/dump/src/main/resources
[INFO] [scala:compile {execution: process-resources}]
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.scala,**/*.java,]
[INFO] excludes = []
[INFO] Nothing to compile - all classes are up to date
[INFO] [compiler:compile {execution: default-compile}]
[INFO] Nothing to compile - all classes are up to date
[INFO] [scala:compile {execution: compile}]
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.scala,**/*.java,]
[INFO] excludes = []
[INFO] Nothing to compile - all classes are up to date
[INFO] [resources:testResources {execution: default-testResources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/mmlab/DBpedia_Extraction_Framework/extraction_framework/dump/src/test/resources
[INFO] [compiler:testCompile {execution: default-testCompile}]
[INFO] No sources to compile
[INFO] [scala:testCompile {execution: test-compile}]
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.scala,**/*.java,]
[INFO] excludes = []
[WARNING] No source files found.
[INFO] [scala:run {execution: default-cli}]
[INFO] Checking for multiple versions of scala
[INFO] launcher 'download' selected => org.dbpedia.extraction.dump.download.Download
done: 0 -
todo: 1 - wiki=nl,locale=nl
downloading 'http://dumps.wikimedia.org/nlwiki/' to '/home/mmlab/wikipedia/nlwiki/index.html'

java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org_scala_tools_maven_executions.MainHelper.runMain(MainHelper.java:161)
        at org_scala_tools_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)
Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
        at scala.runtime.BoxesRunTime.unboxToLong(Unknown Source)
        at org.dbpedia.extraction.dump.download.Counter$.getContentLength(Counter.scala:38)
        at org.dbpedia.extraction.dump.download.Counter$class.inputStream(Counter.scala:22)
        at org.dbpedia.extraction.dump.download.Download$Downloader$1.inputStream(Download.scala:29)
        at org.dbpedia.extraction.dump.download.FileDownloader$class.downloadFile(FileDownloader.scala:49)
        at org.dbpedia.extraction.dump.download.Download$Downloader$1.org$dbpedia$extraction$dump$download$LastModified$$super$downloadFile(Download.scala:29)
        at org.dbpedia.extraction.dump.download.LastModified$class.downloadFile(LastModified.scala:21)
        at org.dbpedia.extraction.dump.download.Download$Downloader$1.downloadFile(Download.scala:29)
        at org.dbpedia.extraction.dump.download.FileDownloader$class.downloadFile(FileDownloader.scala:36)
        at org.dbpedia.extraction.dump.download.Download$Downloader$1.org$dbpedia$extraction$dump$download$Retry$$super$downloadFile(Download.scala:29)
        at org.dbpedia.extraction.dump.download.Retry$class.downloadFile(Retry.scala:28)
        at org.dbpedia.extraction.dump.download.Download$Downloader$1.downloadFile(Download.scala:29)
        at org.dbpedia.extraction.dump.download.FileDownloader$class.downloadTo(FileDownloader.scala:26)
        at org.dbpedia.extraction.dump.download.Download$Downloader$1.downloadTo(Download.scala:29)
        at org.dbpedia.extraction.dump.download.LanguageDownloader.downloadDates(LanguageDownloader.scala:35)
        at org.dbpedia.extraction.dump.download.Download$$anonfun$main$3.apply(Download.scala:67)
        at org.dbpedia.extraction.dump.download.Download$$anonfun$main$3.apply(Download.scala:62)
        at scala.collection.immutable.TreeSet$$anonfun$foreach$1.apply(TreeSet.scala:114)
        at scala.collection.immutable.TreeSet$$anonfun$foreach$1.apply(TreeSet.scala:114)
        at scala.collection.immutable.RedBlack$NonEmpty.foreach(RedBlack.scala:164)
        at scala.collection.immutable.TreeSet.foreach(TreeSet.scala:114)
        at org.dbpedia.extraction.dump.download.Download$.main(Download.scala:62)
        at org.dbpedia.extraction.dump.download.Download.main(Download.scala)
        ... 6 more

 

The download.properties file contains the following:

-------------------------------------------------------------------------
base-url=http://dumps.wikimedia.org/
base-dir=/home/mmlab/wikipedia
download=nl:pages-articles.xml.bz2
unzip=false
retry-max=5
retry-millis=10000
---------------------------------------------------------------------------

 

As a workaround, I downloaded and unpacked the nl-pages-articles.xml file
myself and configured the extraction.properties file as follows:

----------------------------------------------------------------------------
base-dir=/home/mmlab/wikipedia
require-download-complete=false
languages=nl

extractors=DisambiguationExtractor
extractors.nl=DisambiguationExtractor

ontology=../ontology.xml
mappings=../mappings

uri-policy.uri=uri:en; generic:en; xml-safe-predicates:*
uri-policy.iri=generic:en; xml-safe-predicates:*

format.nt.gz=n-triples;uri-policy.uri
format.nq.gz=n-quads;uri-policy.uri

format.ttl.gz=turtle-triples;uri-policy.iri
format.tql.gz=turtle-quads;uri-policy.iri
------------------------------------------------------------------------------
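
For reference, a minimal sketch of where a manually unpacked dump presumably
has to be placed for this configuration. The per-wiki, per-date layout is an
assumption inferred from the redirect cache path that the extraction log
reports (.../nlwiki/20120824/nlwiki-20120824-template-redirects.obj); the
function name expected_dump_path and the pages-articles.xml suffix are
illustrative and not taken from the framework.

```python
from pathlib import PurePosixPath


def expected_dump_path(base_dir, language, date, suffix="pages-articles.xml"):
    """Build <base_dir>/<lang>wiki/<date>/<lang>wiki-<date>-<suffix>.

    Assumption: the dump file follows the same naming pattern as the
    template-redirects cache file mentioned in the extraction log.
    """
    wiki = f"{language}wiki"
    return PurePosixPath(base_dir) / wiki / date / f"{wiki}-{date}-{suffix}"


# Values taken from the configuration and log in this message.
print(expected_dump_path("/home/mmlab/wikipedia", "nl", "20120824"))
# prints /home/mmlab/wikipedia/nlwiki/20120824/nlwiki-20120824-pages-articles.xml
```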

 

However, when executing the command mvn scala:run "-Dlauncher=extraction"
"-DaddArgs extraction.properties", the following is displayed:

 

[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Building DBpedia Dump Extraction
[INFO]    task-segment: [scala:run]
[INFO] ------------------------------------------------------------------------
[INFO] Preparing scala:run
[INFO] [resources:resources {execution: default-resources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/mmlab/DBpedia_Extraction_Framework/extraction_framework/dump/src/main/resources
[INFO] [scala:compile {execution: process-resources}]
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.scala,**/*.java,]
[INFO] excludes = []
[INFO] Nothing to compile - all classes are up to date
[INFO] [compiler:compile {execution: default-compile}]
[INFO] Nothing to compile - all classes are up to date
[INFO] [scala:compile {execution: compile}]
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.scala,**/*.java,]
[INFO] excludes = []
[INFO] Nothing to compile - all classes are up to date
[INFO] [resources:testResources {execution: default-testResources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/mmlab/DBpedia_Extraction_Framework/extraction_framework/dump/src/test/resources
[INFO] [compiler:testCompile {execution: default-testCompile}]
[INFO] No sources to compile
[INFO] [scala:testCompile {execution: test-compile}]
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.scala,**/*.java,]
[INFO] excludes = []
[WARNING] No source files found.
[INFO] [scala:run {execution: default-cli}]
[INFO] Checking for multiple versions of scala
[INFO] launcher 'extraction' selected => org.dbpedia.extraction.dump.extract.Extraction
Sep 19, 2012 3:38:12 PM org.dbpedia.extraction.mappings.Redirects$ loadFromCache
INFO: Loading redirects from cache file /home/mmlab/wikipedia/nlwiki/20120824/nlwiki-20120824-template-redirects.obj
Sep 19, 2012 3:38:12 PM org.dbpedia.extraction.mappings.Redirects$ load
INFO: Will extract redirects from source for nl wiki, could not load cache file '/home/mmlab/wikipedia/nlwiki/20120824/nlwiki-20120824-template-redirects.obj': java.io.FileNotFoundException: /home/mmlab/wikipedia/nlwiki/20120824/nlwiki-20120824-template-redirects.obj (No such file or directory)
Sep 19, 2012 3:38:12 PM org.dbpedia.extraction.mappings.Redirects$ loadFromSource
INFO: Loading redirects from source (nl)

java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org_scala_tools_maven_executions.MainHelper.runMain(MainHelper.java:161)
        at org_scala_tools_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,249]
Message: expected <mediawiki> with namespace [http://www.mediawiki.org/xml/export-0.6/], found [http://www.mediawiki.org/xml/export-0.7/]
        at org.dbpedia.util.text.xml.XMLStreamUtils.requireElement(XMLStreamUtils.java:120)
        at org.dbpedia.util.text.xml.XMLStreamUtils.requireStartElement(XMLStreamUtils.java:81)
        at org.dbpedia.extraction.sources.WikipediaDumpParser.requireStartElement(WikipediaDumpParser.java:411)
        at org.dbpedia.extraction.sources.WikipediaDumpParser.readDump(WikipediaDumpParser.java:130)
        at org.dbpedia.extraction.sources.WikipediaDumpParser.run(WikipediaDumpParser.java:114)
        at org.dbpedia.extraction.sources.XMLReaderSource.foreach(XMLSource.scala:64)
        at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:239)
        at org.dbpedia.extraction.sources.XMLReaderSource.flatMap(XMLSource.scala:60)
        at org.dbpedia.extraction.mappings.Redirects$.loadFromSource(Redirects.scala:165)
        at org.dbpedia.extraction.mappings.Redirects$.load(Redirects.scala:116)
        at org.dbpedia.extraction.dump.extract.ConfigLoader$$anon$1.<init>(ConfigLoader.scala:96)
        at org.dbpedia.extraction.dump.extract.ConfigLoader.org$dbpedia$extraction$dump$extract$ConfigLoader$$createExtractionJob(ConfigLoader.scala:51)
        at org.dbpedia.extraction.dump.extract.ConfigLoader$$anonfun$getExtractionJobs$1.apply(ConfigLoader.scala:36)
        at org.dbpedia.extraction.dump.extract.ConfigLoader$$anonfun$getExtractionJobs$1.apply(ConfigLoader.scala:36)
        at scala.collection.Iterator$$anon$19.next(Iterator.scala:401)
        at scala.collection.Iterator$class.foreach(Iterator.scala:772)
        at scala.collection.Iterator$$anon$19.foreach(Iterator.scala:399)
        at scala.collection.IterableViewLike$Transformed$class.foreach(IterableViewLike.scala:41)
        at scala.collection.IterableViewLike$$anon$3.foreach(IterableViewLike.scala:80)
        at org.dbpedia.extraction.dump.extract.Extraction$.main(Extraction.scala:29)
        at org.dbpedia.extraction.dump.extract.Extraction.main(Extraction.scala)
        ... 6 more
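
The parse error above reports that the parser expected the export-0.6
namespace while the dump declares export-0.7. A minimal sketch, independent
of the extraction framework, for checking which namespace the root element of
a dump declares. The function name root_namespace and the miniature sample
document are illustrative assumptions, not part of the framework or of a
real dump.

```python
import io
import xml.etree.ElementTree as ET


def root_namespace(xml_stream):
    """Return the namespace URI of the root element, or None if it has none."""
    # With the "start" event, iterparse yields the root element as soon as
    # its start tag has been parsed, so the whole dump is never held in memory.
    for _event, elem in ET.iterparse(xml_stream, events=("start",)):
        tag = elem.tag  # "{namespace}localname" when a namespace is declared
        if tag.startswith("{"):
            return tag[1:].split("}", 1)[0]
        return None
    return None


# Hypothetical miniature stand-in for the header of a real dump file.
sample = (
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.7/" '
    b'version="0.7"><siteinfo/></mediawiki>'
)
print(root_namespace(io.BytesIO(sample)))
# prints http://www.mediawiki.org/xml/export-0.7/
```

For an actual dump, the same function could be given an open binary file
object (or bz2.open(path, "rb") for a compressed dump) instead of the
in-memory sample.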

 

Any help is appreciated.

 

[1] http://www.mail-archive.com/[email protected]/msg03654.html

[2] http://dbpedia.org/documentation

[3] http://answers.semanticweb.com/questions/18153/dbpedia-38-extraction-missing-wikipediascsv

 

 

 

_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
