Hi Max,

This was indeed the problem. The extraction now proceeds a little further 
(creating the nlwiki-20120824-template-redirects_old.obj file), but it then 
fails with the following output:

mvn scala:run "-Dlauncher=extraction" "-DaddArgs=extraction.properties"
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Building DBpedia Dump Extraction
[INFO]    task-segment: [scala:run]
[INFO] ------------------------------------------------------------------------
[INFO] Preparing scala:run
[INFO] [resources:resources {execution: default-resources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/mmlab/DBpedia_Extraction_Framework/extraction_framework/dump/src/main/resources
[INFO] [scala:compile {execution: process-resources}]
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.scala,**/*.java,]
[INFO] excludes = []
[INFO] Nothing to compile - all classes are up to date
[INFO] [compiler:compile {execution: default-compile}]
[INFO] Nothing to compile - all classes are up to date
[INFO] [scala:compile {execution: compile}]
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.scala,**/*.java,]
[INFO] excludes = []
[INFO] Nothing to compile - all classes are up to date
[INFO] [resources:testResources {execution: default-testResources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/mmlab/DBpedia_Extraction_Framework/extraction_framework/dump/src/test/resources
[INFO] [compiler:testCompile {execution: default-testCompile}]
[INFO] No sources to compile
[INFO] [scala:testCompile {execution: test-compile}]
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.scala,**/*.java,]
[INFO] excludes = []
[WARNING] No source files found.
[INFO] [scala:run {execution: default-cli}]
[INFO] Checking for multiple versions of scala
[INFO] launcher 'extraction' selected => org.dbpedia.extraction.dump.extract.Extraction
Sep 20, 2012 6:03:30 PM org.dbpedia.extraction.mappings.Redirects$ loadFromCache
INFO: Loading redirects from cache file /home/mmlab/wikipedia/nlwiki/20120824/nlwiki-20120824-template-redirects.obj
Sep 20, 2012 6:03:30 PM org.dbpedia.extraction.mappings.Redirects$ load
INFO: Will extract redirects from source for nl wiki, could not load cache file '/home/mmlab/wikipedia/nlwiki/20120824/nlwiki-20120824-template-redirects.obj': java.io.FileNotFoundException: /home/mmlab/wikipedia/nlwiki/20120824/nlwiki-20120824-template-redirects.obj (No such file or directory)
Sep 20, 2012 6:03:30 PM org.dbpedia.extraction.mappings.Redirects$ loadFromSource
INFO: Loading redirects from source (nl)
Sep 20, 2012 6:06:13 PM org.dbpedia.extraction.mappings.Redirects$RedirectFinder apply
WARNING: wrong redirect. page: [title=RCH-Pinguins;ns=0/Main/;language:wiki=nl,locale=nl].
found by dbpedia:   [title=MediaMonks-RCH;ns=0/Main/;language:wiki=nl,locale=nl].
found by wikipedia: [null]
Sep 20, 2012 6:06:40 PM org.dbpedia.extraction.mappings.Redirects$ loadFromSource
INFO: Redirects loaded from source (nl)
Sep 20, 2012 6:06:40 PM org.dbpedia.extraction.mappings.Redirects$ load
INFO: 738 redirects written to cache file /home/mmlab/wikipedia/nlwiki/20120824/nlwiki-20120824-template-redirects.obj
Sep 20, 2012 6:06:40 PM org.dbpedia.extraction.ontology.io.OntologyReader read
INFO: Loading ontology pages
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org_scala_tools_maven_executions.MainHelper.runMain(MainHelper.java:161)
        at org_scala_tools_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
        at org.dbpedia.extraction.mappings.CompositeExtractor$$anonfun$1.apply(CompositeExtractor.scala:25)
        at org.dbpedia.extraction.mappings.CompositeExtractor$$anonfun$1.apply(CompositeExtractor.scala:25)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
        at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
        at scala.collection.immutable.List.foreach(List.scala:76)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
        at scala.collection.immutable.List.map(List.scala:76)
        at org.dbpedia.extraction.mappings.CompositeExtractor$.load(CompositeExtractor.scala:25)
        at org.dbpedia.extraction.dump.extract.ConfigLoader.org$dbpedia$extraction$dump$extract$ConfigLoader$$createExtractionJob(ConfigLoader.scala:103)
        at org.dbpedia.extraction.dump.extract.ConfigLoader$$anonfun$getExtractionJobs$1.apply(ConfigLoader.scala:36)
        at org.dbpedia.extraction.dump.extract.ConfigLoader$$anonfun$getExtractionJobs$1.apply(ConfigLoader.scala:36)
        at scala.collection.Iterator$$anon$19.next(Iterator.scala:401)
        at scala.collection.Iterator$class.foreach(Iterator.scala:772)
        at scala.collection.Iterator$$anon$19.foreach(Iterator.scala:399)
        at scala.collection.IterableViewLike$Transformed$class.foreach(IterableViewLike.scala:41)
        at scala.collection.IterableViewLike$$anon$3.foreach(IterableViewLike.scala:80)
        at org.dbpedia.extraction.dump.extract.Extraction$.main(Extraction.scala:29)
        at org.dbpedia.extraction.dump.extract.Extraction.main(Extraction.scala)
        ... 6 more
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,99]
Message: expected <mediawiki> with namespace [http://www.mediawiki.org/xml/export-0.7/], found [http://www.mediawiki.org/xml/export-0.6/]
        at org.dbpedia.util.text.xml.XMLStreamUtils.requireElement(XMLStreamUtils.java:120)
        at org.dbpedia.util.text.xml.XMLStreamUtils.requireStartElement(XMLStreamUtils.java:81)
        at org.dbpedia.extraction.sources.WikipediaDumpParser.requireStartElement(WikipediaDumpParser.java:411)
        at org.dbpedia.extraction.sources.WikipediaDumpParser.readDump(WikipediaDumpParser.java:130)
        at org.dbpedia.extraction.sources.WikipediaDumpParser.run(WikipediaDumpParser.java:114)
        at org.dbpedia.extraction.sources.XMLReaderSource.foreach(XMLSource.scala:64)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
        at org.dbpedia.extraction.sources.XMLReaderSource.map(XMLSource.scala:60)
        at org.dbpedia.extraction.ontology.io.OntologyReader.read(OntologyReader.scala:22)
        at org.dbpedia.extraction.dump.extract.ConfigLoader.org$dbpedia$extraction$dump$extract$ConfigLoader$$_ontology(ConfigLoader.scala:181)
        at org.dbpedia.extraction.dump.extract.ConfigLoader$$anon$1.ontology(ConfigLoader.scala:53)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.dbpedia.extraction.mappings.DisambiguationExtractor.<init>(DisambiguationExtractor.scala:24)
        ... 29 more

I already set _namespace = "http://www.mediawiki.org/xml/export-0.7/"; in the 
WikipediaDumpParser.java class.
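
Since the parser now expects export-0.7 while the file being read declares export-0.6 (the trace passes through OntologyReader.read, so it may be the ontology dump rather than the nlwiki dump), it can help to check which namespace a given dump file actually declares. The following is only a minimal sketch using the standard StAX API; the class name DumpNamespaceCheck and the method rootNamespace are made up for illustration and are not part of the extraction framework.

```java
import java.io.FileReader;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of the DBpedia extraction framework.
class DumpNamespaceCheck {

    // Returns the namespace URI of the first (root) element, or null if the
    // input has no element or the root element has no namespace.
    static String rootNamespace(Reader in) {
        try {
            XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(in);
            try {
                while (xml.hasNext()) {
                    if (xml.next() == XMLStreamConstants.START_ELEMENT) {
                        return xml.getNamespaceURI();
                    }
                }
                return null;
            } finally {
                xml.close();
            }
        } catch (XMLStreamException e) {
            // Wrapped so callers do not need to handle a checked exception.
            throw new IllegalStateException("could not read XML root element", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // With a file argument, inspect that file; otherwise use an inline sample.
        Reader in = args.length > 0
                ? new FileReader(args[0])
                : new StringReader("<mediawiki xmlns=\"http://www.mediawiki.org/xml/export-0.6/\"/>");
        try {
            System.out.println("root namespace: " + rootNamespace(in));
        } finally {
            in.close();
        }
    }
}
```

Running this against each XML file the extraction reads should show which one still uses the 0.6 export format.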

Hopefully you can help me out with this.


-----Original Message-----
From: Max Jakob [mailto:[email protected]] 
Sent: Thursday, September 20, 2012 1:54 PM
To: Pedro Debevere
Cc: [email protected]
Subject: Re: [Dbpedia-discussion] DBpedia Extraction Framework Dutch 
disambiguation data set

On Thu, Sep 20, 2012 at 10:17 AM, Pedro Debevere <[email protected]> 
wrote:
> Caused by: java.lang.IllegalArgumentException: Illegal pattern character 'X'
>         at java.text.SimpleDateFormat.compile(SimpleDateFormat.java:769)
>         at java.text.SimpleDateFormat.initialize(SimpleDateFormat.java:576)
>         at java.text.SimpleDateFormat.<init>(SimpleDateFormat.java:501)
>         at java.text.SimpleDateFormat.<init>(SimpleDateFormat.java:476)
>         at org.dbpedia.extraction.util.StringUtils$$anon$1.initialValue(StringUtils.scala:16)

I think you might have to use Java 7 in order to allow 'X' in date format 
strings. Compare:
http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html
http://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html

For that, you'll need to change <java.compiler.version> in the main pom to 1.7 
and have Java 7 on your machine.
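
A quick way to confirm whether the JVM in use accepts 'X' is to try constructing a SimpleDateFormat with such a pattern. This is only a small sketch; the class name DatePatternCheck and the sample pattern are chosen for illustration and are not taken from the framework's StringUtils.

```java
import java.text.SimpleDateFormat;

// Hypothetical helper for checking pattern support on the running JVM.
class DatePatternCheck {

    // Returns true if SimpleDateFormat on this JVM accepts the pattern,
    // false if it rejects it with IllegalArgumentException.
    static boolean supportsPattern(String pattern) {
        try {
            new SimpleDateFormat(pattern);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        // 'X' (ISO 8601 time zone) is documented for Java 7 but not for Java 6.
        System.out.println("'X' supported: " + supportsPattern("yyyy-MM-dd'T'HH:mm:ssXXX"));
    }
}
```

If this reports that 'X' is not supported, Maven is most likely still running on a Java 6 JVM.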


> Can anyone provide me some more details about the TODO mentioned in [5]?

It seems that in the Dutch Wikipedia, the URLs of disambiguation pages are 
suffixed with multiple different strings (and sometimes none as you mentioned). 
All these strings have to be known for the process of extracting disambiguation 
links, but currently the configuration only allows for one. This should be 
extended to a set of strings.
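
As a rough illustration of what "a set of strings" could look like, the sketch below checks a page title against several suffixes instead of one. It is not the framework's actual code or configuration format; the class name, the method name and the two suffix values are placeholders chosen for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch, not the extraction framework's configuration.
class DisambiguationSuffixes {

    // Placeholder suffixes for illustration only; the real list for the
    // Dutch Wikipedia would have to be collected from the wiki itself.
    static final Set<String> EXAMPLE_SUFFIXES = new LinkedHashSet<String>(
            Arrays.asList(" (doorverwijspagina)", " (disambiguation)"));

    // Returns true if the title ends with any of the configured suffixes.
    static boolean isDisambiguationTitle(String title, Set<String> suffixes) {
        for (String suffix : suffixes) {
            if (title.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isDisambiguationTitle("Mercurius (doorverwijspagina)", EXAMPLE_SUFFIXES));
        System.out.println(isDisambiguationTitle("Mercurius", EXAMPLE_SUFFIXES));
    }
}
```

Titles without any suffix, as mentioned above, would still need a separate signal (for example a disambiguation template on the page), which this sketch does not cover.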

Cheers,
Max
