Setting _namespace = null does the trick.
Thank you all for helping me out.
Hi Pedro,
This is an inconsistency between the Wikipedia dumps and the mappings wiki (the
latter still uses export version 0.6). To handle both in the same extraction
run you need to set the namespace variable to null.
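To illustrate why a null namespace accepts both export versions, here is a minimal sketch using the standard StAX API rather than the framework's own XMLStreamUtils (whose internals are not shown in this thread). The class name NamespaceCheckSketch and the method rootMatches are hypothetical. XMLStreamReader.require() skips the namespace comparison when the namespace argument is null.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of the DBpedia extraction framework.
public class NamespaceCheckSketch {

    /**
     * Returns true if the root element is <mediawiki> in the expected
     * namespace. Passing null as expectedNamespace disables the namespace
     * check, so any export version is accepted.
     */
    static boolean rootMatches(String xml, String expectedNamespace) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                // Advance from the start of the document to the root element.
                reader.nextTag();
                // Throws XMLStreamException on a mismatch; a null
                // namespaceURI is not compared at all.
                reader.require(XMLStreamConstants.START_ELEMENT,
                        expectedNamespace, "mediawiki");
                return true;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            return false;
        }
    }
}
```

With a 0.6 document, rootMatches(xml, "http://www.mediawiki.org/xml/export-0.7/") returns false, while rootMatches(xml, null) returns true for both 0.6 and 0.7 documents.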
As for Spotlight, I have already started the generation process. If you still
want to start yourself and run into trouble with the dataset generation, I could
send you "my datasets" to work with :)
Best,
Dimitris
On Thu, Sep 20, 2012 at 6:42 PM, Pablo N. Mendes <[email protected]> wrote:
Looks like your dump has 0.6. Can you please delete the dbpedia jars from your
.m2 dir, run mvn clean install and double-check your Wikipedia dump to make sure
everything is clean?
On Sep 20, 2012 6:16 PM, "Pedro Debevere" <[email protected]> wrote:
Hi Max,
This was indeed the problem. Now the extraction proceeds a little further
(creating the nlwiki-20120824-template-redirects_old.obj file) but it then
displays the following:
mvn scala:run "-Dlauncher=extraction" "-DaddArgs=extraction.properties"
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Building DBpedia Dump Extraction
[INFO] task-segment: [scala:run]
[INFO] ------------------------------------------------------------------------
[INFO] Preparing scala:run
[INFO] [resources:resources {execution: default-resources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/mmlab/DBpedia_Extraction_Framework/extraction_framework/dump/src/main/resources
[INFO] [scala:compile {execution: process-resources}]
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.scala,**/*.java,]
[INFO] excludes = []
[INFO] Nothing to compile - all classes are up to date
[INFO] [compiler:compile {execution: default-compile}]
[INFO] Nothing to compile - all classes are up to date
[INFO] [scala:compile {execution: compile}]
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.scala,**/*.java,]
[INFO] excludes = []
[INFO] Nothing to compile - all classes are up to date
[INFO] [resources:testResources {execution: default-testResources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/mmlab/DBpedia_Extraction_Framework/extraction_framework/dump/src/test/resources
[INFO] [compiler:testCompile {execution: default-testCompile}]
[INFO] No sources to compile
[INFO] [scala:testCompile {execution: test-compile}]
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.scala,**/*.java,]
[INFO] excludes = []
[WARNING] No source files found.
[INFO] [scala:run {execution: default-cli}]
[INFO] Checking for multiple versions of scala
[INFO] launcher 'extraction' selected => org.dbpedia.extraction.dump.extract.Extraction
Sep 20, 2012 6:03:30 PM org.dbpedia.extraction.mappings.Redirects$ loadFromCache
INFO: Loading redirects from cache file /home/mmlab/wikipedia/nlwiki/20120824/nlwiki-20120824-template-redirects.obj
Sep 20, 2012 6:03:30 PM org.dbpedia.extraction.mappings.Redirects$ load
INFO: Will extract redirects from source for nl wiki, could not load cache file '/home/mmlab/wikipedia/nlwiki/20120824/nlwiki-20120824-template-redirects.obj': java.io.FileNotFoundException: /home/mmlab/wikipedia/nlwiki/20120824/nlwiki-20120824-template-redirects.obj (No such file or directory)
Sep 20, 2012 6:03:30 PM org.dbpedia.extraction.mappings.Redirects$ loadFromSource
INFO: Loading redirects from source (nl)
Sep 20, 2012 6:06:13 PM org.dbpedia.extraction.mappings.Redirects$RedirectFinder apply
WARNING: wrong redirect. page: [title=RCH-Pinguins;ns=0/Main/;language:wiki=nl,locale=nl].
found by dbpedia: [title=MediaMonks-RCH;ns=0/Main/;language:wiki=nl,locale=nl].
found by wikipedia: [null]
Sep 20, 2012 6:06:40 PM org.dbpedia.extraction.mappings.Redirects$ loadFromSource
INFO: Redirects loaded from source (nl)
Sep 20, 2012 6:06:40 PM org.dbpedia.extraction.mappings.Redirects$ load
INFO: 738 redirects written to cache file /home/mmlab/wikipedia/nlwiki/20120824/nlwiki-20120824-template-redirects.obj
Sep 20, 2012 6:06:40 PM org.dbpedia.extraction.ontology.io.OntologyReader read
INFO: Loading ontology pages
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org_scala_tools_maven_executions.MainHelper.runMain(MainHelper.java:161)
at org_scala_tools_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at org.dbpedia.extraction.mappings.CompositeExtractor$$anonfun$1.apply(CompositeExtractor.scala:25)
at org.dbpedia.extraction.mappings.CompositeExtractor$$anonfun$1.apply(CompositeExtractor.scala:25)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
at scala.collection.immutable.List.foreach(List.scala:76)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
at scala.collection.immutable.List.map(List.scala:76)
at org.dbpedia.extraction.mappings.CompositeExtractor$.load(CompositeExtractor.scala:25)
at org.dbpedia.extraction.dump.extract.ConfigLoader.org$dbpedia$extraction$dump$extract$ConfigLoader$$createExtractionJob(ConfigLoader.scala:103)
at org.dbpedia.extraction.dump.extract.ConfigLoader$$anonfun$getExtractionJobs$1.apply(ConfigLoader.scala:36)
at org.dbpedia.extraction.dump.extract.ConfigLoader$$anonfun$getExtractionJobs$1.apply(ConfigLoader.scala:36)
at scala.collection.Iterator$$anon$19.next(Iterator.scala:401)
at scala.collection.Iterator$class.foreach(Iterator.scala:772)
at scala.collection.Iterator$$anon$19.foreach(Iterator.scala:399)
at scala.collection.IterableViewLike$Transformed$class.foreach(IterableViewLike.scala:41)
at scala.collection.IterableViewLike$$anon$3.foreach(IterableViewLike.scala:80)
at org.dbpedia.extraction.dump.extract.Extraction$.main(Extraction.scala:29)
at org.dbpedia.extraction.dump.extract.Extraction.main(Extraction.scala)
... 6 more
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,99]
Message: expected <mediawiki> with namespace [http://www.mediawiki.org/xml/export-0.7/], found [http://www.mediawiki.org/xml/export-0.6/]
at org.dbpedia.util.text.xml.XMLStreamUtils.requireElement(XMLStreamUtils.java:120)
at org.dbpedia.util.text.xml.XMLStreamUtils.requireStartElement(XMLStreamUtils.java:81)
at org.dbpedia.extraction.sources.WikipediaDumpParser.requireStartElement(WikipediaDumpParser.java:411)
at org.dbpedia.extraction.sources.WikipediaDumpParser.readDump(WikipediaDumpParser.java:130)
at org.dbpedia.extraction.sources.WikipediaDumpParser.run(WikipediaDumpParser.java:114)
at org.dbpedia.extraction.sources.XMLReaderSource.foreach(XMLSource.scala:64)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
at org.dbpedia.extraction.sources.XMLReaderSource.map(XMLSource.scala:60)
at org.dbpedia.extraction.ontology.io.OntologyReader.read(OntologyReader.scala:22)
at org.dbpedia.extraction.dump.extract.ConfigLoader.org$dbpedia$extraction$dump$extract$ConfigLoader$$_ontology(ConfigLoader.scala:181)
at org.dbpedia.extraction.dump.extract.ConfigLoader$$anon$1.ontology(ConfigLoader.scala:53)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.dbpedia.extraction.mappings.DisambiguationExtractor.<init>(DisambiguationExtractor.scala:24)
... 29 more
I already set _namespace = "http://www.mediawiki.org/xml/export-0.7/"; in the
WikipediaDumpParser.java class.
Hopefully you can help me out with this.
-----Original Message-----
From: Max Jakob [mailto:[email protected]]
Sent: Thursday, September 20, 2012 1:54 PM
To: Pedro Debevere
Cc: [email protected]
Subject: Re: [Dbpedia-discussion] DBpedia Extraction Framework Dutch
disambiguation data set
On Thu, Sep 20, 2012 at 10:17 AM, Pedro Debevere <[email protected]>
wrote:
> Caused by: java.lang.IllegalArgumentException: Illegal pattern character 'X'
> at java.text.SimpleDateFormat.compile(SimpleDateFormat.java:769)
> at java.text.SimpleDateFormat.initialize(SimpleDateFormat.java:576)
> at java.text.SimpleDateFormat.<init>(SimpleDateFormat.java:501)
> at java.text.SimpleDateFormat.<init>(SimpleDateFormat.java:476)
> at org.dbpedia.extraction.util.StringUtils$$anon$1.initialValue(StringUtils.scala:16)
I think you might have to use Java 7 in order to allow 'X' in date format
strings. Compare:
http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html
http://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html
For that, you'll need to change <java.compiler.version> in the main pom to 1.7
and have Java 7 on your machine.
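As a small illustration of the difference, the sketch below builds a SimpleDateFormat whose pattern contains 'X' (ISO 8601 time zone). On Java 7 and later it formats normally; on Java 6 the constructor throws the "Illegal pattern character 'X'" exception quoted above. The class name, method name and the exact pattern are assumptions made for this example; the pattern actually used in StringUtils.scala is not shown in this thread.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical example class, not part of the DBpedia extraction framework.
public class IsoZonePatternSketch {

    /** Formats the Unix epoch in UTC using a pattern that contains 'X'. */
    static String formatEpochUtc() {
        // 'X' is only a valid pattern letter from Java 7 onwards.
        SimpleDateFormat format =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssXXX", Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(new Date(0L));
    }
}
```

On Java 7 or later, formatEpochUtc() returns "1970-01-01T00:00:00Z", since a zero offset is written as "Z".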
> Can anyone provide me some more details about the TODO mentioned in [5]?
It seems that in the Dutch Wikipedia, the URLs of disambiguation pages are
suffixed with multiple different strings (and sometimes none as you mentioned).
All these strings have to be known for the process of extracting disambiguation
links, but currently the configuration only allows for one. This should be
extended to a set of strings.
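A rough sketch of what matching against a set of suffixes could look like is given below. It is not the framework's code: the class and method names are hypothetical, and the suffix strings passed in by a caller (for example " (doorverwijspagina)") would be illustrative values, not the actual configuration. Pages without any suffix would still need a different detection mechanism.

```java
import java.util.Set;

// Hypothetical helper, not part of the DBpedia extraction framework.
public class DisambiguationSuffixSketch {

    /**
     * Returns true if the page title ends with any of the configured
     * disambiguation suffixes.
     */
    static boolean hasDisambiguationSuffix(String title, Set<String> suffixes) {
        for (String suffix : suffixes) {
            if (title.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }
}
```

The configuration would then hold a set of suffixes per language instead of a single string, and the extractor would call a check like this for each page title.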
Cheers,
Max
No virus found in this incoming message.
Checked by AVG - www.avg.com
Version: 8.5.455 / Virus Database: 271.1.1/5265 - Release Date: 09/18/12
19:47:00
------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://ad.doubleclick.net/clk;258768047;13503038;j?
http://info.appdynamics.com/FreeJavaPerformanceDownload.html
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
--
Kontokostas Dimitris