Hi Pedro,
This is an inconsistency between Wikipedia and the mappings wiki (the latter
is still on export format 0.6). To process both in the same extraction run,
you need to set the namespace variable to null.
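A minimal sketch of why a null namespace lets one parser read both dumps, assuming the parser compares the namespace of the root <mediawiki> element against an expected value. It uses the JDK's StAX API (javax.xml.stream); the class NamespaceCheck and the method acceptsRoot are hypothetical illustrations, not the actual WikipediaDumpParser code.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch, not the DBpedia WikipediaDumpParser: checks the
// namespace of the root <mediawiki> element and skips the check entirely
// when the expected namespace is null.
public class NamespaceCheck {

    static final String EXPORT_06 = "http://www.mediawiki.org/xml/export-0.6/";
    static final String EXPORT_07 = "http://www.mediawiki.org/xml/export-0.7/";

    /** True if the root element is <mediawiki> in the expected namespace (null = any namespace). */
    static boolean acceptsRoot(String xml, String expectedNamespace) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                reader.nextTag(); // advance from START_DOCUMENT to the root START_ELEMENT
                if (!"mediawiki".equals(reader.getLocalName())) {
                    return false;
                }
                // A null expectation accepts both export-0.6 and export-0.7 dumps.
                return expectedNamespace == null
                        || expectedNamespace.equals(reader.getNamespaceURI());
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Not a well-formed XML dump header", e);
        }
    }

    public static void main(String[] args) {
        String mappingsWikiDump = "<mediawiki xmlns=\"" + EXPORT_06 + "\"></mediawiki>";
        System.out.println(acceptsRoot(mappingsWikiDump, EXPORT_07)); // false
        System.out.println(acceptsRoot(mappingsWikiDump, null));      // true
    }
}
```

With a fixed 0.7 expectation the 0.6 mappings wiki export is rejected, which matches the "expected <mediawiki> with namespace [...export-0.7/], found [...export-0.6/]" error in the log below.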
About Spotlight: I have already started the generation process. If you still
want to run it yourself and keep having trouble with the dataset generation,
I could send you "my datasets" to work with :)
Best,
Dimitris
On Thu, Sep 20, 2012 at 6:42 PM, Pablo N. Mendes <[email protected]> wrote:
> Looks like your dump is in the 0.6 format. Can you please delete the dbpedia
> jars from your .m2 dir, run mvn clean install and double-check your Wikipedia
> dump to make sure everything is clean?
> On Sep 20, 2012 6:16 PM, "Pedro Debevere" <[email protected]> wrote:
>
>> Hi Max,
>>
>> This was indeed the problem. Now the extraction proceeds a little further
>> (creating the nlwiki-20120824-template-redirects_old.obj file) but it then
>> displays the following:
>>
>> mvn scala:run "-Dlauncher=extraction" "-DaddArgs=extraction.properties"
>> [INFO] Scanning for projects...
>> [INFO] ------------------------------------------------------------------------
>> [INFO] Building DBpedia Dump Extraction
>> [INFO] task-segment: [scala:run]
>> [INFO] ------------------------------------------------------------------------
>> [INFO] Preparing scala:run
>> [INFO] [resources:resources {execution: default-resources}]
>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>> [INFO] skip non existing resourceDirectory /home/mmlab/DBpedia_Extraction_Framework/extraction_framework/dump/src/main/resources
>> [INFO] [scala:compile {execution: process-resources}]
>> [INFO] Checking for multiple versions of scala
>> [INFO] includes = [**/*.scala,**/*.java,]
>> [INFO] excludes = []
>> [INFO] Nothing to compile - all classes are up to date
>> [INFO] [compiler:compile {execution: default-compile}]
>> [INFO] Nothing to compile - all classes are up to date
>> [INFO] [scala:compile {execution: compile}]
>> [INFO] Checking for multiple versions of scala
>> [INFO] includes = [**/*.scala,**/*.java,]
>> [INFO] excludes = []
>> [INFO] Nothing to compile - all classes are up to date
>> [INFO] [resources:testResources {execution: default-testResources}]
>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>> [INFO] skip non existing resourceDirectory /home/mmlab/DBpedia_Extraction_Framework/extraction_framework/dump/src/test/resources
>> [INFO] [compiler:testCompile {execution: default-testCompile}]
>> [INFO] No sources to compile
>> [INFO] [scala:testCompile {execution: test-compile}]
>> [INFO] Checking for multiple versions of scala
>> [INFO] includes = [**/*.scala,**/*.java,]
>> [INFO] excludes = []
>> [WARNING] No source files found.
>> [INFO] [scala:run {execution: default-cli}]
>> [INFO] Checking for multiple versions of scala
>> [INFO] launcher 'extraction' selected => org.dbpedia.extraction.dump.extract.Extraction
>> Sep 20, 2012 6:03:30 PM org.dbpedia.extraction.mappings.Redirects$ loadFromCache
>> INFO: Loading redirects from cache file /home/mmlab/wikipedia/nlwiki/20120824/nlwiki-20120824-template-redirects.obj
>> Sep 20, 2012 6:03:30 PM org.dbpedia.extraction.mappings.Redirects$ load
>> INFO: Will extract redirects from source for nl wiki, could not load cache file '/home/mmlab/wikipedia/nlwiki/20120824/nlwiki-20120824-template-redirects.obj': java.io.FileNotFoundException: /home/mmlab/wikipedia/nlwiki/20120824/nlwiki-20120824-template-redirects.obj (No such file or directory)
>> Sep 20, 2012 6:03:30 PM org.dbpedia.extraction.mappings.Redirects$ loadFromSource
>> INFO: Loading redirects from source (nl)
>> Sep 20, 2012 6:06:13 PM org.dbpedia.extraction.mappings.Redirects$RedirectFinder apply
>> WARNING: wrong redirect. page: [title=RCH-Pinguins;ns=0/Main/;language:wiki=nl,locale=nl].
>> found by dbpedia: [title=MediaMonks-RCH;ns=0/Main/;language:wiki=nl,locale=nl].
>> found by wikipedia: [null]
>> Sep 20, 2012 6:06:40 PM org.dbpedia.extraction.mappings.Redirects$ loadFromSource
>> INFO: Redirects loaded from source (nl)
>> Sep 20, 2012 6:06:40 PM org.dbpedia.extraction.mappings.Redirects$ load
>> INFO: 738 redirects written to cache file /home/mmlab/wikipedia/nlwiki/20120824/nlwiki-20120824-template-redirects.obj
>> Sep 20, 2012 6:06:40 PM org.dbpedia.extraction.ontology.io.OntologyReader read
>> INFO: Loading ontology pages
>> java.lang.reflect.InvocationTargetException
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:601)
>>     at org_scala_tools_maven_executions.MainHelper.runMain(MainHelper.java:161)
>>     at org_scala_tools_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)
>> Caused by: java.lang.reflect.InvocationTargetException
>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>     at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
>>     at org.dbpedia.extraction.mappings.CompositeExtractor$$anonfun$1.apply(CompositeExtractor.scala:25)
>>     at org.dbpedia.extraction.mappings.CompositeExtractor$$anonfun$1.apply(CompositeExtractor.scala:25)
>>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
>>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
>>     at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
>>     at scala.collection.immutable.List.foreach(List.scala:76)
>>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>>     at scala.collection.immutable.List.map(List.scala:76)
>>     at org.dbpedia.extraction.mappings.CompositeExtractor$.load(CompositeExtractor.scala:25)
>>     at org.dbpedia.extraction.dump.extract.ConfigLoader.org$dbpedia$extraction$dump$extract$ConfigLoader$$createExtractionJob(ConfigLoader.scala:103)
>>     at org.dbpedia.extraction.dump.extract.ConfigLoader$$anonfun$getExtractionJobs$1.apply(ConfigLoader.scala:36)
>>     at org.dbpedia.extraction.dump.extract.ConfigLoader$$anonfun$getExtractionJobs$1.apply(ConfigLoader.scala:36)
>>     at scala.collection.Iterator$$anon$19.next(Iterator.scala:401)
>>     at scala.collection.Iterator$class.foreach(Iterator.scala:772)
>>     at scala.collection.Iterator$$anon$19.foreach(Iterator.scala:399)
>>     at scala.collection.IterableViewLike$Transformed$class.foreach(IterableViewLike.scala:41)
>>     at scala.collection.IterableViewLike$$anon$3.foreach(IterableViewLike.scala:80)
>>     at org.dbpedia.extraction.dump.extract.Extraction$.main(Extraction.scala:29)
>>     at org.dbpedia.extraction.dump.extract.Extraction.main(Extraction.scala)
>>     ... 6 more
>> Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,99]
>> Message: expected <mediawiki> with namespace [http://www.mediawiki.org/xml/export-0.7/], found [http://www.mediawiki.org/xml/export-0.6/]
>>     at org.dbpedia.util.text.xml.XMLStreamUtils.requireElement(XMLStreamUtils.java:120)
>>     at org.dbpedia.util.text.xml.XMLStreamUtils.requireStartElement(XMLStreamUtils.java:81)
>>     at org.dbpedia.extraction.sources.WikipediaDumpParser.requireStartElement(WikipediaDumpParser.java:411)
>>     at org.dbpedia.extraction.sources.WikipediaDumpParser.readDump(WikipediaDumpParser.java:130)
>>     at org.dbpedia.extraction.sources.WikipediaDumpParser.run(WikipediaDumpParser.java:114)
>>     at org.dbpedia.extraction.sources.XMLReaderSource.foreach(XMLSource.scala:64)
>>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
>>     at org.dbpedia.extraction.sources.XMLReaderSource.map(XMLSource.scala:60)
>>     at org.dbpedia.extraction.ontology.io.OntologyReader.read(OntologyReader.scala:22)
>>     at org.dbpedia.extraction.dump.extract.ConfigLoader.org$dbpedia$extraction$dump$extract$ConfigLoader$$_ontology(ConfigLoader.scala:181)
>>     at org.dbpedia.extraction.dump.extract.ConfigLoader$$anon$1.ontology(ConfigLoader.scala:53)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:601)
>>     at org.dbpedia.extraction.mappings.DisambiguationExtractor.<init>(DisambiguationExtractor.scala:24)
>>     ... 29 more
>>
>> I already set _namespace = "http://www.mediawiki.org/xml/export-0.7/";
>> in the WikipediaDumpParser.java class.
>>
>> Hopefully you can help me out with this.
>>
>>
>> -----Original Message-----
>> From: Max Jakob [mailto:[email protected]]
>> Sent: Thursday, September 20, 2012 1:54 PM
>> To: Pedro Debevere
>> Cc: [email protected]
>> Subject: Re: [Dbpedia-discussion] DBpedia Extraction Framework Dutch disambiguation data set
>>
>> On Thu, Sep 20, 2012 at 10:17 AM, Pedro Debevere <[email protected]> wrote:
>> > Caused by: java.lang.IllegalArgumentException: Illegal pattern character 'X'
>> >     at java.text.SimpleDateFormat.compile(SimpleDateFormat.java:769)
>> >     at java.text.SimpleDateFormat.initialize(SimpleDateFormat.java:576)
>> >     at java.text.SimpleDateFormat.<init>(SimpleDateFormat.java:501)
>> >     at java.text.SimpleDateFormat.<init>(SimpleDateFormat.java:476)
>> >     at org.dbpedia.extraction.util.StringUtils$$anon$1.initialValue(StringUtils.scala:16)
>>
>> I think you might have to use Java 7 in order to allow 'X' in date format
>> strings. Compare:
>> http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html
>> http://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html
>>
>> For that, you'll need to change <java.compiler.version> in the main pom
>> to 1.7 and have Java 7 on your machine.
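A small check that can confirm whether a given JVM accepts 'X' (the ISO 8601 time zone letter that Java 7 added to SimpleDateFormat). The pattern below is an assumption for illustration; the thread does not show the exact pattern used in DBpedia's StringUtils. On Java 6 the constructor throws the same "Illegal pattern character 'X'" exception as in the trace above.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Sketch: detects support for the 'X' pattern letter and formats a date with it.
// The pattern is an illustrative assumption, not the one DBpedia uses.
public class IsoZoneCheck {

    static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ssX";

    /** True if this JVM accepts 'X' in SimpleDateFormat patterns (Java 7+). */
    static boolean supportsIsoZoneLetter() {
        try {
            new SimpleDateFormat(PATTERN, Locale.ROOT);
            return true;
        } catch (IllegalArgumentException e) {
            return false; // Java 6: "Illegal pattern character 'X'"
        }
    }

    /** Formats a date in UTC; 'X' prints "Z" for a zero offset. */
    static String formatUtc(Date date) {
        SimpleDateFormat format = new SimpleDateFormat(PATTERN, Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(date);
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        boolean supported = supportsIsoZoneLetter();
        System.out.println("'X' supported: " + supported);
        if (supported) {
            System.out.println(formatUtc(new Date(0L))); // 1970-01-01T00:00:00Z
        }
    }
}
```

Running this with the same java binary that Maven uses is a quick way to verify the switch to Java 7 took effect.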
>>
>>
>> > Can anyone provide me some more details about the TODO mentioned in [5]?
>>
>> It seems that in the Dutch Wikipedia, the URLs of disambiguation pages
>> are suffixed with multiple different strings (and sometimes none as you
>> mentioned). All these strings have to be known for the process of
>> extracting disambiguation links, but currently the configuration only
>> allows for one. This should be extended to a set of strings.
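A minimal sketch of what "a set of strings" could look like, assuming the extractor only needs to recognize and strip a known disambiguation suffix from a page title. The class DisambiguationSuffixes is hypothetical, and the Dutch suffix values are illustrative assumptions, not taken from the DBpedia configuration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: holds several disambiguation suffixes instead of one.
// The suffix values used in main() are illustrative assumptions.
public class DisambiguationSuffixes {

    private final Set<String> suffixes;

    DisambiguationSuffixes(String... suffixes) {
        // LinkedHashSet keeps the configured order, which decides the first match.
        this.suffixes = Collections.unmodifiableSet(new LinkedHashSet<>(Arrays.asList(suffixes)));
    }

    /** Returns the title with the first matching suffix removed, or the title unchanged. */
    String stripSuffix(String title) {
        for (String suffix : suffixes) {
            if (title.endsWith(suffix)) {
                return title.substring(0, title.length() - suffix.length());
            }
        }
        return title; // no suffix: the page may still be a disambiguation page
    }

    public static void main(String[] args) {
        DisambiguationSuffixes nl =
                new DisambiguationSuffixes(" (doorverwijspagina)", " (disambiguatie)");
        System.out.println(nl.stripSuffix("Mercurius (doorverwijspagina)")); // Mercurius
        System.out.println(nl.stripSuffix("Mercurius"));                     // Mercurius
    }
}
```

Titles without any suffix pass through unchanged, so the case Pedro mentioned (disambiguation pages with no suffix) would still have to be detected some other way, for example by the disambiguation template.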
>>
>> Cheers,
>> Max
>>
>>
>> ------------------------------------------------------------------------------
>> Everyone hates slow websites. So do we.
>> Make your web apps faster with AppDynamics
>> Download AppDynamics Lite for free today:
>> http://ad.doubleclick.net/clk;258768047;13503038;j?
>> http://info.appdynamics.com/FreeJavaPerformanceDownload.html
>> _______________________________________________
>> Dbpedia-discussion mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>
>
--
Kontokostas Dimitris