Hi,

On Wed, Sep 19, 2012 at 3:46 PM, Pedro Debevere <[email protected]> wrote:
> I’m interested in creating a Dutch port of DBpedia Spotlight. In order to do
> this, I need a disambiguation data set for Dutch. This data set is currently
> not available for download. However, based on some messages posted here [1],
> I suspect that the latest version of the extraction framework supports this.
> Is this correct?

Generally yes, provided that all names of the Dutch disambiguation templates
are specified in [4]. Please also note that there seems to be an issue with
multiple names for disambiguation page titles in Dutch; see the TODO in [5].

> As a workaround I downloaded unpacked the nl-pages-articles.xml file myself

On your first attempt, it looks like something went wrong during the download,
so downloading and unpacking the file yourself was a good idea.

> Message: expected <mediawiki> with namespace
> [http://www.mediawiki.org/xml/export-0.6/], found
> [http://www.mediawiki.org/xml/export-0.7/]

Wikipedia seems to have changed its export format version from 0.6 to 0.7.
The DBpedia parser should still be able to parse the dump, assuming the only
changes are the ones in the 0.7 schema [6]. You can try switching to the dump
branch (currently the stable one) and changing the line in [7] to

    private final String _namespace = "http://www.mediawiki.org/xml/export-0.7/";

Then run mvn clean install in the project root and try again.

Cheers,
Max

[4] http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/file/2a322c5c6692/core/src/main/scala/org/dbpedia/extraction/wikiparser/impl/wikipedia/Disambiguation.scala#l165
[5] http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/file/2a322c5c6692/core/src/main/scala/org/dbpedia/extraction/config/mappings/DisambiguationExtractorConfig.scala#l16
[6] http://www.mediawiki.org/xml/export-0.7.xsd
[7] http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/file/2a322c5c6692/core/src/main/java/org/dbpedia/extraction/sources/WikipediaDumpParser.java#l74
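
To see which export version a given dump declares before touching the parser,
the namespace check described above can be sketched on its own. This is only an
illustration using the JDK's StAX API, not the code in WikipediaDumpParser.java;
the class and method names (ExportNamespaceCheck, rootNamespace, isSupported)
are made up for this example, and the two accepted namespaces are simply the
ones from the error message.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of the DBpedia extraction framework.
public class ExportNamespaceCheck {

    // The two namespaces quoted in the error message (0.6 expected, 0.7 found).
    static final Set<String> ACCEPTED = Set.of(
            "http://www.mediawiki.org/xml/export-0.6/",
            "http://www.mediawiki.org/xml/export-0.7/");

    // Reads only as far as the root element and returns its namespace URI
    // (may be null if the root element has no namespace).
    public static String rootNamespace(Reader xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(xml);
            reader.nextTag(); // skip prolog and whitespace up to the root start tag
            String ns = reader.getNamespaceURI();
            reader.close();
            return ns;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    // True if the namespace is one of the export versions listed above.
    public static boolean isSupported(String ns) {
        return ns != null && ACCEPTED.contains(ns);
    }

    public static void main(String[] args) {
        // Tiny stand-in for the first tag of a dump file.
        String sample =
                "<mediawiki xmlns=\"http://www.mediawiki.org/xml/export-0.7/\" version=\"0.7\"/>";
        String ns = rootNamespace(new StringReader(sample));
        System.out.println(ns + " supported: " + isSupported(ns));
    }
}
```

For a real dump, a java.io.FileReader (or a decompressing stream wrapped in an
InputStreamReader) over nl-pages-articles.xml could be passed in place of the
StringReader; only the first tag is read, so the check is cheap even on a large
file.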
