Hi,

On Wed, Sep 19, 2012 at 3:46 PM, Pedro Debevere <[email protected]> wrote:
> I’m interested in creating a Dutch port of DBpedia Spotlight. In order to do
> this, I need a disambiguation data set for Dutch. This data set is currently
> not available for download. However, based on some messages posted here [1],
> I suspect that the latest version of the extraction framework supports this.
> Is this correct?

Generally yes, provided that all names of the Dutch disambiguation
templates are specified in [4]. Please also note that there seems to be
an issue with multiple names for disambiguation page titles in Dutch;
see the TODO in [5].


> As a workaround I downloaded unpacked the nl-pages-articles.xml file myself

On your first attempt, it looks like something went wrong during the
download, so downloading and unpacking the file yourself was a good idea.


> Message: expected <mediawiki> with namespace
> [http://www.mediawiki.org/xml/export-0.6/], found
> [http://www.mediawiki.org/xml/export-0.7/]

Wikipedia seems to have changed its export format version from 0.6 to
0.7. As far as I can tell from the changes in the 0.7 schema [6], the
DBpedia parser should still be able to parse the dump. You can switch
to the dump branch (currently the stable one), change the line in [7] to

  private final String _namespace = "http://www.mediawiki.org/xml/export-0.7/";

and try again. (Run  mvn clean install  in the project root first.)
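
If you want to check which export version a dump declares before
patching anything, a small standalone sketch along these lines may help.
It is not the DBpedia parser's code: the class name, the method names
and the list of accepted namespaces are my own assumptions for
illustration, and it only reads the namespace of the root element with
the standard StAX API.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of the extraction framework.
final class DumpNamespaceCheck {

    // Assumption: the two export versions discussed in this thread.
    static final Set<String> ACCEPTED = new HashSet<>(Arrays.asList(
            "http://www.mediawiki.org/xml/export-0.6/",
            "http://www.mediawiki.org/xml/export-0.7/"));

    // Returns the namespace URI of the first (root) element, or null if
    // the input has no element or the root has no namespace.
    static String rootNamespace(Reader in) {
        try {
            XMLStreamReader xml =
                    XMLInputFactory.newInstance().createXMLStreamReader(in);
            try {
                while (xml.hasNext()) {
                    if (xml.next() == XMLStreamConstants.START_ELEMENT) {
                        return xml.getNamespaceURI();
                    }
                }
                return null;
            } finally {
                xml.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not read dump header", e);
        }
    }

    static boolean isAccepted(String namespace) {
        return namespace != null && ACCEPTED.contains(namespace);
    }

    public static void main(String[] args) {
        // Tiny stand-in for the start of a pages-articles dump.
        String sample =
                "<mediawiki xmlns=\"http://www.mediawiki.org/xml/export-0.7/\">"
                + "<page/></mediawiki>";
        String ns = rootNamespace(new StringReader(sample));
        System.out.println(ns + " accepted: " + isAccepted(ns));
    }
}
```

To run it against the real dump, pass a reader opened on the unpacked
nl-pages-articles.xml file instead of the sample string; only the first
element is read, so the size of the file should not matter.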


Cheers,
Max

[4] http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/file/2a322c5c6692/core/src/main/scala/org/dbpedia/extraction/wikiparser/impl/wikipedia/Disambiguation.scala#l165
[5] http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/file/2a322c5c6692/core/src/main/scala/org/dbpedia/extraction/config/mappings/DisambiguationExtractorConfig.scala#l16
[6] http://www.mediawiki.org/xml/export-0.7.xsd
[7] http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/file/2a322c5c6692/core/src/main/java/org/dbpedia/extraction/sources/WikipediaDumpParser.java#l74
