Hi Lourens,

welcome to DBpedia hacking! :-)

As others already said, simple class names in extraction.properties
are prefixed by "org.dbpedia.extraction.mappings." [1] (the package
where all our extractors currently live), so
org.dbpedia.extraction.mappings.DisambiguationExtractor and
DisambiguationExtractor are equivalent.
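
A minimal sketch of that prefixing rule, for illustration only. The object
and method names below are hypothetical and not the framework's API, and the
"contains a dot" test is an assumption about how a simple name is told apart
from a fully qualified one; see [1] for the real code.

```scala
// Sketch only: ExtractorNames, resolve and resolveAll are hypothetical names.
object ExtractorNames {
  // Package in which the extractors currently live (taken from the text above).
  val DefaultPackage = "org.dbpedia.extraction.mappings."

  // Assumption: a name without a dot is a simple class name and gets the prefix;
  // a name that already contains a dot is treated as fully qualified.
  def resolve(name: String): String =
    if (name.contains(".")) name else DefaultPackage + name

  // Split a comma-separated property value and resolve every entry.
  def resolveAll(value: String): List[String] =
    value.split(",").map(_.trim).filter(_.nonEmpty).map(resolve).toList
}
```

Under these assumptions, ExtractorNames.resolve("DisambiguationExtractor") and
ExtractorNames.resolve("org.dbpedia.extraction.mappings.DisambiguationExtractor")
return the same string.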

Finding disambiguation pages is not hard: just look for certain
template invocations, for example {{Disambig}}. That's what we do. The
problem is that our list of disambiguation templates [3] is outdated.
We should get the info from pages like
http://nl.wikipedia.org/wiki/MediaWiki:Disambiguationspage. I wrote
some code that almost does that but didn't have time yet to finish the
job [4].
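
To make the idea concrete, here is a rough sketch of template-based
detection. It is not the code in [3] or [4]: the object name, the regular
expression and the two template names are assumptions chosen for the
example, and lower-casing the whole name is a simplification of MediaWiki's
title rules.

```scala
// Sketch only: DisambigDetector is a hypothetical name, not a framework class.
object DisambigDetector {
  // Example template names (lower-cased); the real list is in Disambiguation.scala.
  val templates: Set[String] = Set("disambig", "dp")

  // Captures the template name in an invocation such as {{Disambig}} or {{Disambig|foo}}.
  private val Invocation = """\{\{\s*([^|{}]+?)\s*(?:\||\}\})""".r

  // A page counts as a disambiguation page if any invoked template is in the list.
  def isDisambiguation(wikiText: String): Boolean =
    Invocation.findAllMatchIn(wikiText)
      .exists(m => templates.contains(m.group(1).toLowerCase))
}
```

With this sketch, a page whose text contains {{Disambig}} is reported as a
disambiguation page, while a page that only invokes an infobox template is
not.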

I just updated Disambiguation.scala. The "key not found: nl" error means
that DisambiguationExtractorConfig.scala [2] has no entry for nl yet. To
get the DisambiguationExtractor running, you'll have to add a line for nl
to that file: the part of the title that many disambig pages on nl.wp
contain, similar to the entries for the other languages.
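
The sketch below shows why a missing entry produces exactly that exception
and how one added line removes it. All names are hypothetical, the map
entries are illustrative rather than copied from [2], and the nl title part
in particular is an assumption that should be checked against nl.wikipedia.

```scala
// Sketch only: DisambigConfigSketch and its members are hypothetical names.
object DisambigConfigSketch {
  // Illustrative per-language title parts; not copied from the framework.
  val withoutNl: Map[String, String] = Map(
    "en" -> " (disambiguation)",
    "it" -> " (disambigua)"
  )

  // The extra line for nl; the value is an assumption to be verified.
  val withNl: Map[String, String] = withoutNl + ("nl" -> " (doorverwijspagina)")

  // Map.apply throws java.util.NoSuchElementException when the key is missing,
  // which is the kind of lookup the stack trace points at.
  def titlePart(config: Map[String, String], lang: String): String = config(lang)
}
```

Calling titlePart(withoutNl, "nl") throws NoSuchElementException, while
titlePart(withNl, "nl") returns the configured string.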

If there are any other questions, let us know!

Cheers,
JC

[1] 
http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/file/a7d73b918d04/dump/src/main/scala/org/dbpedia/extraction/dump/extract/Config.scala#l83
[2] 
http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/file/a7d73b918d04/core/src/main/scala/org/dbpedia/extraction/config/mappings/DisambiguationExtractorConfig.scala
[3] 
http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/file/a7d73b918d04/core/src/main/scala/org/dbpedia/extraction/wikiparser/impl/wikipedia/Disambiguation.scala
[4] 
http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/file/a7d73b918d04/core/src/main/scala/org/dbpedia/extraction/util/WikiDisambigReader.scala

On Fri, Jun 15, 2012 at 5:15 PM, Meij, L.K. van der
<[email protected]> wrote:
> Thanks for your suggestions.
>
> I do not know whether attachments are allowed, so I paste the
> extraction.properties below. (Apparently I had already tried adding the full
> path: dbpedia.extraction.mappings.DisambiguationExtractor,
> but without it, the same error message follows.)
>
> extractors.nl=MappingExtractor,DisambiguationExtractor
> and
> extractors.nl=MappingExtractor,org.dbpedia.extraction.mappings.DisambiguationExtractor,HomepageExtractor
> give the same error.
>
> If I only have
>
> extractors.nl=MappingExtractor
>
> the extraction process takes about 30 minutes and seems to end without
> problems (I haven't looked at the generated output files yet, though).
>
> (I started without commenting out the other extractors, but then, despite
> having languages=nl, other languages were being extracted.)
>
> Kind regards,
>
> Lourens
>
> DETAILS
>
> The error I get, again:
> error
> ...
> Caused by: java.util.NoSuchElementException: key not found: nl
> ..
> at
> org.dbpedia.extraction.mappings.DisambiguationExtractor.<init>(DisambiguationExtractor.scala:22)
> =================extraction.properties========
> dir=/home/lourens/spotlight/wikipedia
> source=pages-articles.xml.bz2
> require-download-complete=true
> languages=nl
> extractors=ArticleCategoriesExtractor,CategoryLabelExtractor,ExternalLinksExtractor,\
> GeoExtractor,InfoboxExtractor,LabelExtractor,PageIdExtractor,PageLinksExtractor,\
> RedirectExtractor,RevisionIdExtractor,SkosCategoriesExtractor,WikiPageExtractor
> extractors.nl=MappingExtractor,org.dbpedia.extraction.mappings.DisambiguationExtractor,HomepageExtractor,ImageExtractor,\
> InterLanguageLinksExtractor
> #extractors.nl=MappingExtractor
> ontology=../ontology.xml
> mappings=../mappings
> uri-policy.uri=uri:en; generic:en; xml-safe-predicates:*
> uri-policy.iri=generic:en; xml-safe-predicates:*
> format.nt.gz=n-triples;uri-policy.uri
> format.nq.gz=n-quads;uri-policy.uri
> format.ttl.gz=turtle-triples;uri-policy.iri
> format.tql.gz=turtle-quads;uri-policy.iri
>
> On Jun 15, 2012, at 16:41 PM, Pablo Mendes wrote:
>
> Can you show us your extraction.properties? I suspect you forgot the line
> below?
>
> languages=nl
>
> I am also not sure if you should have fully qualified
> (org.dbpedia.extraction.mappings.DisambiguationExtractor) or just the class
> name (DisambiguationExtractor).
>
> Cheers,
> Pablo
>
> On Fri, Jun 15, 2012 at 1:43 PM, Meij, L.K. van der <[email protected]>
> wrote:
>>
>>
>> I am trying to set up dbpedia-spotlight for the Dutch language. There are
>> some datasets available for Dutch (nl),
>> but I expect to at least need the dbpedia "disambiguation" dataset, which
>> is not available for download.
>>
>> After setting up extraction_framework for "nl" and doing:
>> editing extraction.properties: removing all extractors.** except:
>>
>> extractors.nl=MappingExtractor,org.dbpedia.extraction.mappings.DisambiguationExtractor,HomepageExtractor,ImageExtractor,\
>> InterLanguageLinksExtractor)
>>
>> $ cd dump;mvn scala:run
>> I get an error message:
>> ..
>> INFO: Mappings loaded (nl)
>> java.lang.reflect.InvocationTargetException
>> ..
>> Caused by: java.util.NoSuchElementException: key not found: nl
>>         at scala.collection.MapLike$class.default(MapLike.scala:225)
>>         at scala.collection.immutable.HashMap.default(HashMap.scala:38)
>>         at scala.collection.MapLike$class.apply(MapLike.scala:135)
>>         at scala.collection.immutable.HashMap.apply(HashMap.scala:38)
>>         at
>> org.dbpedia.extraction.mappings.DisambiguationExtractor.<init>(DisambiguationExtractor.scala:22)
>>
>> I assume this means that some classes have not been implemented for "nl"?
>>
>> If so, I would like to know if such an effort is on the way or whether it
>> would be feasible for me to give it a try?
>> Is there some pointer/documentation on how to get started?
>>
>> Thanks,
>>
>> Lourens
>>
>>
>> ==================
>> DETAILS OF WHAT I DID
>>
>>
>> I managed to install the extraction_framework. The documentation seems a
>> bit out of date though so it could be I
>> did things wrong.
>> I managed to download the "nl" wikipedia input by editing dump.properties
>> and doing
>> $ cd dump; mvn scala:run -Dlauncher=download
>>
>> Extraction started when commenting out all
>> other "extraction.**=" entries in dump/extraction.properties leaving only
>>
>> extractors.nl=MappingExtractor
>>
>> and running
>>
>> $ mvn scala:run
>> The output indicates that extraction proceeds nicely.
>>
>> But I expect that in the result the "disambiguation" result will be
>> missing.
>>
>> When I replace extractors.nl (analogous to other languages):
>>
>>
>> extractors.nl=MappingExtractor,org.dbpedia.extraction.mappings.DisambiguationExtractor,HomepageExtractor,ImageExtractor,\
>> InterLanguageLinksExtractor
>>
>> I get error messages mentioned above.
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Live Security Virtual Conference
>> Exclusive live event will cover all the ways today's security and
>> threat landscape has changed and how IT managers can respond. Discussions
>> will include endpoint security, mobile security and the latest in malware
>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>> _______________________________________________
>> Dbpedia-discussion mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>
>
>
>
