Hi,

the DBpedia data [1] was extracted from an old version [2] of the
Wikipedia page. That's probably the main reason for the discrepancy
with the current Wikipedia page [3] you observed. For example, that
version contained a link to [[domestic cat]]. DBpedia only extracts
disambiguation links that contain the disambiguated word, and the case
must also match. In this case, the disambiguated word is 'Cat', but
the link contained 'cat', so it was not extracted.

I just changed the DisambiguationExtractor to use case-insensitive
matching. That should let us extract a few more correct disambiguation
targets in the next release without adding too many wrong ones.
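
To illustrate the difference, here is a rough sketch in Python. It is not
the actual Scala code of the DisambiguationExtractor: the function name, the
way the title suffix is stripped, and the sample links are assumptions made
up for this example, and the acronym rule is left out entirely.

```python
def is_disambiguation_target(link_target, page_title, ignore_case=False):
    """Return True if the link target contains the disambiguated word.

    Hypothetical helper for illustration only; not DBpedia's API.
    """
    # Assumption: the disambiguated word is the page title without
    # the " (disambiguation)" suffix.
    word = page_title.replace(" (disambiguation)", "")
    if ignore_case:
        # New behaviour: compare both sides in lower case.
        return word.lower() in link_target.lower()
    # Old behaviour: the case of the word must match exactly.
    return word in link_target


# Sample link targets (made up, apart from "domestic cat").
links = ["domestic cat", "Cat (Unix)", "Felidae"]
title = "Cat (disambiguation)"

strict = [l for l in links if is_disambiguation_target(l, title)]
relaxed = [l for l in links
           if is_disambiguation_target(l, title, ignore_case=True)]

print(strict)   # ['Cat (Unix)']
print(relaxed)  # ['domestic cat', 'Cat (Unix)']
```

With exact-case matching, 'domestic cat' is dropped because it contains
'cat' rather than 'Cat'. With case-insensitive matching it is kept, while
a link that does not contain the word at all is still rejected.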

JC

[1] http://dbpedia.org/page/Cat_%28disambiguation%29
[2] http://en.wikipedia.org/wiki/Cat_(disambiguation)?oldid=437952435
(or a version close to it)
[3] http://en.wikipedia.org/wiki/Cat_(disambiguation)?oldid=490978301

On Wed, May 23, 2012 at 11:44 AM, Ziqi Zhang
<[email protected]> wrote:
> Hi all
>
> I have a possibly naive question but I am not able to find the answer
> elsewhere.
>
> My task is to extract candidate concepts/entities for an ambiguous term
> from dbpedia, e.g., "cat (disambiguation)". To do so I am looking at the
> "dbpedia-owl:wikiPageDisambiguate" field for the dbpedia page:
> http://dbpedia.org/page/Cat_%28disambiguation%29, and comparing it
> against "en.wikipedia.org/Cat_(disambiguation)". I would expect to see
> more or less all candidates listed on the Wikipedia Disambiguation page
> to be covered by the dbpedia field "dbpedia-owl:wikiPageDisambiguate",
> however there is quite a large discrepancy. The oddest part is that
> the candidates on the dbpedia page do not even include the animal
> sense of "cat", which instead appears under "wikiPageWikiLink".
>
> I wonder how exactly dbpedia extracts candidates from wikipedia
> "disambiguation" pages. It is clear to me that some filtering has been
> done, but it is not clear what it is. The dbpedia source code
> documentation in
> "extraction_framework/core/src/main/scala/org/dbpedia/extraction/mappings/DisambiguationExtractor.scala"
> says "Extract only links that contain the page title or that spell
> out the acronym page title". According to that, it should select many
> candidates that are currently missing from the "wikiPageDisambiguate"
> field and appear in the "wikiPageWikiLink" field instead.
>
> Can anyone shed some light on this please?
>
> Thanks!
>
> --
> Ziqi Zhang
>
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Dbpedia-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
