Hi,

the DBpedia data [1] was extracted from an old version [2] of the Wikipedia page. That is probably the main reason for the discrepancy you observed against the current Wikipedia page [3]. For example, that old version contained a link to [[domestic cat]]. DBpedia only extracts disambiguation links that contain the disambiguated word, and the case must also match. Here the disambiguated word is 'Cat', but the link contained lowercase 'cat', so it was not extracted.
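To make the rule above concrete, here is a minimal sketch in Python. It is not the actual DisambiguationExtractor code (which is Scala): the function name, the `ignore_case` flag, and the way the " (disambiguation)" suffix is stripped are all assumptions made for illustration. It also leaves out the acronym case mentioned in the extractor's documentation.

```python
def is_disambiguation_target(link_target: str, page_title: str,
                             ignore_case: bool = False) -> bool:
    """Hypothetical helper: keep a link only if it contains the disambiguated word."""
    suffix = " (disambiguation)"
    # Normalise underscores so "Cat_(disambiguation)" and "Cat (disambiguation)" behave alike.
    word = page_title.replace("_", " ")
    if word.endswith(suffix):
        word = word[: -len(suffix)]
    target = link_target.replace("_", " ")
    if ignore_case:
        # Relaxed rule: compare both sides in lower case.
        return word.lower() in target.lower()
    # Strict rule: the word must appear with exactly the same case.
    return word in target


title = "Cat (disambiguation)"
print(is_disambiguation_target("domestic cat", title))                    # strict rule
print(is_disambiguation_target("domestic cat", title, ignore_case=True))  # relaxed rule
print(is_disambiguation_target("Cat (Unix)", title))                      # strict rule
```

Under the strict rule the [[domestic cat]] link is rejected because 'Cat' and 'cat' differ in case. Note that this sketch uses a plain substring test, so it may accept more links than the real extractor does.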
I just changed the DisambiguationExtractor to use case-insensitive matching. That should let us extract a few more correct disambiguation targets in the next release without adding too many wrong ones.

JC

[1] http://dbpedia.org/page/Cat_%28disambiguation%29
[2] http://en.wikipedia.org/wiki/Cat_(disambiguation)?oldid=437952435 (or a version close to it)
[3] http://en.wikipedia.org/wiki/Cat_(disambiguation)?oldid=490978301

On Wed, May 23, 2012 at 11:44 AM, Ziqi Zhang <[email protected]> wrote:
> Hi all
>
> I have a possibly naive question but I am not able to find the answer
> elsewhere.
>
> My task is to extract candidate concepts/entities for an ambiguous term
> from dbpedia, e.g., "cat (disambiguation)". To do so I am looking at the
> "dbpedia-owl:wikiPageDisambiguate" field for the dbpedia page:
> http://dbpedia.org/page/Cat_%28disambiguation%29, and comparing it
> against "en.wikipedia.org/Cat_(disambiguation)". I would expect to see
> more or less all candidates listed on the Wikipedia Disambiguation page
> to be covered by the dbpedia field "dbpedia-owl:wikiPageDisambiguate",
> however there is quite a large discrepancy - out of which the most odd one
> is that the candidates on the dbpedia page do not even include the
> animal sense of "cat", and in fact it is included in "wikiPageWikiLink".
>
> I wonder how exactly does dbpedia extract candidates from wikipedia
> "disambiguation" pages? It is clear to me that some filtering has been
> done but it is not clear what it is. According to the dbpedia source
> code documentation in
> "extraction_framework/core/src/main/scala/org/dbpedia/extraction/mappings/DisambiguationExtractor.scala"
> which says "Extract only links that contain the page title or that spell
> out the acronym page title", it should select many candidates that are
> currently missing in the "wikiPageDisambiguate" field, but now in the
> "wikiPageWikiLink" field.
>
> Can anyone shed some light on this please?
>
> Thanks!
>
> --
> Ziqi Zhang
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Dbpedia-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
