Hi all,

I have a possibly naive question, but I have not been able to find the answer elsewhere.
My task is to extract candidate concepts/entities for an ambiguous term from DBpedia, e.g., "cat (disambiguation)". To do so, I am looking at the "dbpedia-owl:wikiPageDisambiguates" field on the DBpedia page http://dbpedia.org/page/Cat_%28disambiguation%29 and comparing it against "en.wikipedia.org/Cat_(disambiguation)".

I would expect more or less all candidates listed on the Wikipedia disambiguation page to be covered by the DBpedia field "dbpedia-owl:wikiPageDisambiguates". However, there is quite a large discrepancy. The oddest part is that the candidates on the DBpedia page do not even include the animal sense of "cat"; that one is listed under "wikiPageWikiLink" instead.

How exactly does DBpedia extract candidates from Wikipedia disambiguation pages? It is clear to me that some filtering has been done, but it is not clear what the filter is. The documentation in the DBpedia source file "extraction_framework/core/src/main/scala/org/dbpedia/extraction/mappings/DisambiguationExtractor.scala" says "Extract only links that contain the page title or that spell out the acronym page title". By that rule, it should select many candidates that are currently missing from the "wikiPageDisambiguates" field and appear only in the "wikiPageWikiLink" field.

Can anyone shed some light on this, please?

Thanks!

-- 
Ziqi Zhang

_______________________________________________
Dbpedia-discussion mailing list
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
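
P.S. To make the literal reading of that source comment concrete, here is a rough Python sketch of the rule as worded ("contain the page title or spell out the acronym page title"). It is only an illustration of how the comment could be interpreted, not the actual logic in DisambiguationExtractor.scala. The function names (base_title, contains_title, spells_out_acronym, would_keep) and the exact matching choices (case-insensitive substring match, initials compared for equality) are assumptions made for this example.

```python
import re


def base_title(page_title: str) -> str:
    """Strip a trailing '(disambiguation)' suffix from a page title."""
    return re.sub(r"\s*\(disambiguation\)\s*$", "", page_title,
                  flags=re.IGNORECASE).strip()


def contains_title(link_title: str, page_title: str) -> bool:
    """Assumed reading of 'contain the page title': case-insensitive substring."""
    title = base_title(page_title).lower()
    if not title:
        return False
    return title in link_title.lower()


def spells_out_acronym(link_title: str, page_title: str) -> bool:
    """Assumed reading of 'spell out the acronym': word initials equal the title."""
    acronym = base_title(page_title).lower()
    if not acronym:
        return False
    words = re.findall(r"\w+", link_title)
    initials = "".join(word[0].lower() for word in words)
    return initials == acronym


def would_keep(link_title: str, page_title: str) -> bool:
    """True if a linked title passes either of the two assumed checks."""
    return (contains_title(link_title, page_title)
            or spells_out_acronym(link_title, page_title))


if __name__ == "__main__":
    page = "Cat (disambiguation)"
    for link in ["Cat", "Central Africa Time", "Felidae"]:
        print(link, would_keep(link, page))
```

Under this reading, a link titled "Cat" passes the first check and "Central Africa Time" passes the second, while "Felidae" passes neither. That is why the animal sense being absent from "wikiPageDisambiguates" surprises me.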
