On 5/10/10 7:29 PM, Roberto Mirizzi wrote:
>> We've found it useful to use the page links to compute a PMI
>> (pointwise mutual information) metric for pairs of pages. We
>> used this to help map entity mentions in text to Wikipedia
>> entities, resolving possible ambiguities (e.g., among the
>> seven George Bushes in Wikipedia.
>
> How did you calculate PMI? Do you have some reference to show?
We assume the standard definition of PMI, e.g., [1]. We were interested in estimating the PMI for pairs of concepts (entities, events, ...) that Wikipedia articles denote. The Wikipedia convention is that when an article mentions a Wikipedia object, the first mention is linked to the appropriate Wikipedia article. This has advantages over using, for example, a search engine to find the PMI of two strings that are entity mentions (e.g., "George Bush", "Mr. Quayle"), since both strings are ambiguous.

We estimate the PMI for two Wikipedia articles based only on their mentions in other Wikipedia articles. So, given a pair of Wikipedia articles X and Y, we compute log(p(X&Y) / (p(X)*p(Y))), where p(X&Y) is the probability that a Wikipedia page links to both X and Y, and p(X) is the probability that a page links to X. If X itself links to Y, we count that as contributing to p(X&Y). We found ~40M pairs with a non-zero value.

We are using this in work on extracting linked data from tables [2].

[1] http://en.wikipedia.org/wiki/Pointwise_mutual_information
[2] http://ebiquity.umbc.edu/paper/html/id/474/
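For concreteness, the link-based PMI estimate described above can be sketched in Python. This is an illustrative reconstruction under stated assumptions, not the original implementation: the input format (a dict mapping each page title to the set of titles it links to), the function name `link_pmi`, the base-2 logarithm, and the exact handling of the "X links to Y" case are all hypothetical choices. Probabilities are estimated as counts divided by the total number of pages, so log(p(X&Y) / (p(X)*p(Y))) simplifies to log(c_XY * N / (c_X * c_Y)).

```python
import math
from collections import Counter
from itertools import combinations


def link_pmi(outlinks: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Estimate PMI for pairs of articles from a (hypothetical) page -> outlinks map.

    Keys of the result are (X, Y) tuples in sorted order; only pairs with a
    non-zero joint count and non-zero marginals are returned.
    """
    n_pages = len(outlinks)
    single = Counter()  # c_X: number of pages that link to X
    joint = Counter()   # c_XY: number of pages that "mention" both X and Y

    for page, targets in outlinks.items():
        targets = set(targets) - {page}  # ignore self-links
        single.update(targets)
        # Assumption: a page is paired with every article it links to, so
        # "X links to Y" contributes to the joint count of (X, Y).
        mentioned = targets | {page}
        for x, y in combinations(sorted(mentioned), 2):
            joint[(x, y)] += 1

    pmi = {}
    for (x, y), c_xy in joint.items():
        c_x, c_y = single[x], single[y]
        if c_x == 0 or c_y == 0:
            continue  # a marginal is zero, so PMI is undefined under this estimate
        # log2(p(X&Y) / (p(X) * p(Y))) with p(.) = count / n_pages
        pmi[(x, y)] = math.log2(c_xy * n_pages / (c_x * c_y))
    return pmi


# Toy link graph: pages A and B link to X and Y, C links to X, X links to Y.
toy = {"A": {"X", "Y"}, "B": {"X", "Y"}, "C": {"X"}, "X": {"Y"}}
scores = link_pmi(toy)
# Only ('X', 'Y') survives; its PMI is log2(3 * 4 / (3 * 3)) = log2(4/3), about 0.415.
```

In the toy graph, X and Y co-occur on pages A and B, and page X itself links to Y, giving a joint count of 3. Without counting the X-links-to-Y case, the joint count would be 2 and the PMI would be log2(8/9), slightly negative. Only pairs that actually co-occur are enumerated, which matches reporting a count of pairs with a non-zero value.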
