On 5/10/10 7:29 PM, Roberto Mirizzi wrote:
>> We've found it useful to use the page links to compute a PMI
>> (pointwise mutual information) metric for pairs of pages.  We
>> used this to help map entity mentions in text to Wikipedia
>> entities, resolving possible ambiguities (e.g., among the
>> seven George Bushes in Wikipedia).
>
> How did you calculate PMI? Do you have some reference to show?

We assume the standard definition of PMI, e.g., [1].  We were
interested in estimating the PMI for pairs of concepts (entities,
events, ...) that Wikipedia articles denote.  The Wikipedia convention
is that if you mention a Wikipedia object, you link the first mention
to the appropriate Wikipedia article.

This has advantages over using, for example, a search engine to
estimate the PMI of two strings that are entity mentions (e.g.,
"George Bush", "Mr. Quayle"), since both strings are ambiguous while a
link points to one specific article.

We estimate the PMI for two Wikipedia articles based only on their
mentions in other Wikipedia articles.  Given a pair of Wikipedia
articles X and Y, we compute log(p(X&Y)/(p(X)*p(Y))), where p(X&Y) is
the probability that a Wikipedia page links to both X and Y and p(X)
is the probability that a page links to X.  If X itself links to
Y, we count that as contributing to p(X&Y).
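
As a rough illustration of the computation described above, here is a
minimal Python sketch.  It is not our actual code.  The function name
link_pmi and the input format (a dict mapping each page title to the
set of titles it links to) are hypothetical.  It also makes two
assumptions not stated above: the logarithm is natural, and a page is
treated as mentioning itself in both the single and the pair counts,
which is one way to let a direct link from X to Y contribute to
p(X&Y) while keeping p(X&Y) <= p(X).

```python
import math
from collections import Counter
from itertools import combinations


def link_pmi(outlinks):
    """Estimate PMI for pairs of articles from page links.

    outlinks: dict mapping page title -> set of linked page titles
    (hypothetical input format).  Returns a dict keyed by the
    alphabetically sorted pair (x, y).
    """
    n_pages = len(outlinks)
    single = Counter()  # number of pages that mention X
    pair = Counter()    # number of pages that mention both X and Y

    for page, targets in outlinks.items():
        # Assumption: the page counts as mentioning itself, so a direct
        # link X -> Y is counted as a co-occurrence of X and Y.
        mentioned = set(targets) | {page}
        single.update(mentioned)
        pair.update(combinations(sorted(mentioned), 2))

    scores = {}
    for (x, y), n_xy in pair.items():
        p_xy = n_xy / n_pages
        p_x = single[x] / n_pages
        p_y = single[y] / n_pages
        # Natural log is an assumption; any base gives the same ranking.
        scores[(x, y)] = math.log(p_xy / (p_x * p_y))
    return scores


toy = {"A": {"X", "Y"}, "B": {"X", "Y"}, "C": {"X"}, "D": set()}
toy_scores = link_pmi(toy)
print(toy_scores[("X", "Y")])
```

Pairs that never co-occur are simply absent from the result rather
than assigned minus infinity.  Enumerating all pairs per page is
quadratic in the number of links on that page, so a run over all of
Wikipedia would need a more careful (e.g., streaming or distributed)
counting step than this sketch.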

We found ~40M pairs with a non-zero value.  We are using this in work
on extracting linked data from tables [2].

[1] http://en.wikipedia.org/wiki/Pointwise_mutual_information
[2] http://ebiquity.umbc.edu/paper/html/id/474/


_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
