On 11/05/2010 2:59, Tim Finin wrote:
> We estimate the PMI [1] for two Wikipedia articles based just on their
> mentions in other Wikipedia articles. So, given a pair of Wikipedia
> articles X and Y we compute log(p(X&Y)/(p(X)*p(Y))), where p(X&Y) is the
> probability that a Wikipedia page will link to both X and Y and p(X)
> is the probability that a page will link to X.  If X itself links to
> Y, we count that as contributing to p(X&Y).
>
> We found ~40M pairs with a non-zero value.  We are using this in work
> on extracting linked data from tables [2].
>
> [1] http://en.wikipedia.org/wiki/Pointwise_mutual_information
> [2] http://ebiquity.umbc.edu/paper/html/id/474/

It sounds really interesting. I will read your work soon.
Anyway, I have just two questions. First, how do you calculate the
"probability", that is, how do you get from a count of incoming links to p(X)?
Second, how did you store the 40M pairs?
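To make my first question concrete, here is a minimal Python sketch of the estimate I am assuming: p(X) as the fraction of pages whose links include X, and p(X&Y) as the fraction whose links include both X and Y. The function name pmi_from_links and the page -> outgoing-links input dict are hypothetical, and adding each page to its own link set is only my guess at how "if X itself links to Y" is counted.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_from_links(links):
    """Estimate PMI for article pairs from a page -> outgoing-links map.

    Hypothetical sketch, not the authors' confirmed method:
    - p(X)   = (# pages whose link set contains X) / (# pages)
    - p(X&Y) = (# pages whose link set contains X and Y) / (# pages)
    - each page's link set includes the page itself (assumption), so a
      page X that links to Y contributes to the (X, Y) count.
    """
    n_pages = len(links)
    single = Counter()  # article -> number of link sets containing it
    pair = Counter()    # (x, y) with x < y -> number of link sets containing both
    for page, outlinks in links.items():
        linked = set(outlinks) | {page}
        single.update(linked)
        # sorting gives each unordered pair a single canonical key
        pair.update(combinations(sorted(linked), 2))

    pmi = {}
    for (x, y), n_xy in pair.items():
        p_xy = n_xy / n_pages
        p_x = single[x] / n_pages
        p_y = single[y] / n_pages
        pmi[(x, y)] = math.log(p_xy / (p_x * p_y))
    return pmi
```

For example, with links = {"A": ["B", "C"], "B": ["C"], "C": [], "D": ["B", "C"]}, C ends up in every page's link set, so every pair involving C gets a PMI of 0, while (A, B) gets log(4/3). Only pairs that co-occur at least once get an entry, since log(0) is undefined.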


_______________________________________________
Dbpedia-discussion mailing list
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion

Reply via email to