On 11/05/2010 2:59, Tim Finin wrote:
> We estimate the PMI [1] for two Wikipedia articles based just on their
> mentions in other Wikipedia articles. So, given a pair of Wikipedia
> articles X and Y, we compute log(p(X&Y) / (p(X)*p(Y))), where p(X&Y) is
> the probability that a Wikipedia page will link to both X and Y, and p(X)
> is the probability that a page will link to X. If X itself links to
> Y, we count that as contributing to p(X&Y).
>
> We found ~40M pairs with a non-zero value. We are using this in work
> on extracting linked data from tables [2].
>
> [1] http://en.wikipedia.org/wiki/Pointwise_mutual_information
> [2] http://ebiquity.umbc.edu/paper/html/id/474/
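For reference, here is a minimal sketch of the estimate quoted above. It rests on several assumptions that the quoted text does not confirm: p(X) is taken as the fraction of all pages that link to X, every page is treated as "mentioning" itself (one possible reading of "if X itself links to Y, we count that as contributing to p(X&Y)"), the log is natural, and the input is a hypothetical `outlinks` dictionary mapping each page title to the set of titles it links to.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_from_links(outlinks):
    """Estimate PMI for every pair of pages that co-occur at least once.

    outlinks: hypothetical dict mapping a page title to the set of titles it links to.
    """
    n = len(outlinks)          # total number of linking pages
    single = Counter()         # number of pages mentioning X
    pair = Counter()           # number of pages mentioning both X and Y
    for page, targets in outlinks.items():
        # Assumption: a page counts as mentioning itself, so a direct link
        # X -> Y contributes to the co-occurrence count of (X, Y).
        mentioned = sorted(set(targets) | {page})
        single.update(mentioned)
        pair.update(combinations(mentioned, 2))

    scores = {}
    for (x, y), c_xy in pair.items():
        p_xy = c_xy / n
        p_x = single[x] / n
        p_y = single[y] / n
        # PMI = log(p(X&Y) / (p(X) * p(Y))), natural log assumed
        scores[(x, y)] = math.log(p_xy / (p_x * p_y))
    return scores


outlinks = {"A": {"B", "C"}, "B": {"C"}, "C": set(), "D": {"B", "C"}}
scores = pmi_from_links(outlinks)
```

Pairs that never co-occur get no entry, since their PMI would be log(0). Enumerating all pairs per page grows quadratically with the number of links on that page, so a full Wikipedia run would likely need a more scalable, disk-backed approach than this in-memory sketch.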
It sounds really interesting; I will read your work soon. In the meantime, I have two questions. First, how do you calculate the "probability"? Do you start from the number of incoming links? Second, how did you store the 40M pairs?

_______________________________________________
Dbpedia-discussion mailing list
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
