On Sat, Mar 9, 2013 at 6:32 PM, Peter Kaminski <kamin...@istori.com> wrote:
> Here's big data dataset from Google Research and UMass IESL, 40 million
> "links to Wikipedia pages where the anchor text of the link closely matches
> the title of the target Wikipedia page," from 10 million web pages, for the
> purposes of contextualized disambiguation:

I wonder how many of these links to Wikipedia fail to disambiguate.
People assume a Wikipedia link never lets you down, but it often
points to the wrong thing. E.g. "_John McLaughlin_ formed
Mahavishnu Orchestra" (links to a disambiguation page) or "Gerrit is
written in _Java_" (links to the island, not the language). And _John
Howard_ will no longer link to the Australian politician if someone
more famous comes along.

"how to find out if different web pages are talking about the same
person or other entity"
Wikidata removes all doubt: http://www.wikidata.org/wiki/Q164757 ! I
assume that other knowledge projects have noticed these entities, and
that Q numbers are becoming a lingua franca.  I'm reserving Q42666789
for my talented sure-to-be famous offspring. :-)
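
To make the Q-number idea concrete, here is a rough Python sketch of
comparing two references by Wikidata ID rather than by page title. The
helper names (qid_from_url, same_entity) are made up for illustration,
and I'm assuming entity URLs look like /wiki/Q... or /entity/Q... on
www.wikidata.org; nothing here talks to the network.

```python
import re
from typing import Optional

# Assumed URL shapes: http(s)://www.wikidata.org/wiki/Q123 or .../entity/Q123
_QID_RE = re.compile(r"^https?://www\.wikidata\.org/(?:wiki|entity)/(Q[1-9]\d*)$")


def qid_from_url(url: str) -> Optional[str]:
    """Return the Q identifier from a Wikidata entity URL, or None."""
    match = _QID_RE.match(url.strip())
    return match.group(1) if match else None


def same_entity(url_a: str, url_b: str) -> bool:
    """True only if both URLs carry the same Q identifier."""
    qid_a = qid_from_url(url_a)
    qid_b = qid_from_url(url_b)
    return qid_a is not None and qid_a == qid_b


if __name__ == "__main__":
    print(qid_from_url("http://www.wikidata.org/wiki/Q164757"))
    print(same_entity("http://www.wikidata.org/wiki/Q164757",
                      "https://www.wikidata.org/entity/Q164757"))
```

The point is that the comparison keys on the identifier, so it keeps
working if the Wikipedia title later moves or turns into a
disambiguation page.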

Google clearly enjoys the fruits of Wikipedians' hard work.

--
=S Page  software engineer on Editor Engagement Experiments

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
