Hi,
For my master's thesis, I implemented a tweet annotator for classifying
tweets. It serves roughly the same purpose as DBpedia Spotlight, but I think it
works better on tweets and other extremely short or sparse texts. Its
disambiguation approach is inspired by a cheap relatedness measure based on the
inlinks of candidate entities (Milne & Witten: the "Learning to Link with
Wikipedia" and low-cost semantic relatedness measure papers). My own idea for
DBpedia Spotlight would be a new, cheap, link-based disambiguator. I am not
sure whether you want this, because the Referent Graph approach seems to be
generally better. However, according to the Referent Graph paper, its
disambiguation precision is only slightly better than that of Milne and
Witten's approach, and the Milne and Witten relatedness measure might be much
cheaper than all the Referent Graph computations. I am not sure about the
disambiguation speed, though. In theory, an M&W disambiguator would not require
any context analysis, just inlinks.
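
To make the idea concrete, here is a minimal Python sketch of the Milne and
Witten inlink-based relatedness measure, plus a deliberately simplified
disambiguator built on top of it. It is not the code from my thesis and not
the full M&W method (which weights unambiguous context terms); the function
names, the toy data layout (a dict from entity to a set of inlinking article
ids) and the "mean relatedness to the other mentions' candidates" scoring rule
are assumptions made only for illustration.

```python
import math


def mw_relatedness(inlinks_a, inlinks_b, total_articles):
    """Milne & Witten relatedness of two entities from their inlink sets.

    relatedness = 1 - (log(max(|A|,|B|)) - log(|A & B|))
                      / (log(|W|) - log(min(|A|,|B|)))
    where A and B are the sets of articles linking to each entity and
    |W| is the total number of articles. The result is clamped at 0.
    """
    a, b = set(inlinks_a), set(inlinks_b)
    common = a & b
    if not common:
        return 0.0  # no shared inlinks: treat as unrelated
    larger = max(len(a), len(b))
    smaller = min(len(a), len(b))
    if total_articles <= smaller:
        raise ValueError("total_articles must exceed the inlink set sizes")
    distance = (math.log(larger) - math.log(len(common))) / (
        math.log(total_articles) - math.log(smaller)
    )
    return max(0.0, 1.0 - distance)


def disambiguate(candidates, inlinks, total_articles):
    """Pick one entity per mention (hypothetical, simplified scoring).

    candidates: dict mapping a mention to a list of candidate entities.
    inlinks:    dict mapping an entity to the set of articles linking to it.
    Each candidate is scored by its mean relatedness to all candidates of
    the other mentions; no textual context is analysed.
    """
    result = {}
    for mention, options in candidates.items():
        others = [
            entity
            for other_mention, other_options in candidates.items()
            if other_mention != mention
            for entity in other_options
        ]

        def score(entity):
            if not others:
                return 0.0
            total = sum(
                mw_relatedness(
                    inlinks.get(entity, ()), inlinks.get(other, ()), total_articles
                )
                for other in others
            )
            return total / len(others)

        result[mention] = max(options, key=score)
    return result
```

With toy inlink sets in which "Apple_Inc." shares most of its inlinks with
"IPhone" and "Apple_(fruit)" shares none, the mention "apple" next to "iphone"
resolves to "Apple_Inc.". The only inputs are the inlink sets and the article
count, which is why I expect this to be cheap.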
Would you like a link-based disambiguator option in DBpedia Spotlight?
I also have a question about the first existing Spotlight idea (Google mention
corpus): is it essentially rebuilding the Lexicalization dataset from Google
data with a few MapReduce jobs, or is there more to it?
As for the concern about overlap between the Google mention corpus and the
current data: doesn't the Google corpus already contain all the mentions on
Wikipedia? If so, wouldn't a completely Google-based index simply contain
whatever the current ground data contain? Also, I don't understand why we need
to crawl pages (this came up in a discussion with Cai Zhiwei) to generate a new
Lexicalization dataset. Or do we need to crawl them to get the context of each
mention?
Kind regards,
Denis
_______________________________________________
Dbpedia-gsoc mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc