Hi,

For my master's thesis, I implemented a tweet annotator for classifying tweets.
It serves roughly the same purpose as DBpedia Spotlight, except that I think it
works better on tweets and other extremely short, sparse texts. The
disambiguation approach I use is inspired by a cheap relatedness measure based
on the inlinks of candidate entities (Milne & Witten's papers "Learning to Link
with Wikipedia" and the one on a low-cost semantic relatedness measure). So my
own idea for DBpedia Spotlight would be a new, cheap, link-based disambiguator.
I'm not sure you want this, though, because the Referent Graph approach seems to
be better in general. On the other hand, as the Referent Graph paper itself
reports, its disambiguation precision is only slightly better than that of
Milne and Witten's approach, while the Milne and Witten relatedness measure
might be much cheaper to compute than the full Referent Graph. I'm not sure
about disambiguation speed, though. In theory, an M&W disambiguator would need
no context analysis at all, only inlinks.
Would you like a link-based disambiguator option in DBpedia Spotlight?
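
To make the idea concrete, here is a rough Python sketch of the M&W link-based
relatedness measure and a naive disambiguator on top of it. It assumes the
inlinks of each candidate entity are available as a set of article IDs and that
the total number of Wikipedia articles is known. The function names and the
scoring rule (summing relatedness to the other mentions' candidates) are my own
hypothetical simplification for illustration, not M&W's learned classifier and
not an existing Spotlight API.

```python
import math
from typing import Dict, List, Set


def wlm_relatedness(inlinks_a: Set[str], inlinks_b: Set[str], total_articles: int) -> float:
    """Milne & Witten link-based relatedness computed from two inlink sets.

    relatedness = 1 - (log(max(|A|,|B|)) - log(|A & B|))
                      / (log(|W|) - log(min(|A|,|B|)))
    where |W| is the total number of Wikipedia articles.
    """
    common = len(inlinks_a & inlinks_b)
    if common == 0:
        return 0.0  # no shared inlinks: treat as unrelated
    big = max(len(inlinks_a), len(inlinks_b))
    small = min(len(inlinks_a), len(inlinks_b))
    denominator = math.log(total_articles) - math.log(small)
    if denominator <= 0:
        return 0.0  # degenerate case: an entity linked from every article
    distance = (math.log(big) - math.log(common)) / denominator
    return max(0.0, 1.0 - distance)  # clamp, since the distance can exceed 1


def candidate_score(candidate: str, context: List[str],
                    inlinks: Dict[str, Set[str]], total_articles: int) -> float:
    """Hypothetical scoring rule: summed relatedness to the context entities."""
    own = inlinks.get(candidate, set())
    return sum(wlm_relatedness(own, inlinks.get(other, set()), total_articles)
               for other in context)


def disambiguate(candidates: Dict[str, List[str]],
                 inlinks: Dict[str, Set[str]], total_articles: int) -> Dict[str, str]:
    """For each mention, pick the candidate most related to the other mentions' candidates."""
    chosen = {}
    for mention, cands in candidates.items():
        # Context = all candidate entities of the *other* mentions in the tweet.
        context = [c for m, cs in candidates.items() if m != mention for c in cs]
        chosen[mention] = max(
            cands, key=lambda c: candidate_score(c, context, inlinks, total_articles))
    return chosen


# Toy usage with made-up inlink sets:
inlinks = {
    "Jaguar_Cars": {"p1", "p2", "p3"},
    "Jaguar_(animal)": {"p4", "p5", "p6"},
    "Land_Rover": {"p1", "p2", "p7"},
}
candidates = {"jaguar": ["Jaguar_(animal)", "Jaguar_Cars"], "land rover": ["Land_Rover"]}
print(disambiguate(candidates, inlinks, total_articles=1000))
# {'jaguar': 'Jaguar_Cars', 'land rover': 'Land_Rover'}
```

Using every candidate of the other mentions as context is noisy; M&W's own
approach relies on unambiguous context terms and a trained classifier, so this
would only be a starting point. Note that only inlink sets are needed, which is
where the cost advantage over Referent Graph would come from.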

I've also got a question about the first existing Spotlight idea (the Google
mention corpus): is this just a matter of rebuilding the Lexicalization dataset
from the Google data with a few MapReduce jobs, or is there more to it?
As for the concern about overlap between the Google mention corpus and the
current data: doesn't the Google corpus also contain all the mentions on
Wikipedia? Wouldn't a completely Google-based index then just contain whatever
the current ground data are anyway? Also, I don't understand why we would need
to crawl pages (this came up in a discussion with Cai Zhiwei) to generate a new
Lexicalization dataset. Or do we need to crawl them to get the context of each
mention?
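
For reference, this is roughly how I picture the core of that "few MapReduce
jobs" rebuild, assuming the Lexicalization dataset essentially boils down to
counts of (surface form, entity) pairs and probabilities derived from them, such
as P(entity | surface form). The record layout (page URL, anchor text, target
page), the lowercasing of surface forms, and all function names are assumptions
for illustration; a real run would be a distributed MapReduce job rather than
in-memory Python, and the real dataset may contain more scores.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical record layout for one mention in the Google corpus:
# (source page URL, anchor text, target Wikipedia page).
Record = Tuple[str, str, str]
Pair = Tuple[str, str]  # (surface form, target entity)


def map_mention(record: Record) -> Iterator[Tuple[Pair, int]]:
    """Map step: emit ((surface form, entity), 1) for one mention."""
    _url, anchor, target = record
    surface_form = anchor.strip().lower()  # assumed normalisation
    yield (surface_form, target), 1


def reduce_counts(mapped: Iterable[Tuple[Pair, int]]) -> Counter:
    """Reduce step: sum the counts per (surface form, entity) key."""
    counts: Counter = Counter()
    for key, n in mapped:
        counts[key] += n
    return counts


def entity_given_surface_form(counts: Counter) -> Dict[str, Dict[str, float]]:
    """Second pass: P(entity | surface form) from the pair counts."""
    totals: Counter = Counter()
    for (sf, _entity), n in counts.items():
        totals[sf] += n
    probs: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (sf, entity), n in counts.items():
        probs[sf][entity] = n / totals[sf]
    return dict(probs)


records = [
    ("http://example.org/a", "Jaguar", "Jaguar_Cars"),
    ("http://example.org/b", "jaguar", "Jaguar_Cars"),
    ("http://example.org/c", "Jaguar ", "Jaguar_Cars"),
    ("http://example.org/d", "jaguar", "Jaguar_(animal)"),
]
mapped = (pair for r in records for pair in map_mention(r))
print(entity_given_surface_form(reduce_counts(mapped)))
# {'jaguar': {'Jaguar_Cars': 0.75, 'Jaguar_(animal)': 0.25}}
```

Note that nothing in this sketch needs the text around a mention; that would
only come in if the context of each mention has to be extracted as well.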

Kind regards,

Denis
_______________________________________________
Dbpedia-gsoc mailing list
https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
