Hi everybody,

I'm Antonio David Pérez, a new Zaizi team member and an MSc student at
the University of Seville. Recently, I have been involved in the development
of a semantic CMS solution at a Spanish company called Ximdex, working with
several technologies such as Apache Nutch, Apache Solr and Apache Stanbol.

I have now been assigned to a project that involves technologies such as
Apache Stanbol and Apache ManifoldCF. Regarding Stanbol, I'm interested in
the disambiguation problem, and I would like to prepare a GSoC proposal on
this topic.

I have been following the recent mails about disambiguation and the WebID
protocol. I would be more interested in developing disambiguation systems
within Stanbol using the major semantic knowledge bases. My initial idea is
to use Freebase, with the aim of making the approach extensible to other
sources such as Wikipedia and DBpedia. Following STANBOL-1037 [1], the main
goal is to implement a couple of global-approach disambiguation algorithms
to be used in Stanbol.

To this end, I would like to discuss some topics related to the proposal:

- Knowledge Base: I have decided to start with Freebase, because it has
a REST API allowing 100k calls per day for reads and 10k for writes. Besides
the REST API, an alternative could be to integrate the whole Freebase graph
in Stanbol and use the Java API to manage it. Ideally, the management
framework should also be valid for other knowledge bases such as Wikipedia
or DBpedia.

- Resources: As has been pointed out before on the mailing list, Google has
released a couple of resources to be used in disambiguation applications.
One is a dictionary of concepts from Wikipedia, which uses the anchor text
of Wikipedia internal links to build an index of possible entity names [2].
The second is a dataset of texts that link to Wikipedia concepts [3], which
can be used as disambiguation contexts according to STANBOL-1037. I need to
research whether similar information can be retrieved directly from
Freebase or, in other words, check whether this information is already
incorporated in Freebase.
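
To make the anchor-text dictionary idea more concrete, here is a minimal
sketch of how such an index could be built and queried. It is only an
illustration under my own assumptions: the function names
(build_anchor_index, candidates), the sample links and the entity
identifiers are all hypothetical and made up for this example. They are not
the format of the Google dictionary [2] and not an existing Stanbol or
Freebase API.

```python
from collections import Counter, defaultdict


def build_anchor_index(links):
    """Build an index: normalized anchor text -> Counter of linked entities.

    `links` is an iterable of (anchor_text, entity_id) pairs, e.g. extracted
    from internal links. The pair format is an assumption of this sketch.
    """
    index = defaultdict(Counter)
    for anchor, entity in links:
        index[anchor.strip().lower()][entity] += 1
    return index


def candidates(index, mention):
    """Return (entity, share) pairs for a mention, most frequent first.

    `share` is the fraction of links with this anchor text that point to
    the entity. Unknown mentions yield an empty list.
    """
    counts = index.get(mention.strip().lower())
    if not counts:
        return []
    total = sum(counts.values())
    # Sort by descending count, then by entity id for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(entity, count / total) for entity, count in ranked]


# Made-up sample data, for illustration only.
SAMPLE_LINKS = [
    ("Paris", "Paris_(France)"),
    ("Paris", "Paris_(France)"),
    ("Paris", "Paris_Hilton"),
    ("paris", "Paris_(France)"),
    ("Seville", "Seville"),
]

index = build_anchor_index(SAMPLE_LINKS)
print(candidates(index, "Paris"))
```

The share per entity could serve as a simple prior when ranking candidates,
before any context from the surrounding text is taken into account.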

Moreover, the proposal design will try to be as generic as possible so that
it can be adapted to any other knowledge base.
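
As a rough idea of what "generic" could mean here, the sketch below
separates the knowledge base behind a small interface and runs a very naive
global disambiguation on top of it. Everything in it is an assumption for
discussion: the KnowledgeBase interface, the InMemoryKnowledgeBase class,
the disambiguate function and the sample data are hypothetical, and the
brute-force coherence search is a stand-in, not the algorithm described in
STANBOL-1037.

```python
from abc import ABC, abstractmethod
from itertools import combinations, product


class KnowledgeBase(ABC):
    """Hypothetical minimal interface a knowledge base backend would offer."""

    @abstractmethod
    def candidates(self, mention):
        """Return a list of candidate entity ids for a surface form."""

    @abstractmethod
    def related(self, a, b):
        """Return True if entities a and b are directly linked."""


class InMemoryKnowledgeBase(KnowledgeBase):
    """Toy backend; a Freebase or DBpedia backend would replace this class."""

    def __init__(self, labels, links):
        self._labels = labels  # mention -> list of entity ids
        self._links = {frozenset(pair) for pair in links}  # undirected links

    def candidates(self, mention):
        return list(self._labels.get(mention, []))

    def related(self, a, b):
        return frozenset((a, b)) in self._links


def disambiguate(kb, mentions):
    """Pick one entity per mention so that the whole set is most coherent.

    Coherence is the number of linked entity pairs in the assignment.
    The search is exhaustive, so it only suits a handful of mentions.
    """
    options = [kb.candidates(m) for m in mentions]
    if any(not opts for opts in options):
        raise ValueError("no candidates for at least one mention")

    def coherence(assignment):
        return sum(kb.related(a, b) for a, b in combinations(assignment, 2))

    best = max(product(*options), key=coherence)
    return dict(zip(mentions, best))


# Made-up sample data, for illustration only.
kb = InMemoryKnowledgeBase(
    labels={"Paris": ["Paris_Hilton", "Paris_(France)"], "Seine": ["Seine"]},
    links=[("Paris_(France)", "Seine")],
)
result = disambiguate(kb, ["Paris", "Seine"])
print(result)
```

Because the algorithm only talks to the interface, switching from one
knowledge base to another would mean writing a new backend class while
leaving the disambiguation code untouched.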

I look forward to your comments and valuable suggestions.

Thanks

Regards


[1] https://issues.apache.org/jira/browse/STANBOL-1037
[2] http://googleresearch.blogspot.com.es/2012/05/from-words-to-concepts-and-back.html
[3] https://code.google.com/p/wiki-links/
