Hi fellas,
With http://svn.apache.org/r1528907 the GSoC projects' source code has
been committed to a new branch that we have called "disambiguation". As
you might know, this year, there were two proposals for Stanbol, both
related to disambiguation engines. Dileepa Jayakody has developed an
Entity Disambiguation Engine using FOAF Correlation (STANBOL-1161) and
Antonio Perez a Graph-Based Freebase Disambiguation Engine
(STANBOL-1156). AFAIK, the results of both projects will be published by
Google next week, but according to the mentors both were completed
successfully. I would like to congratulate both Antonio and Dileepa again
for the good work. Please feel free to test both solutions. To do so
properly, you need to go through the README documents, because both
projects use some external resources that need to be built.
Because both projects have several features in common, we have been
discussing on the Stanbol IRC channel a roadmap to refactor both
projects and continue improving disambiguation in Stanbol. The
proposed actions are summarized below:
1. Create an API that makes it easy to extract disambiguation
features from the context (ContentItem + Annotations). This might include
a better API to deal with Annotations and the results of previous engines.
2. Provide a framework for Session (local) disambiguation. The framework
should allow configuring disambiguation features from custom sites and
plugging in algorithms that use those features.
3. Provide a framework for Knowledge-Based Disambiguation Algorithms. We
have identified three types: text based (e.g. Solr MLT), graph based and
machine learning (ML) based. ML based algorithms are harder to generalize,
so we would leave them out for now. For both text and graph based, we would
need to create a framework that eases storing and managing knowledge bases
(KBs). Typically, text based approaches would need to store textual content
and evidence for the entities. For example, Wikilinks is a dataset of
documents with mentions of Freebase entities that can be used as
disambiguation evidence. Graph based approaches would need to use graph
databases in order to store the relationships between the entities and
provide efficient ways to manipulate the graph and plug in graph based
algorithms.
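To make the proposal a bit more concrete, here is a minimal sketch of how a
pluggable algorithm could look. It is only an illustration, not the code in
the branch: every type name (Candidate, DisambiguationAlgorithm,
GraphBasedAlgorithm), the in-memory map used in place of a graph database
and the urn:example URIs are assumptions made up for this example, and none
of them are existing Stanbol APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these types exist in Stanbol.
public class DisambiguationSketch {

    /** A candidate entity suggested for one mention in the text. */
    static final class Candidate {
        final String mention;
        final String entityUri;

        Candidate(String mention, String entityUri) {
            this.mention = mention;
            this.entityUri = entityUri;
        }
    }

    /** Pluggable scoring algorithm (text based, graph based, ...). */
    interface DisambiguationAlgorithm {
        double score(Candidate candidate, List<Candidate> context);
    }

    /** Graph based scorer: counts context candidates linked to the candidate in the KB graph. */
    static final class GraphBasedAlgorithm implements DisambiguationAlgorithm {
        // Assumption: a plain map (entity URI -> neighbour URIs) stands in for a graph database.
        private final Map<String, Set<String>> edges;

        GraphBasedAlgorithm(Map<String, Set<String>> edges) {
            this.edges = edges;
        }

        @Override
        public double score(Candidate candidate, List<Candidate> context) {
            Set<String> neighbours =
                    edges.getOrDefault(candidate.entityUri, Collections.<String>emptySet());
            double score = 0;
            for (Candidate other : context) {
                // Only candidates of other mentions count as supporting evidence.
                if (!other.mention.equals(candidate.mention)
                        && neighbours.contains(other.entityUri)) {
                    score += 1;
                }
            }
            return score;
        }
    }

    /** Returns the candidates of one mention, highest score first. */
    static List<Candidate> rank(String mention, List<Candidate> all,
                                DisambiguationAlgorithm algorithm) {
        List<Candidate> forMention = new ArrayList<>();
        for (Candidate c : all) {
            if (c.mention.equals(mention)) {
                forMention.add(c);
            }
        }
        forMention.sort(Comparator
                .comparingDouble((Candidate c) -> algorithm.score(c, all))
                .reversed());
        return forMention;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> edges = new HashMap<>();
        edges.put("urn:example:Paris_France",
                new HashSet<>(Arrays.asList("urn:example:France")));

        List<Candidate> candidates = Arrays.asList(
                new Candidate("Paris", "urn:example:Paris_Hilton"),
                new Candidate("Paris", "urn:example:Paris_France"),
                new Candidate("France", "urn:example:France"));

        List<Candidate> ranked = rank("Paris", candidates, new GraphBasedAlgorithm(edges));
        System.out.println(ranked.get(0).entityUri); // prints urn:example:Paris_France
    }
}
```

A text based algorithm (e.g. one backed by Solr MLT) could implement the same
interface, so the ranking code would not need to know which kind of KB sits
behind the scorer.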
Looking forward to your feedback.
Cheers,
Rafa Haro