Hi fellas,

With http://svn.apache.org/r1528907 the source code of the GSoC projects has been committed to a new branch that we have called "disambiguation". As you may know, this year there were two GSoC proposals for Stanbol, both related to disambiguation engines. Dileepa Jayakody has developed an Entity Disambiguation Engine using FOAF Correlation (STANBOL-1161), and Antonio Perez a Graph-Based Freebase Disambiguation Engine (STANBOL-1156). AFAIK, Google will publish the results of both projects next week, but according to the mentors both have been completed successfully. I would like to congratulate both Antonio and Dileepa again for the good work. Please feel free to test both solutions. To do so properly, please read the README files first, because both projects rely on external resources that need to be built.

Because both projects have several features in common, we have been discussing on the Stanbol IRC channel a roadmap for refactoring both projects and continuing to improve the disambiguation support in Stanbol. The proposed actions are summarized below:

1. Create an API that makes it easy to extract disambiguation features from the context (ContentItem + Annotations). This might include a better API for dealing with Annotations and with the results of previous engines.

2. Provide a framework for Session (local) disambiguation. The framework should make it possible to configure disambiguation features from custom sites and to plug in algorithms that use those features.

3. Provide a framework for Knowledge-Based Disambiguation algorithms. We have identified three types: text-based (e.g. Solr MLT), graph-based and Machine Learning-based. ML-based approaches are harder to generalize, so we propose to leave them aside for now. For both text-based and graph-based approaches, we would need a framework that eases storing and managing the knowledge bases (KBs). Typically, text-based approaches need to store textual content and evidence for the entities. For example, Wikilinks is a dataset of documents containing mentions of Freebase entities that can be used as disambiguation evidence. Graph-based approaches would need graph databases to store the relationships between entities, to provide efficient ways to manipulate the graph, and to plug in graph-based algorithms.
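To make items 1-3 a bit more concrete, here is a minimal sketch of how the pieces could fit together: a context object that exposes features extracted from the content and the previous annotations, a pluggable algorithm interface, and one toy text-based algorithm. It is only an illustration under assumptions: every name below (DisambiguationContext, TextAnnotation, EntityAnnotation, DisambiguationAlgorithm, TextOverlapAlgorithm, disambiguate) is hypothetical and not part of the Stanbol API, the ContentItem and its Annotations are reduced to plain Python objects, the example.org URIs and evidence strings are made up, and the scoring is a simple term-overlap measure standing in for a real text-based approach such as Solr MLT. Python is used here only for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol, Set


@dataclass
class EntityAnnotation:
    # Hypothetical stand-in for an entity suggestion made by a previous engine.
    entity_uri: str
    label: str
    confidence: float  # confidence assigned by the previous engine


@dataclass
class TextAnnotation:
    # Hypothetical stand-in for a text mention and its candidate entities.
    mention: str
    candidates: List[EntityAnnotation] = field(default_factory=list)


@dataclass
class DisambiguationContext:
    # Item 1: features extracted from the ContentItem text plus previous annotations.
    text: str
    annotations: List[TextAnnotation]

    def context_terms(self) -> Set[str]:
        # Lower-cased tokens of the content, with surrounding punctuation stripped.
        tokens = (t.strip(".,;:!?").lower() for t in self.text.split())
        return {t for t in tokens if t}


class DisambiguationAlgorithm(Protocol):
    # Items 2 and 3: any algorithm exposing score() can be plugged in.
    def score(self, ctx: DisambiguationContext, mention: TextAnnotation,
              candidate: EntityAnnotation) -> float: ...


class TextOverlapAlgorithm:
    # Toy text-based KB algorithm: fraction of an entity's stored evidence
    # terms that also appear in the content (a stand-in for e.g. Solr MLT).
    def __init__(self, evidence: Dict[str, str]):
        self.evidence = {uri: set(text.lower().split()) for uri, text in evidence.items()}

    def score(self, ctx: DisambiguationContext, mention: TextAnnotation,
              candidate: EntityAnnotation) -> float:
        terms = self.evidence.get(candidate.entity_uri, set())
        if not terms:
            return 0.0
        return len(terms & ctx.context_terms()) / len(terms)


def disambiguate(ctx: DisambiguationContext,
                 algorithm: DisambiguationAlgorithm) -> Dict[str, List[str]]:
    # Re-rank each mention's candidates by the plugged-in algorithm's score.
    return {
        m.mention: [c.entity_uri for c in sorted(
            m.candidates, key=lambda c: algorithm.score(ctx, m, c), reverse=True)]
        for m in ctx.annotations
    }


ctx = DisambiguationContext(
    text="Paris is the capital of France and sits on the Seine.",
    annotations=[TextAnnotation("Paris", [
        EntityAnnotation("http://example.org/Paris_Hilton", "Paris Hilton", 0.6),
        EntityAnnotation("http://example.org/Paris", "Paris", 0.5),
    ])],
)
algo = TextOverlapAlgorithm({
    "http://example.org/Paris": "capital of france seine city",
    "http://example.org/Paris_Hilton": "celebrity hotel heiress television",
})
result = disambiguate(ctx, algo)
print(result)
```

Running it prints {'Paris': ['http://example.org/Paris', 'http://example.org/Paris_Hilton']}: the city candidate moves ahead of the one the previous engine ranked higher, because 4 of its 5 evidence terms occur in the text. A graph-based algorithm would plug in the same way, by implementing score() against a graph store instead of evidence text.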

Looking forward to your feedback.

Cheers,

Rafa Haro

--

------------------------------
This message should be regarded as confidential. If you have received this email in error please notify the sender and destroy it immediately. Statements of intent shall only become binding when confirmed in hard copy by an authorised signatory.

Zaizi Ltd is registered in England and Wales with the registration number 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road, London W6 7AN.
