GSoC Projects and Entity Disambiguation Roadmap

Rafa Haro Thu, 03 Oct 2013 09:52:45 -0700

Hi fellas,

With http://svn.apache.org/r1528907 the GSoC projects' source code has been committed to a new branch that we have called "disambiguation". As you might know, this year there were two proposals for Stanbol, both related to disambiguation engines. Dileepa Jayakody has developed an Entity Disambiguation Engine using FOAF Correlation (STANBOL-1161) and Antonio Perez a Graph-Based Freebase Disambiguation Engine (STANBOL-1156). AFAIK, the results of both projects will be published by Google next week, but according to the mentors both have been completed successfully. I would like to congratulate both Antonio and Dileepa again for the good work. Please feel free to test both solutions. To do so properly, you need to go through the README documents, because both projects use some external resources that need to be built.

Because both projects have several features in common, we have been discussing on the Stanbol IRC channel a roadmap to refactor both projects and continue improving disambiguation in Stanbol. The proposed actions are summarized below:

1. Create an API that makes it easy to extract disambiguation features from the context (ContentItem + Annotations). This might include a better API to deal with Annotations and the results of previous engines.

2. Provide a framework for Session (local) disambiguation. The framework should allow configuring disambiguation features from custom sites and plugging in algorithms that use those features.

3. Provide a framework for Knowledge Base (KB) disambiguation algorithms. We have identified three types: text based (e.g. Solr MLT), graph based and Machine Learning based. ML based approaches are more complex to generalize, so we would discard them for now. For both text and graph based approaches, we would need to create a framework that eases KB storage and management. Typically, text based approaches would need to store textual content and evidence for the entities. For example, Wikilinks is a dataset of documents with mentions of Freebase entities that can be used as disambiguation evidence. Graph based approaches would need to use graph databases to store the relationships between the entities and to provide efficient ways to manipulate the graph and plug in graph based algorithms.
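To make the three points a bit more concrete, here is a minimal sketch of how a feature extractor and a pluggable algorithm could fit together. This is only an illustration under strong assumptions: none of these types (FeatureExtractor, DisambiguationAlgorithm, Candidate) exist in Stanbol, the context is reduced to a plain String instead of a ContentItem with Annotations, and the text based scorer is a naive term overlap rather than Solr MLT.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative only: none of these types exist in Stanbol.
class DisambiguationSketch {

    /** Hypothetical candidate entity plus the evidence terms stored for it in a KB. */
    static final class Candidate {
        final String id;
        final Set<String> evidence;

        Candidate(String id, String... evidence) {
            this.id = id;
            this.evidence = new HashSet<>(Arrays.asList(evidence));
        }
    }

    /** Point 1 (assumed shape): extracts disambiguation features from the context. */
    interface FeatureExtractor {
        Set<String> extract(String text);
    }

    /** Points 2 and 3 (assumed shape): a pluggable algorithm that scores a candidate. */
    interface DisambiguationAlgorithm {
        double score(Set<String> features, Candidate candidate);
    }

    // Simplest possible extractor: the set of lower-cased word tokens.
    static final FeatureExtractor WORDS = text -> {
        Set<String> out = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    };

    // Naive text based scorer: number of context features shared with the evidence.
    static final DisambiguationAlgorithm OVERLAP = (features, candidate) -> {
        Set<String> shared = new HashSet<>(features);
        shared.retainAll(candidate.evidence);
        return shared.size();
    };

    /** Returns a copy of the candidates ordered from highest to lowest score. */
    static List<Candidate> rank(String text, List<Candidate> candidates,
                                FeatureExtractor extractor,
                                DisambiguationAlgorithm algorithm) {
        Set<String> features = extractor.extract(text);
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator
                .comparingDouble((Candidate c) -> algorithm.score(features, c))
                .reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Candidate> candidates = Arrays.asList(
                new Candidate("paris-texas", "texas", "usa", "city"),
                new Candidate("paris-france", "france", "seine", "capital", "city"));
        List<Candidate> ranked = rank(
                "Paris is the capital of France, on the Seine.",
                candidates, WORDS, OVERLAP);
        System.out.println(ranked.get(0).id); // prints "paris-france"
    }
}
```

The point of splitting extractor and algorithm is that a graph based scorer could be plugged in behind the same interface without touching the feature extraction side.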


Looking forward to your feedback.

Cheers,

Rafa Haro

--

------------------------------

This message should be regarded as confidential. If you have received this email in error please notify the sender and destroy it immediately. Statements of intent shall only become binding when confirmed in hard copy by an authorised signatory.

Zaizi Ltd is registered in England and Wales with the registration number 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road, London W6 7AN.
