On Thu, Oct 3, 2013 at 10:21 PM, Rafa Haro <rh...@zaizi.com> wrote:
> Hi fellas,
>
> With http://svn.apache.org/r1528907 the GSoC projects' source code has
> been committed to a new branch that we have called "disambiguation". As
> you might know, this year there were two proposals for Stanbol, both
> related to disambiguation engines. Dileepa Jayakody has developed an
> Entity Disambiguation Engine using FOAF Correlation (STANBOL-1161) and
> Antonio Perez a Graph-Based Freebase Disambiguation Engine
> (STANBOL-1156). AFAIK, the results of both projects will be published
> by Google next week, but according to the mentors they have been
> completed successfully. I would like to congratulate both Antonio and
> Dileepa again on the good work.
Thanks all for the support and guidance given throughout the project; it
was a great experience working with the Stanbol community.

> Please feel free to test both solutions. To do it properly, you need to
> go through the README documents, because both projects use some
> external resources that need to be built.
>
> Because both projects have several features in common, we have been
> discussing on the Stanbol IRC channel a roadmap to refactor both
> projects and continue improving the disambiguation stuff in Stanbol.
> The summary of the proposed actions is the following:
>
> 1. Create an API that would make it easy to extract disambiguation
> features from the context (ContentItem + Annotations). This might
> include a better API to deal with Annotations and the results of
> previous engines.

+1. Abstractions like EntityAnnotation and TextAnnotation are used for
various purposes in disambiguation, so creating Java classes and an API
for them will be extremely useful.

> 2. Provide a framework for Session (local) disambiguation. The
> framework should allow configuring disambiguation features from custom
> sites and plugging in algorithms that use those features.

Can you please give some more details on this point? I guess it is a
framework to plug in custom vocabularies and configure disambiguation
from those vocabularies? Please correct me if I have got the idea wrong.

> 3. Provide a framework for Knowledge Based Disambiguation Algorithms.
> We have identified three types: Text Based (e.g. Solr MLT), Graph
> based and Machine Learning based. ML based ones are more complex to
> generalize, so we would discard them for now. For both text and graph
> based, we would need to create a framework to ease KB storage and
> management. Typically, text based approaches would need to store
> textual contents and evidence for the entities. For example, Wikilinks
> is a dataset of documents with mentions of Freebase entities that can
> be used as disambiguation evidence.
> Graph based approaches would need to use graph databases in order to
> store the relationships between the entities and provide efficient
> ways to manipulate the graph and plug in graph based algorithms.

+1.

> Looking forward to your feedback.
>
> Cheers,
>
> Rafa Haro

Thanks,
Dileepa