Re: GSoC Projects and Entity Disambiguation Roadmap

Dileepa Jayakody Thu, 03 Oct 2013 10:18:08 -0700

On Thu, Oct 3, 2013 at 10:21 PM, Rafa Haro <[email protected]> wrote:

> Hi fellas,
>
> With http://svn.apache.org/r1528907 the GSoC projects' source code has
> been committed to a new branch that we have called "disambiguation". As you
> might know, this year there were two proposals for Stanbol, both related
> to disambiguation engines. Dileepa Jayakody has developed an Entity
> Disambiguation Engine using FOAF Correlation (STANBOL-1161) and Antonio
> Perez a Graph-Based Freebase Disambiguation Engine (STANBOL-1156). AFAIK,
> the results of both projects will be published by Google next week, but
> according to the mentors both have been completed successfully. I would
> like to congratulate both Antonio and Dileepa again on the good work.



Thanks, all, for the support and guidance throughout the project; it was a
great experience working with the Stanbol community.


> Please feel free to test both solutions. To do so properly, you
> need to go through the README documents, because both projects use some
> external resources that need to be built.
>
> Because both projects have several features in common, we have been
> discussing on the Stanbol IRC channel a roadmap to refactor both projects
> and continue improving disambiguation support in Stanbol. The proposed
> actions are summarized as follows:
>
> 1. Create an API that would allow us to easily extract disambiguation
> features from the context (ContentItem + Annotations). This might include a
> better API to deal with Annotations and the results of previous engines.
>

+1. EntityAnnotation- and TextAnnotation-like abstractions are used for
various purposes in disambiguation, so creating Java classes and an API for
them will be extremely useful.
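As a rough illustration of what such an API could look like, here is a minimal Java sketch. The types TextAnnotation, EntityAnnotation and DisambiguationContext below are hypothetical, simplified stand-ins for the fise:TextAnnotation / fise:EntityAnnotation enhancements; they are not the actual Stanbol API. The sketch only shows two typical features: the entity suggestions grouped per mention, and the text surrounding a mention.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for a fise:TextAnnotation (a mention in the text).
// NOT the Stanbol API, just a simplified record for illustration.
record TextAnnotation(String id, String selectedText, int start, int end) {}

// Hypothetical stand-in for a fise:EntityAnnotation (an entity suggestion
// linked to the TextAnnotation with the given id).
record EntityAnnotation(String textAnnotationId, String entityUri, double confidence) {}

// Hypothetical view over the ContentItem's plain text plus the annotations
// written by previous engines.
record DisambiguationContext(String plainText,
                             List<TextAnnotation> textAnnotations,
                             List<EntityAnnotation> entityAnnotations) {

    // Groups the entity suggestions by the mention they refer to.
    Map<TextAnnotation, List<EntityAnnotation>> suggestionsByMention() {
        return textAnnotations.stream().collect(Collectors.toMap(
                ta -> ta,
                ta -> entityAnnotations.stream()
                        .filter(ea -> ea.textAnnotationId().equals(ta.id()))
                        .collect(Collectors.toList())));
    }

    // Returns up to `window` characters on each side of the mention,
    // a common feature for context-based disambiguation.
    String surroundingText(TextAnnotation ta, int window) {
        int from = Math.max(0, ta.start() - window);
        int to = Math.min(plainText.length(), ta.end() + window);
        return plainText.substring(from, to);
    }
}
```

Engines would then work against such a context object instead of querying the enhancement graph directly in each engine.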

>
> 2. Provide a framework for Session (local) disambiguation. The framework
> should allow configuring disambiguation features from custom sites and
> plugging in algorithms that use those features.
>

Can you please give some more details on this point?
I guess it is a framework to plug in custom vocabularies and configure
disambiguation based on those vocabularies? Please correct me if I have
got the idea wrong.
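To check whether I understand the idea, here is a minimal Java sketch of how such a plug-in framework might be shaped. DisambiguationAlgorithm and DisambiguationFramework are hypothetical names, and combining algorithms by a weighted sum of scores is only one assumed design; per-algorithm weights could, for example, be configured per custom site.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plug-in contract: an algorithm scores the candidate entities of
// one mention, using features extracted from the current session (ContentItem).
@FunctionalInterface
interface DisambiguationAlgorithm {
    Map<String, Double> score(String mention,
                              Map<String, Double> candidates,
                              Map<String, Object> features);
}

// Hypothetical framework: algorithms are plugged in by name, and configured
// per-algorithm weights (default 1.0) combine their scores.
final class DisambiguationFramework {
    private final Map<String, DisambiguationAlgorithm> algorithms = new LinkedHashMap<>();
    private final Map<String, Double> weights;

    DisambiguationFramework(Map<String, Double> weights) {
        this.weights = weights;
    }

    void register(String name, DisambiguationAlgorithm algorithm) {
        algorithms.put(name, algorithm);
    }

    // Returns the candidate URI with the highest weighted sum of scores,
    // or null if no algorithm produced any score.
    String disambiguate(String mention,
                        Map<String, Double> candidates,
                        Map<String, Object> features) {
        Map<String, Double> total = new HashMap<>();
        algorithms.forEach((name, algorithm) -> {
            double weight = weights.getOrDefault(name, 1.0);
            algorithm.score(mention, candidates, features)
                     .forEach((uri, s) -> total.merge(uri, weight * s, Double::sum));
        });
        return total.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }
}
```

With a "prior" algorithm that just returns the linking confidences and a more heavily weighted FOAF-correlation algorithm, the latter can override the prior ranking when its evidence is strong.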


> 3. Provide a framework for Knowledge-Based Disambiguation Algorithms. We
> have identified three types: text-based (e.g. Solr MLT), graph-based and
> machine-learning-based. ML-based approaches are more complex to generalize,
> so we would set them aside for now. For both text- and graph-based
> approaches, we would need to create a framework that eases storing and
> managing KBs. Typically, text-based approaches would need to store textual
> content and evidence for the entities. For example, Wikilinks is a dataset
> of documents with mentions of Freebase entities that can be used as
> disambiguation evidence. Graph-based approaches would need to use graph
> databases in order to store the relationships between the entities and
> provide efficient ways to manipulate the graph and plug in graph-based
> algorithms.
>

+1.
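To make the graph-based case a little more concrete, here is a minimal Java sketch of one simple coherence heuristic: each mention picks the candidate entity with the most edges to the candidates of the other mentions in the same document. The class GraphCoherence and the example data are hypothetical, an in-memory adjacency map stands in for the graph database, and this is not necessarily the approach taken in STANBOL-1156.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical graph-based disambiguation: an in-memory adjacency map stands in
// for a graph database; edges are treated as undirected.
final class GraphCoherence {
    private final Map<String, Set<String>> adjacency;

    GraphCoherence(Map<String, Set<String>> adjacency) {
        this.adjacency = adjacency;
    }

    private boolean linked(String a, String b) {
        return adjacency.getOrDefault(a, Set.of()).contains(b)
            || adjacency.getOrDefault(b, Set.of()).contains(a);
    }

    // mentions: mention text -> candidate entity URIs.
    // For each mention, picks the candidate with the most links to candidates of
    // the other mentions; ties go to the earlier candidate in the list.
    Map<String, String> disambiguate(Map<String, List<String>> mentions) {
        Map<String, String> result = new HashMap<>();
        for (var entry : mentions.entrySet()) {
            String best = null;
            int bestScore = -1;
            for (String candidate : entry.getValue()) {
                int score = 0;
                for (var other : mentions.entrySet()) {
                    if (other.getKey().equals(entry.getKey())) continue;
                    for (String c : other.getValue()) {
                        if (linked(candidate, c)) score++;
                    }
                }
                if (score > bestScore) {
                    best = candidate;
                    bestScore = score;
                }
            }
            if (best != null) result.put(entry.getKey(), best);
        }
        return result;
    }
}
```

A real engine would query the graph store instead of a map and would likely use weighted edges or more elaborate graph algorithms; the point of the framework would be to make such algorithms pluggable on top of a shared KB.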

> Looking forward to your feedback.
>
> Cheers,
>
> Rafa Haro
>

Thanks,
Dileepa

> --
>
> ------------------------------
> This message should be regarded as confidential. If you have received this
> email in error please notify the sender and destroy it immediately.
> Statements of intent shall only become binding when confirmed in hard copy
> by an authorised signatory.
>
> Zaizi Ltd is registered in England and Wales with the registration number
> 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
> London W6 7AN.
