[
https://issues.apache.org/jira/browse/STANBOL-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rupert Westenthaler updated STANBOL-1037:
-----------------------------------------
Attachment: stanbol-enhancement-workflow.001.png
> Entity Disambiguation for Stanbol
> ---------------------------------
>
> Key: STANBOL-1037
> URL: https://issues.apache.org/jira/browse/STANBOL-1037
> Project: Stanbol
> Issue Type: Story
> Components: Enhancer, Entityhub
> Reporter: Rafa Haro
> Labels: gsoc2013, mentoring
> Attachments: stanbol-enhancement-workflow.001.png
>
>
> Entity Disambiguation in Stanbol mainly refers to the process of
> modifying the fise:confidence values of EntityAnnotations obtained as a
> result of any Linking Engine within Stanbol (EntityLinkingEngine or
> NamedEntityLinking). Such modifications to confidence values should be done
> in order to obtain a ranking of possible candidates (entities) to link with
> for each EntityAnnotation after a disambiguation process. Each candidate
> would be an Entity within EntityHub or any other Knowledge Base configured in
> Stanbol.
> Disambiguation
> ============
> Entity Linking is not a trivial task due to the name ambiguity problem, i.e.,
> the same name may refer to different entities in different contexts and also
> the same entity can usually be mentioned using a set of different names. For
> instance, the name Michael Jordan can refer to more than 20 entities in
> Wikipedia, some of which are shown below:
> - Michael Jordan (NBA Player)
> - Michael I. Jordan (Berkeley Professor)
> - Michael B. Jordan (American Actor)
> This situation arises not only with well-known semantic knowledge
> bases like DBpedia or Freebase, but also with any enterprise
> semantic dataset or custom vocabulary. A simple example is resolving
> ambiguity within a database of employees.
> Formally, Entity Disambiguation for Stanbol should work as follows: after an
> enhancement process of a ContentItem using an enhancement chain that includes
> a Linking Engine, we would get a set of TextAnnotations TA = {T1,
> T2, ..., Tn}. Each TextAnnotation in TA should contain a name mention which
> is characterized by its name, its local surrounding context
> (fise:selection-context) and the ContentItem containing it. For each
> TextAnnotation in TA and as a result of the Linking Engine, we would get a
> set of EntityAnnotations EAi = {E1i, E2i, ..., ENi} where i corresponds to
> TextAnnotation i in TA. We should rely on the linking engines to provide all
> possible entity annotations (candidates within all sites in the EntityHub)
> for each TextAnnotation. Each EntityAnnotation is characterized by its
> Knowledge Base (entityhub:site) and its entry in that knowledge base
> (fise:entity-reference). The objective of the disambiguation process is to
> rank each EntityAnnotation set EAi by modifying its
> EntityAnnotations' confidence values so that the entity with the highest
> confidence value is the referent entity for the TextAnnotation associated
> with EAi.
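
The formal description above can be sketched as a small data model plus a re-ranking step. This is only an illustration under stated assumptions, not Stanbol's API: the class names, the `rerank` helper, the `toy_score` function and the entity identifiers are all hypothetical, and the scores are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class EntityAnnotation:
    entity_reference: str  # stands in for fise:entity-reference
    site: str              # stands in for entityhub:site
    confidence: float      # stands in for fise:confidence


@dataclass
class TextAnnotation:
    selected_text: str      # the name mention
    selection_context: str  # stands in for fise:selection-context
    candidates: List[EntityAnnotation] = field(default_factory=list)  # the set EAi


def rerank(
    ta: TextAnnotation,
    score: Callable[[TextAnnotation, EntityAnnotation], float],
) -> Optional[EntityAnnotation]:
    """Overwrite each candidate's confidence with its normalised score and
    sort the candidates so the most likely referent comes first."""
    scores = [max(score(ta, c), 0.0) for c in ta.candidates]
    total = sum(scores)
    for candidate, s in zip(ta.candidates, scores):
        candidate.confidence = s / total if total > 0 else 0.0
    ta.candidates.sort(key=lambda c: c.confidence, reverse=True)
    return ta.candidates[0] if ta.candidates else None


# Toy input: the linking engine initially prefers the NBA player.
ta = TextAnnotation(
    selected_text="Michael Jordan",
    selection_context="Michael Jordan published a paper on machine learning",
    candidates=[
        EntityAnnotation("dbpedia:Michael_Jordan", "dbpedia", 0.9),
        EntityAnnotation("dbpedia:Michael_I._Jordan", "dbpedia", 0.6),
    ],
)


def toy_score(ta: TextAnnotation, candidate: EntityAnnotation) -> float:
    # Hypothetical scoring rule, used only to exercise rerank().
    is_professor = candidate.entity_reference.endswith("Michael_I._Jordan")
    if "machine learning" in ta.selection_context and is_professor:
        return 3.0
    return 1.0


best = rerank(ta, toy_score)
```

Any disambiguation algorithm can be plugged in as the `score` function; the re-ranking step itself only rewrites confidence values, as the issue describes.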
> Algorithms
> ========
> ** Local Approaches
> (From [1]) Conventional entity linking approaches have focused on making
> independent Entity Linking decisions using the local mention-to-entity
> compatibility for each isolated mention. The essential idea was to extract
> the discriminative features from the description of a specific entity and
> then link each name mention in a document by comparing the contextual
> similarity with each of its candidate referent entities. This approach is
> followed by the Disambiguation-MLT engine in STANBOL-723.
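
A minimal sketch of the local idea, assuming a plain bag-of-words cosine similarity between the mention context and a textual description of each candidate. This is not how the Disambiguation-MLT engine is implemented; the function names, the entity keys and the toy descriptions are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def local_scores(mention_context, descriptions):
    """Score each candidate independently against the mention context."""
    context = bag_of_words(mention_context)
    return {
        entity: cosine(context, bag_of_words(description))
        for entity, description in descriptions.items()
    }


# Toy candidate descriptions (hypothetical, written for this example).
descriptions = {
    "mj_nba": "American basketball player who played for the Chicago Bulls in the NBA",
    "mj_professor": "Professor at Berkeley researching machine learning and statistics",
    "mj_actor": "American actor known for film and television roles",
}
scores = local_scores(
    "Michael Jordan scored 40 points for the Chicago Bulls", descriptions
)
best_candidate = max(scores, key=scores.get)
```

Each mention is scored in isolation here, which is exactly the limitation the global approaches try to address.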
> ** Global Approaches (Collective Entity Linking)
> The main drawback of the local-based approaches stems from the fact that they
> do not take into consideration the interdependence between different Entity
> Linking decisions. Specifically, the entities in a topically coherent document
> are usually semantically related to each other. In such cases, figuring out
> the referent entity of one name mention may in turn give us useful
> information to link the other name mentions in the same document. That
> suggests that disambiguation performance could be improved by resolving all
> mentions at the same time.
> This approach only makes sense in a scenario with highly connected knowledge
> bases where the entities are semantically related in some way.
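
As a rough sketch of collective linking, the following picks one candidate per mention so that the sum of local scores plus pairwise relatedness between the chosen entities is maximal. The objective, the `collective_link` name, the relatedness representation (a set of related entity pairs) and all values are assumptions for illustration; the exhaustive search is exponential in the number of mentions, so it only suits tiny inputs.

```python
from itertools import combinations, product


def collective_link(candidates, local, related):
    """Jointly choose one entity per mention.

    candidates: dict mapping mention -> list of candidate entity ids
    local:      dict mapping (mention, entity) -> local compatibility score
    related:    set of frozenset({e1, e2}) pairs that are linked in the
                knowledge base
    """
    mentions = list(candidates)
    best, best_score = None, float("-inf")
    # Enumerate every combination of one candidate per mention.
    for assignment in product(*(candidates[m] for m in mentions)):
        score = sum(local.get((m, e), 0.0) for m, e in zip(mentions, assignment))
        # Reward pairs of chosen entities that are related to each other.
        score += sum(
            1.0
            for a, b in combinations(assignment, 2)
            if frozenset((a, b)) in related
        )
        if score > best_score:
            best, best_score = dict(zip(mentions, assignment)), score
    return best


# Toy input: taken alone, the local score slightly prefers the professor.
candidates = {
    "Michael Jordan": ["mj_nba", "mj_professor"],
    "Bulls": ["chicago_bulls"],
}
local = {
    ("Michael Jordan", "mj_nba"): 0.4,
    ("Michael Jordan", "mj_professor"): 0.5,
    ("Bulls", "chicago_bulls"): 0.9,
}
related = {frozenset(("mj_nba", "chicago_bulls"))}

linking = collective_link(candidates, local, related)
```

In this toy case the link between `mj_nba` and `chicago_bulls` outweighs the small local advantage of `mj_professor`, which is the effect the paragraph above describes.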
> ** Graph Based Approaches
> In these approaches, both Knowledge Base and interdependence between possible
> Entity Linking decisions are modeled as graphs and inference algorithms are
> used to resolve all the mentions within a document.
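
One possible inference algorithm over such a graph is PageRank. The sketch below is an assumption about how a graph-based ranking could look, not the method the issue prescribes: candidates that are well connected to other candidates in the document receive more rank. The graph contents and node names are hypothetical.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    graph: dict mapping node -> list of neighbour nodes. Every neighbour
    must also appear as a key; undirected edges are listed in both
    directions.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            neighbours = graph[node]
            if neighbours:
                share = damping * rank[node] / len(neighbours)
                for neighbour in neighbours:
                    new_rank[neighbour] += share
            else:
                # A node without edges spreads its rank evenly over all nodes.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


# Toy candidate graph: the NBA reading is connected to other candidates
# found in the same document, the professor reading is isolated.
graph = {
    "mj_nba": ["chicago_bulls", "nba"],
    "chicago_bulls": ["mj_nba", "nba"],
    "nba": ["mj_nba", "chicago_bulls"],
    "mj_professor": [],
}
rank = pagerank(graph)
```

The resulting ranks could then be used to adjust the confidence values of the competing EntityAnnotations for the same mention.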
> Knowledge Bases
> ==============
> As described in STANBOL-223, for Disambiguation, it is necessary to use some
> data as disambiguation features