[ 
https://issues.apache.org/jira/browse/STANBOL-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rupert Westenthaler updated STANBOL-1037:
-----------------------------------------

    Attachment: stanbol-enhancement-workflow.001.png
    
> Entity Disambiguation for Stanbol
> ---------------------------------
>
>                 Key: STANBOL-1037
>                 URL: https://issues.apache.org/jira/browse/STANBOL-1037
>             Project: Stanbol
>          Issue Type: Story
>          Components: Enhancer, Entityhub
>            Reporter: Rafa Haro
>              Labels: gsoc2013, mentoring
>         Attachments: stanbol-enhancement-workflow.001.png
>
>
> Entity Disambiguation in Stanbol mainly refers to the process of 
> modifying the fise:confidence values of the EntityAnnotations produced by 
> any Linking Engine within Stanbol (EntityLinkingEngine or 
> NamedEntityLinking). These confidence values are modified so that, after a 
> disambiguation process, the possible candidates (entities) to link with are 
> ranked for each EntityAnnotation. Each candidate is an Entity within the 
> EntityHub or any other Knowledge Base configured in Stanbol.
> Disambiguation
> ============
> Entity Linking is not a trivial task due to the name ambiguity problem: the 
> same name may refer to different entities in different contexts, and the 
> same entity can usually be mentioned using a set of different names. For 
> instance, the name Michael Jordan can refer to more than 20 entities in 
> Wikipedia, some of which are shown below:
>     - Michael Jordan (NBA Player)
>     - Michael I. Jordan (Berkeley Professor)
>     - Michael B. Jordan (American Actor)
> This situation arises not only with well-known semantic knowledge bases 
> like DBpedia or Freebase, but is also important for any enterprise semantic 
> dataset or custom vocabulary. A straightforward example is resolving the 
> ambiguity within a database of employees.
> Formally, Entity Disambiguation for Stanbol should work as follows: after an 
> enhancement process of a ContentItem using an enhancement chain that includes 
> a Linking Engine, we get a set of TextAnnotations TA = {T1, T2, ..., Tn}. 
> Each TextAnnotation in TA contains a name mention, which is characterized 
> by its name, its local surrounding context (fise:selection-context) and the 
> ContentItem containing it. For each TextAnnotation Ti in TA, the Linking 
> Engine produces a set of EntityAnnotations EAi = {E1i, E2i, ..., ENi}. We 
> should rely on the linking engines to provide all possible entity 
> annotations (candidates within all sites in the EntityHub) for each 
> TextAnnotation. Each EntityAnnotation is characterized by its Knowledge Base 
> (entityhub:site) and its entry in that knowledge base 
> (fise:entity-reference). The objective of the disambiguation process is to 
> rank each EntityAnnotation set EAi by modifying the confidence values of its 
> EntityAnnotations, so that the entity with the highest confidence value is 
> the referent entity for the TextAnnotation associated with EAi.
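The ranking step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the EntityAnnotation class, the rank_candidates helper, the entity identifiers and the scores are all hypothetical and are not part of the Stanbol API; in Stanbol itself the annotations live in the enhancement metadata of the ContentItem.

```python
from dataclasses import dataclass


@dataclass
class EntityAnnotation:
    """Hypothetical stand-in for a fise:EntityAnnotation."""
    entity_reference: str  # fise:entity-reference
    site: str              # entityhub:site
    confidence: float      # fise:confidence


def rank_candidates(candidates, scores):
    """Overwrite each candidate's confidence with its disambiguation score
    and return the candidates sorted from most to least likely referent."""
    for candidate in candidates:
        # Keep the linking engine's confidence when no score is available.
        candidate.confidence = scores.get(
            candidate.entity_reference, candidate.confidence
        )
    return sorted(candidates, key=lambda c: c.confidence, reverse=True)


# EAi for one TextAnnotation: the linking engine could not tell the
# candidates apart, so they all start with the same confidence.
ea_i = [
    EntityAnnotation("example:Michael_Jordan", "dbpedia", 0.9),
    EntityAnnotation("example:Michael_I._Jordan", "dbpedia", 0.9),
    EntityAnnotation("example:Michael_B._Jordan", "dbpedia", 0.9),
]
# Made-up output of some disambiguation algorithm.
scores = {
    "example:Michael_Jordan": 0.2,
    "example:Michael_I._Jordan": 0.7,
    "example:Michael_B._Jordan": 0.1,
}
ranked = rank_candidates(ea_i, scores)
```

After the call, ranked[0] is the candidate proposed as the referent entity for the TextAnnotation.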
> Algorithms
> ========
>  ** Local Approaches
> (From [1]) Conventional entity linking approaches have focused on making 
> independent Entity Linking decisions using the local mention-to-entity 
> compatibility for each isolated mention. The essential idea was to extract 
> the discriminative features from the description of a specific entity and 
> then link each name mention in a document by comparing the contextual 
> similarity with each of its candidate referent entities. Such approach is 
> followed by Disambiguation-MLT engine in STANBOL-723.
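A minimal sketch of a local approach, assuming a bag-of-words model and cosine similarity as the mention-to-entity compatibility measure. The function names and the entity descriptions are made up for illustration, and this is not meant to show how the Disambiguation-MLT engine is implemented.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts of a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def local_scores(selection_context, descriptions):
    """Score every candidate independently against the mention context."""
    context = bag_of_words(selection_context)
    return {
        entity: cosine(context, bag_of_words(text))
        for entity, text in descriptions.items()
    }


# Hypothetical entity descriptions taken from a knowledge base.
descriptions = {
    "Michael Jordan (NBA Player)":
        "american basketball player who won championships with the chicago bulls",
    "Michael I. Jordan (Berkeley Professor)":
        "professor at berkeley researching machine learning and statistics",
}
scores = local_scores(
    "Michael Jordan published a paper on machine learning", descriptions
)
best = max(scores, key=scores.get)
```

Each mention is scored in isolation here: nothing a neighbouring mention resolves to influences the result.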
> ** Global Approaches (Collective Entity Linking)
> The main drawback of the local-based approaches stems from the fact that they 
> do not take into consideration the interdependence between different Entity 
> Linking decisions. Specifically, the entities in a topically coherent 
> document are usually semantically related to each other. In such cases, 
> figuring out the referent entity of one name mention may in turn give us 
> useful information to link the other name mentions in the same document. 
> This suggests that disambiguation performance could be improved by resolving 
> all mentions at the same time.
> This approach only makes sense in a scenario with highly connected knowledge 
> bases where the entities are semantically related in some way.
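To illustrate resolving all mentions at the same time, the sketch below picks the joint assignment of candidates that maximises the summed pairwise relatedness. Everything in it is an assumption made for the example: the collective_link function, the entity identifiers and the relatedness values are hypothetical, and the exhaustive search is exponential in the number of mentions, so it only works for toy inputs.

```python
from itertools import combinations, product


def collective_link(candidates, relatedness):
    """candidates: mention -> list of entity ids.
    relatedness: frozenset({a, b}) -> semantic relatedness of a and b.
    Returns the joint assignment with the highest total coherence."""
    mentions = list(candidates)
    best_assignment, best_score = None, float("-inf")
    # Try every combination of one candidate per mention.
    for choice in product(*(candidates[m] for m in mentions)):
        score = sum(
            relatedness.get(frozenset(pair), 0.0)
            for pair in combinations(choice, 2)
        )
        if score > best_score:
            best_assignment, best_score = dict(zip(mentions, choice)), score
    return best_assignment


candidates = {
    "Michael Jordan": ["MJ_player", "MJ_professor"],
    "Bulls": ["Chicago_Bulls", "Bull_animal"],
}
# Made-up relatedness values; unlisted pairs count as unrelated.
relatedness = {
    frozenset({"MJ_player", "Chicago_Bulls"}): 0.9,
    frozenset({"MJ_professor", "Chicago_Bulls"}): 0.1,
}
assignment = collective_link(candidates, relatedness)
```

Neither mention is decided on its own: "Bulls" resolves to the team because that choice is coherent with the basketball player, and the reverse holds as well.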
> ** Graph Based Approaches
> In these approaches, both the Knowledge Base and the interdependence between 
> possible Entity Linking decisions are modeled as graphs, and inference 
> algorithms are used to resolve all the mentions within a document.
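One possible inference scheme over such a graph is a simple score propagation in the spirit of a random walk with restart. The sketch below is an assumption for illustration only: the propagate function, the damping value, the node names and the edge weights are hypothetical and are not taken from the issue or from any particular paper.

```python
def propagate(prior, edges, damping=0.5, iterations=20):
    """prior: node -> initial score.
    edges: (a, b) -> positive weight of an undirected edge.
    Repeatedly mixes each node's prior with the score flowing in
    from its neighbours and returns the final scores."""
    neighbours = {node: {} for node in prior}
    for (a, b), weight in edges.items():
        neighbours[a][b] = weight
        neighbours[b][a] = weight
    scores = dict(prior)
    for _ in range(iterations):
        updated = {}
        for node in prior:
            # Each neighbour splits its score among its own edges by weight.
            incoming = sum(
                scores[other] * weight / sum(neighbours[other].values())
                for other, weight in neighbours[node].items()
            )
            updated[node] = (1 - damping) * prior[node] + damping * incoming
        scores = updated
    return scores


# Candidate entities for two mentions, all equally likely a priori.
prior = {
    "MJ_player": 0.25,
    "MJ_professor": 0.25,
    "Chicago_Bulls": 0.25,
    "Bull_animal": 0.25,
}
# Made-up knowledge base link between two of the candidates.
edges = {("MJ_player", "Chicago_Bulls"): 0.9}
scores = propagate(prior, edges)
```

Candidates that are connected in the graph reinforce each other, while isolated candidates lose score, which yields a ranking for every mention in the document at once.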
> Knowledge Bases
> ==============
> As described in STANBOL-223, for Disambiguation, it is necessary to use some 
> data as disambiguation features

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
