[ 
https://issues.apache.org/jira/browse/STANBOL-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632680#comment-13632680
 ] 

Rupert Westenthaler commented on STANBOL-1037:
----------------------------------------------

Integration into the Stanbol Enhancement Workflow
=================================================

Within the Stanbol Enhancer Workflow, disambiguation is considered to be done
in the post-processing phase. See the attached
"stanbol-enhancement-workflow.png".

This means that at the stage where disambiguation engines are called, the
following information is available in the ContentItem and can be used for
disambiguation:

* 'fise:TextAnnotation'[1]: These specify occurrences of Entities within the
text that were recognized by Language Processing engines.

* 'fise:EntityAnnotation'[2]: These represent Entities - formally described by
some vocabulary - that are suggested for an occurrence in the processed text.
'fise:EntityAnnotations' refer to their 'fise:TextAnnotations' via
'dc:relation'.

* AnalyzedText ContentPart [3]: The AnalyzedText contains all the results of
the NLP analysis. It can be used to obtain Tokens, Chunks and Sentences as well
as Part of Speech (POS), Named Entity annotations (NER), Phrase annotations and
other morphological features (e.g. the lemma).



[1] 
http://stanbol.staging.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
[2] 
http://stanbol.staging.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fiseentityannotation
[3] 
http://stanbol.staging.apache.org/docs/trunk/components/enhancer/nlp/#nlp-processing-api
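
As a rough illustration of the information listed above, the following sketch
groups suggested entities by the occurrence they belong to, which is the view a
disambiguation engine would typically start from. It deliberately uses plain
Python data classes instead of the Stanbol API; the class fields, the URIs and
the helper name `candidates_by_mention` are assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TextAnnotation:
    uri: str            # hypothetical identifier of the annotation
    selected_text: str  # the mention as it occurs in the text


@dataclass(frozen=True)
class EntityAnnotation:
    entity_reference: str  # URI of the suggested entity
    confidence: float      # confidence assigned by the linking engine
    relation: str          # URI of the TextAnnotation this suggestion belongs to


def candidates_by_mention(text_annotations, entity_annotations):
    """Group suggested entities by the TextAnnotation they relate to."""
    known = {ta.uri for ta in text_annotations}
    grouped = defaultdict(list)
    for ea in entity_annotations:
        if ea.relation in known:  # ignore suggestions for unknown mentions
            grouped[ea.relation].append(ea)
    return dict(grouped)


# Made-up example data, not taken from a real enhancement result.
mentions = [TextAnnotation("urn:ta:1", "Michael Jordan")]
suggestions = [
    EntityAnnotation("urn:example:Michael_Jordan", 0.5, "urn:ta:1"),
    EntityAnnotation("urn:example:Michael_I._Jordan", 0.5, "urn:ta:1"),
    EntityAnnotation("urn:example:Unrelated", 0.9, "urn:ta:99"),
]
grouped = candidates_by_mention(mentions, suggestions)
```

Each value in `grouped` is the candidate set a disambiguation engine would
re-rank for one mention.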
                
> Entity Disambiguation for Stanbol
> ---------------------------------
>
>                 Key: STANBOL-1037
>                 URL: https://issues.apache.org/jira/browse/STANBOL-1037
>             Project: Stanbol
>          Issue Type: Story
>          Components: Enhancer, Entityhub
>            Reporter: Rafa Haro
>              Labels: gsoc2013, mentoring
>         Attachments: stanbol-enhancement-workflow.001.png
>
>
> Entity Disambiguation in Stanbol mainly refers to the process of modifying 
> the fise:confidence values of EntityAnnotations obtained as a result of any 
> Linking Engine within Stanbol (EntityLinkingEngine or NamedEntityLinking). 
> These modifications to confidence values should produce, after a 
> disambiguation process, a ranking of the possible candidates (entities) to 
> link to for each TextAnnotation. Each candidate would be an Entity within 
> the EntityHub or any other Knowledge Base configured in Stanbol.
> Disambiguation
> ==============
> Entity Linking is not a trivial task due to the name ambiguity problem, i.e., 
> the same name may refer to different entities in different contexts, and the 
> same entity can usually be mentioned using several different names. For 
> instance, the name Michael Jordan can refer to more than 20 entities in 
> Wikipedia, some of which are shown below:
>     - Michael Jordan (NBA Player)
>     - Michael I. Jordan (Berkeley Professor)
>     - Michael B. Jordan (American Actor)
> This situation arises not only with well-known semantic knowledge bases like 
> DBpedia or Freebase, but is also relevant for any enterprise semantic dataset 
> or custom vocabulary. A simple example is resolving ambiguous names within a 
> database of employees.
> Formally, Entity Disambiguation for Stanbol should work as follows: after an 
> enhancement process of a ContentItem using an enhancement chain that includes 
> a Linking Engine, we would get a set of TextAnnotations TA = {T1, T2, ..., 
> Tn}. Each TextAnnotation in TA should contain a name mention which is 
> characterized by its name, its local surrounding context 
> (fise:selection-context) and the ContentItem containing it. For each 
> TextAnnotation Ti in TA, and as a result of the Linking Engine, we would get 
> a set of EntityAnnotations EAi = {E1i, E2i, ..., ENi}, where i is the index 
> of that TextAnnotation in TA. We should rely on the linking engines to 
> provide all possible entity annotations (candidates within all sites in the 
> EntityHub) for each TextAnnotation. Each EntityAnnotation is characterized by 
> its Knowledge Base (entityhub:site) and its entry in that knowledge base 
> (fise:entity-reference). The objective of the disambiguation process is to 
> rank each EntityAnnotation set EAi by modifying the confidence values of its 
> EntityAnnotations, so that the entity with the highest confidence value is 
> the referent entity for the TextAnnotation associated with EAi.
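
The ranking step described above could be sketched as follows. This is only an
illustration under stated assumptions: the function name `rank_candidates`, the
idea of multiplying the linking confidence by a separate disambiguation score,
and the normalisation to a sum of 1 are choices made for this example, not
something the issue prescribes.

```python
def rank_candidates(confidences, scores):
    """Re-rank the candidates of one TextAnnotation.

    confidences: dict entity -> confidence assigned by the linking engine
    scores:      dict entity -> non-negative disambiguation score
    Returns a list of (entity, new_confidence) pairs, best candidate first.
    """
    combined = {e: c * scores.get(e, 0.0) for e, c in confidences.items()}
    total = sum(combined.values())
    if total == 0:
        # No disambiguation evidence at all: keep the original confidences.
        combined = dict(confidences)
    else:
        # Assumption: rescale so the new confidences of one mention sum to 1.
        combined = {e: v / total for e, v in combined.items()}
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Made-up values: the linking engine cannot tell the candidates apart,
# the (hypothetical) disambiguation score can.
linking = {"nba_player": 0.5, "professor": 0.5, "actor": 0.5}
evidence = {"nba_player": 0.1, "professor": 0.8, "actor": 0.1}
ranking = rank_candidates(linking, evidence)
```

The first entry of `ranking` is then the entity proposed as the referent of the
mention; the remaining entries stay available as lower-ranked alternatives.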
> Algorithms
> ==========
> ** Local Approaches
> (From [1]) Conventional entity linking approaches have focused on making 
> independent Entity Linking decisions using the local mention-to-entity 
> compatibility for each isolated mention. The essential idea was to extract 
> the discriminative features from the description of a specific entity and 
> then link each name mention in a document by comparing the contextual 
> similarity with each of its candidate referent entities. This approach is 
> followed by the Disambiguation-MLT engine in STANBOL-723.
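
A minimal sketch of such a local, context-similarity based scoring is shown
below. It compares the words around a mention with a textual description of
each candidate using bag-of-words cosine similarity. This is a stand-in chosen
for brevity and not necessarily what the Disambiguation-MLT engine does; the
function names, the example context and the entity descriptions are made up.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts of a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def local_scores(context, descriptions):
    """Score every candidate against the context of a single mention."""
    ctx = bag_of_words(context)
    return {entity: cosine(ctx, bag_of_words(text))
            for entity, text in descriptions.items()}


context = "Michael Jordan published a paper on machine learning at Berkeley"
descriptions = {
    "nba_player": "basketball player who won NBA titles with the Chicago Bulls",
    "professor": "professor at Berkeley working on machine learning and statistics",
    "actor": "American actor known for film and television roles",
}
scores = local_scores(context, descriptions)
best = max(scores, key=scores.get)
```

Each mention is scored in isolation here, which is exactly the limitation that
collective approaches try to address.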
> ** Global Approaches (Collective Entity Linking)
> The main drawback of the local approaches stems from the fact that they do 
> not take into consideration the interdependence between different Entity 
> Linking decisions. Specifically, the entities in a topically coherent 
> document are usually semantically related to each other. In such cases, 
> figuring out the referent entity of one name mention may in turn give us 
> useful information to link the other name mentions in the same document. 
> That suggests that disambiguation performance could be improved by resolving 
> all mentions at the same time.
> This approach only makes sense in a scenario with highly connected knowledge 
> bases where the entities are semantically related in some way.
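
To make the idea of resolving all mentions at the same time concrete, the
sketch below tries every joint assignment of candidates and keeps the one whose
chosen entities are most often related to each other. The exhaustive search,
the relatedness set and all names are assumptions for illustration; a real
engine would need a smarter search and a relatedness measure derived from the
knowledge base.

```python
from itertools import combinations, product


def collective_link(candidates, related):
    """Pick one entity per mention so that the chosen entities cohere.

    candidates: dict mention -> list of candidate entities
    related:    set of frozenset({entity_a, entity_b}) pairs that are
                connected in the knowledge base
    """
    # Mentions without any candidate cannot be linked and are skipped.
    mentions = [m for m in candidates if candidates[m]]
    best, best_score = (), -1
    for choice in product(*(candidates[m] for m in mentions)):
        # Coherence = number of related pairs among the chosen entities.
        score = sum(1 for a, b in combinations(choice, 2)
                    if frozenset((a, b)) in related)
        if score > best_score:
            best, best_score = choice, score
    return dict(zip(mentions, best))


candidates = {
    "Michael Jordan": ["mj_professor", "mj_nba_player"],
    "Chicago Bulls": ["bulls_team"],
    "Scottie Pippen": ["pippen_nba_player"],
}
related = {
    frozenset(("mj_nba_player", "bulls_team")),
    frozenset(("mj_nba_player", "pippen_nba_player")),
    frozenset(("bulls_team", "pippen_nba_player")),
}
links = collective_link(candidates, related)
```

The number of joint assignments grows exponentially with the number of
mentions, so this brute-force form is only usable for very small documents.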
> ** Graph Based Approaches
> In these approaches, both the Knowledge Base and the interdependence between 
> possible Entity Linking decisions are modeled as graphs, and inference 
> algorithms are used to resolve all the mentions within a document.
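
One possible inference algorithm over such a graph is a PageRank-style random
walk: candidates that are well connected to the candidates of other mentions
accumulate more weight. The sketch below is a plain power iteration written for
this example; the choice of PageRank, the damping factor and the tiny graph are
assumptions and are not taken from the issue.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """graph: dict node -> list of neighbouring nodes (all keys of graph)."""
    nodes = list(graph)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            neighbours = graph[node]
            if neighbours:
                share = damping * rank[node] / len(neighbours)
                for neighbour in neighbours:
                    new_rank[neighbour] += share
            else:
                # Isolated node: spread its weight evenly over all nodes.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


# Candidate entities as nodes; edges are (made-up) knowledge base relations.
graph = {
    "mj_nba_player": ["bulls_team", "pippen_nba_player"],
    "mj_professor": [],
    "bulls_team": ["mj_nba_player", "pippen_nba_player"],
    "pippen_nba_player": ["mj_nba_player", "bulls_team"],
}
rank = pagerank(graph)
```

For the mention "Michael Jordan", the candidate with the higher rank would be
preferred, and the rank values could then be folded into the fise:confidence
of the corresponding EntityAnnotations.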
> Knowledge Bases
> ===============
> As described in STANBOL-223, for Disambiguation, it is necessary to use some 
> data as disambiguation features

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
