[jira] [Updated] (STANBOL-1037) Entity Disambiguation for Stanbol

Rafa Haro (JIRA) Mon, 15 Apr 2013 08:58:17 -0700

     [ 
https://issues.apache.org/jira/browse/STANBOL-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Rafa Haro updated STANBOL-1037:
-------------------------------

    Description: 
Entity Disambiguation in Stanbol mainly refers to the process of 
modifying the fise:confidence values of the EntityAnnotations produced by 
any Linking Engine within Stanbol (EntityLinkingEngine or 
NamedEntityLinking). The confidence values should be modified so that, after 
a disambiguation process, they yield a ranking of the possible candidates 
(entities) to link with for each EntityAnnotation. Each candidate would be 
an Entity within EntityHub or any other Knowledge Base configured in Stanbol.

Disambiguation
==============

Entity Linking is not a trivial task because of the name ambiguity problem: 
the same name may refer to different entities in different contexts, and 
the same entity can usually be mentioned using several different names. For 
instance, the name Michael Jordan can refer to more than 20 entities in 
Wikipedia, some of which are shown below:

    - Michael Jordan (NBA Player)
    - Michael I. Jordan (Berkeley Professor)
    - Michael B. Jordan (American Actor)

This situation arises not only with well-known semantic knowledge bases 
like DBpedia or Freebase, but also with any enterprise semantic dataset or 
custom vocabulary. A simple example is resolving ambiguous names within a 
database of employees.

Formally, Entity Disambiguation for Stanbol should work as follows. After an 
enhancement process of a ContentItem using an enhancement chain that includes a 
Linking Engine, we would get a set of TextAnnotations TA = {T1, T2, ..., Tn}. 
Each TextAnnotation in TA should contain a name mention, which is characterized 
by its name, its local surrounding context (fise:selection-context) and the 
ContentItem containing it. For each TextAnnotation Ti in TA, the Linking 
Engine would produce a set of EntityAnnotations EAi = {E1i, E2i, ..., ENi}. 
We should rely on the linking engines to provide all possible entity 
annotations (candidates within all sites in the EntityHub) for each 
TextAnnotation. Each EntityAnnotation is characterized by its Knowledge Base 
(entityhub:site) and its entry in that knowledge base 
(fise:entity-reference). The objective of the disambiguation process is to 
rank each EntityAnnotation set EAi by modifying its EntityAnnotations' 
confidence values, so that the entity with the highest confidence value is 
the referent entity for the TextAnnotation associated with EAi.
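
The ranking step described above can be illustrated with a minimal Python sketch. It is not Stanbol code: the class names only mirror the concepts in the description, and the `context_terms` field, the word-overlap score, the `weight` parameter and the `dbpedia:` identifiers are all assumptions made for illustration. A real engine would read and write fise:confidence in the enhancement metadata and would use a stronger similarity measure.

```python
from dataclasses import dataclass, field


@dataclass
class EntityAnnotation:
    entity_reference: str  # stands in for fise:entity-reference
    site: str              # stands in for entityhub:site
    confidence: float      # stands in for fise:confidence
    # Hypothetical: terms describing the entity, e.g. taken from its abstract.
    context_terms: frozenset = frozenset()


@dataclass
class TextAnnotation:
    selected_text: str      # the name mention
    selection_context: str  # stands in for fise:selection-context
    candidates: list = field(default_factory=list)  # the set EAi


def context_score(context: str, terms: frozenset) -> float:
    """Fraction of the entity's descriptive terms that occur in the context."""
    if not terms:
        return 0.0
    words = {w.strip(".,;:()").lower() for w in context.split()}
    return len(words & terms) / len(terms)


def disambiguate(ta: TextAnnotation, weight: float = 0.5) -> list:
    """Rewrite each candidate's confidence, then rank candidates best first."""
    for ea in ta.candidates:
        score = context_score(ta.selection_context, ea.context_terms)
        # Blend the linking engine's confidence with the context evidence.
        ea.confidence = (1 - weight) * ea.confidence + weight * score
    ta.candidates.sort(key=lambda ea: ea.confidence, reverse=True)
    return ta.candidates


mention = TextAnnotation(
    selected_text="Michael Jordan",
    selection_context="Michael Jordan published a paper on machine learning at Berkeley.",
    candidates=[
        EntityAnnotation("dbpedia:Michael_Jordan", "dbpedia", 0.9,
                         frozenset({"nba", "basketball", "bulls"})),
        EntityAnnotation("dbpedia:Michael_I._Jordan", "dbpedia", 0.6,
                         frozenset({"berkeley", "machine", "learning"})),
        EntityAnnotation("dbpedia:Michael_B._Jordan", "dbpedia", 0.5,
                         frozenset({"actor", "film"})),
    ],
)
ranked = disambiguate(mention)
for ea in ranked:
    print(f"{ea.entity_reference}: {ea.confidence:.2f}")
```

In this made-up example the linking engine initially prefers the NBA player, but the surrounding context moves the Berkeley professor to the top of the ranking for this mention.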


 
    
> Entity Disambiguation for Stanbol
> ---------------------------------
>
>                 Key: STANBOL-1037
>                 URL: https://issues.apache.org/jira/browse/STANBOL-1037
>             Project: Stanbol
>          Issue Type: Story
>          Components: Enhancer, Entityhub
>            Reporter: Rafa Haro
>              Labels: gsoc2013, mentoring
>

