[
https://issues.apache.org/jira/browse/STANBOL-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rafa Haro updated STANBOL-1037:
-------------------------------
Description:
Entity Disambiguation in Stanbol mainly refers to the process of
modifying the fise:confidence values of EntityAnnotations obtained as a result
of any Linking Engine within Stanbol (EntityLinkingEngine or
NamedEntityLinking). These modifications to confidence values should be made in
order to obtain a ranking of the possible candidates (entities) to link with for
each EntityAnnotation after a disambiguation process. Each candidate would be
an Entity within the EntityHub or any other Knowledge Base configured in Stanbol.
Disambiguation
==============
Entity Linking is not a trivial task because of the name ambiguity problem: the
same name may refer to different entities in different contexts, and the same
entity can usually be mentioned using a set of different names. For instance,
the name Michael Jordan can refer to more than 20 entities in Wikipedia, some
of which are shown below:
- Michael Jordan (NBA Player)
- Michael I. Jordan (Berkeley Professor)
- Michael B. Jordan (American Actor)
This situation arises not only with well-known semantic knowledge bases
like DBpedia or Freebase, but also with any enterprise semantic dataset or
custom vocabulary. A simple example is resolving the ambiguity between names
within a database of employees.
Formally, Entity Disambiguation for Stanbol should work as follows: after a
ContentItem has been enhanced using an enhancement chain that includes a
Linking Engine, we get a set of TextAnnotations TA = {T1, T2, ..., Tn}.
Each TextAnnotation in TA contains a name mention, which is characterized
by its name, its local surrounding context (fise:selection-context) and the
ContentItem containing it. For each TextAnnotation Ti in TA, the Linking
Engine produces a set of EntityAnnotations EAi = {E1i, E2i, ..., ENi}, where i
is the index of the TextAnnotation in TA. We should rely on the
linking engines to provide all possible entity annotations (candidates within
all sites in the EntityHub) for each TextAnnotation. Each EntityAnnotation is
characterized by its Knowledge Base (entityhub:site) and its entry in that
knowledge base (fise:entity-reference). The objective of the disambiguation
process is to rank each EntityAnnotation set EAi by modifying
its EntityAnnotations' confidence values, so that the entity with the highest
confidence value is the referent entity for the TextAnnotation associated with
EAi.
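
The ranking step described above could be sketched roughly as follows. This is
only an illustration in Python, not Stanbol code: the class names mirror the
fise: concepts, but disambiguate, overlap_score, the HINTS table and the
example: identifiers are all hypothetical, and the word-overlap scorer is a
stand-in for whatever disambiguation algorithm is eventually chosen.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class TextAnnotation:
    selected_text: str      # the name mention
    selection_context: str  # stands in for fise:selection-context


@dataclass(frozen=True)
class EntityAnnotation:
    entity_reference: str   # stands in for fise:entity-reference
    site: str               # stands in for entityhub:site
    confidence: float       # stands in for fise:confidence


# Hypothetical scorer signature: a higher value means a better fit.
Scorer = Callable[[TextAnnotation, EntityAnnotation], float]


def disambiguate(mention: TextAnnotation,
                 candidates: List[EntityAnnotation],
                 score: Scorer) -> List[EntityAnnotation]:
    """Re-rank one candidate set EAi for its TextAnnotation Ti.

    Confidences are replaced by scores normalised to sum to 1, and the
    candidates are returned best first. If every score is zero, the
    original confidences are kept and only the ordering is applied.
    """
    if not candidates:
        return []
    raw = [max(score(mention, c), 0.0) for c in candidates]
    total = sum(raw)
    if total == 0:
        rescored = list(candidates)
    else:
        rescored = [replace(c, confidence=s / total)
                    for c, s in zip(candidates, raw)]
    return sorted(rescored, key=lambda c: c.confidence, reverse=True)


# Assumed toy knowledge: a few descriptive words per candidate entity.
HINTS = {
    "example:michael-jordan-nba": "nba basketball player",
    "example:michael-i-jordan": "berkeley professor machine learning",
    "example:michael-b-jordan": "american actor film",
}


def overlap_score(mention: TextAnnotation, candidate: EntityAnnotation) -> float:
    """Count the words shared by the mention context and the candidate hints."""
    context_words = set(mention.selection_context.lower().split())
    hint_words = set(HINTS.get(candidate.entity_reference, "").split())
    return float(len(context_words & hint_words))


mention = TextAnnotation(
    "Michael Jordan",
    "Michael Jordan teaches machine learning at Berkeley",
)
candidates = [
    EntityAnnotation("example:michael-jordan-nba", "dbpedia", 0.6),
    EntityAnnotation("example:michael-i-jordan", "dbpedia", 0.3),
    EntityAnnotation("example:michael-b-jordan", "dbpedia", 0.1),
]
ranked = disambiguate(mention, candidates, overlap_score)
print(ranked[0].entity_reference)  # example:michael-i-jordan
```

In this sketch the linking engine's initial confidences favour the NBA player,
but the surrounding context moves the Berkeley professor to the top of the
ranking, which is the effect the disambiguation process is meant to achieve.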
> Entity Disambiguation for Stanbol
> ---------------------------------
>
> Key: STANBOL-1037
> URL: https://issues.apache.org/jira/browse/STANBOL-1037
> Project: Stanbol
> Issue Type: Story
> Components: Enhancer, Entityhub
> Reporter: Rafa Haro
> Labels: gsoc2013, mentoring
>