[
https://issues.apache.org/jira/browse/STANBOL-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rupert Westenthaler updated STANBOL-1102:
-----------------------------------------
Issue Type: Sub-task (was: Bug)
Parent: STANBOL-1114
> EntityLinking MUST only accept single token matches for the currently active
> Token
> ----------------------------------------------------------------------------------
>
> Key: STANBOL-1102
> URL: https://issues.apache.org/jira/browse/STANBOL-1102
> Project: Stanbol
> Issue Type: Sub-task
> Reporter: Rupert Westenthaler
> Assignee: Rupert Westenthaler
>
> With the "Max Search Tokens" setting (enhancer.engines.linking.maxSearchTokens,
> default=2), the EntityLinking Engine issues OR queries against the controlled
> vocabulary for multiple linkable/matchable tokens.
> This feature ensures that Entities matching longer sections of the text
> are ranked higher. This is especially important for bigger vocabularies
> and/or tokens that are common within the vocabulary, as the EntityLinking only
> considers the top 10 (or 3 * max suggestions) query results.
> However, in cases where no Entity matches several tokens of the search, this
> feature currently causes an unwanted side effect: it may match single
> tokens that are not the currently active one.
> E.g. the text section "Bei einer gmeinsamen Pressekonferenz mit
> FPÖ-Bundesparteivorsitzenden Heinz-Christian Strache in Langenlois" generates
> the following queries:
> (1) process Token 5: FPÖ
> >> searchStrings [FPÖ, Bundesparteivorsitzenden]
> << 0: FPÖ[m=FULL,s=1,c=1(1.0)/1] score=1.0[l=1.0,t=1.0] for
> http://rdf.freebase.com/ns/m.013vy8
> (2) process Token 5: Bundesparteivorsitzenden
> >> searchStrings [Bundesparteivorsitzenden, Heinz]
> << 0: Heinz[m=FULL,s=1,c=1(1.0)/1] score=1.0[l=1.0,t=1.0] for
> http://rdf.freebase.com/ns/m.0c5y96
> (3) process Token 7: Christian
> >> searchStrings [Christian, Strache]
> << 0: Heinz-Christian Strache[m=FULL,s=2,c=2(1.0)/3]
> score=0.6666666666666666[l=0.6666666666666666,t=1.0] for
> http://rdf.freebase.com/ns/m.08lfdk
> This results in a situation where "Heinz" is linked to another Entity, while
> "Heinz-Christian Strache", although it completely matches the text, is only
> linked via "Christian Strache" and with a lower confidence!
> The issue is that search (2), issued for the Token "Bundesparteivorsitzenden",
> MUST NOT suggest an Entity that does not match the currently active Token.
> Because it does so in the given example, "Heinz" is already consumed and
> cannot be linked with the expected Entity mention "Heinz-Christian Strache".
> This issue will add a rule to EntityLinking that the currently active Token
> needs to be included in suggestions.
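
The proposed rule can be illustrated with a minimal sketch. This is not the Stanbol implementation: the class ActiveTokenFilter, the Suggestion type and the method keepMatchingActiveToken are hypothetical names made up for this example, and the token indexes are illustrative rather than taken from the log above. The only idea carried over from the description is that a suggestion is accepted only if its matched tokens include the currently active Token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, NOT the actual Stanbol EntityLinking code.
class ActiveTokenFilter {

    /** Hypothetical suggestion: an Entity label matched against a run of text tokens. */
    static final class Suggestion {
        final String label;
        final int start; // index of the first matched text token
        final int span;  // number of consecutive text tokens matched

        Suggestion(String label, int start, int span) {
            this.label = label;
            this.start = start;
            this.span = span;
        }

        /** True if the matched token run includes the given token index. */
        boolean covers(int tokenIndex) {
            return tokenIndex >= start && tokenIndex < start + span;
        }
    }

    /** Keeps only suggestions whose match includes the currently active token. */
    static List<Suggestion> keepMatchingActiveToken(List<Suggestion> suggestions, int activeToken) {
        List<Suggestion> kept = new ArrayList<>();
        for (Suggestion s : suggestions) {
            if (s.covers(activeToken)) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Illustrative indexes: 0=Bundesparteivorsitzenden, 1=Heinz, 2=Christian, 3=Strache
        List<Suggestion> search2 = Arrays.asList(new Suggestion("Heinz", 1, 1));
        List<Suggestion> search3 = Arrays.asList(new Suggestion("Heinz-Christian Strache", 2, 2));

        // Search (2): active token 0; "Heinz" does not cover it and is dropped.
        System.out.println(keepMatchingActiveToken(search2, 0).size()); // prints 0
        // Search (3): active token 2; the match covers it and is kept.
        System.out.println(keepMatchingActiveToken(search3, 2).size()); // prints 1
    }
}
```

With such a filter, search (2) would return no suggestion, so "Heinz" would stay unconsumed and remain available when a later token is processed.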