[ 
https://issues.apache.org/jira/browse/STANBOL-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rupert Westenthaler updated STANBOL-1102:
-----------------------------------------

    Summary: EntityLinking MUST only accept Suggestions for the current active 
Token  (was: EntityLinking MUST only accept single token matches for the 
currently active Token)
    
> EntityLinking MUST only accept Suggestions for the current active Token
> -----------------------------------------------------------------------
>
>                 Key: STANBOL-1102
>                 URL: https://issues.apache.org/jira/browse/STANBOL-1102
>             Project: Stanbol
>          Issue Type: Sub-task
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler
>
> With the "Max Search Tokens (enhancer.engines.linking.maxSearchTokens)" 
> configuration (default=2), the EntityLinking Engine supports OR queries 
> against the controlled vocabulary for multiple linkable/matchable tokens. 
> This feature ensures that Entities matching longer sections of the text 
> are ranked higher. This is especially important for bigger vocabularies 
> and/or tokens that are common within the vocabulary, as the EntityLinking 
> only considers the top 10 (or 3 * max suggestions) query results. 
> However, in cases where no Entity matches several tokens of the search, 
> this feature currently causes an unwanted side effect: it may match single 
> tokens that are not the currently active one. 
> E.g. the text section "Bei einer gmeinsamen Pressekonferenz mit 
> FPÖ-Bundesparteivorsitzenden Heinz-Christian Strache in Langenlois" generates 
> the following queries:
> (1) process Token 5: FPÖ
>   >> searchStrings [FPÖ, Bundesparteivorsitzenden]
>   << 0: FPÖ[m=FULL,s=1,c=1(1.0)/1] score=1.0[l=1.0,t=1.0] for 
> http://rdf.freebase.com/ns/m.013vy8
> (2) process Token 5: Bundesparteivorsitzenden
>   >> searchStrings [Bundesparteivorsitzenden, Heinz]
>   << 0: Heinz[m=FULL,s=1,c=1(1.0)/1] score=1.0[l=1.0,t=1.0] for 
> http://rdf.freebase.com/ns/m.0c5y96
> (3) process Token 7: Christian
>   >> searchStrings [Christian, Strache]
>   << 0: Heinz-Christian Strache[m=FULL,s=2,c=2(1.0)/3] 
> score=0.6666666666666666[l=0.6666666666666666,t=1.0] for 
> http://rdf.freebase.com/ns/m.08lfdk
> This results in a situation where "Heinz" is linked to another Entity, 
> while "Heinz-Christian Strache", although it completely matches the text, 
> is only linked with "Christian Strache" AND with a lower confidence! 
> The issue is that search (2), issued for the Token 
> "Bundesparteivorsitzenden", MUST NOT suggest an Entity that does not match 
> the currently active Token. Because this happens in the given example, 
> "Heinz" is already consumed and cannot be linked with the expected Entity 
> mention "Heinz-Christian Strache". 
> This issue will add a rule to EntityLinking that the currently active Token 
> needs to be included in suggestions. 
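
The proposed rule can be sketched roughly as follows. This is an illustration only, not the Stanbol implementation: the class ActiveTokenRule, the Suggestion type, its token-span fields, and the method acceptForActiveToken are all hypothetical names, and the token indexes are assumed to be positions of tokens in the analysed text.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; all names here are hypothetical, not Stanbol API. */
final class ActiveTokenRule {

    /** Hypothetical suggestion: an entity label plus the span of text tokens it matched. */
    static final class Suggestion {
        final String label;
        final int firstToken; // index of the first matched text token (inclusive)
        final int lastToken;  // index of the last matched text token (inclusive)

        Suggestion(String label, int firstToken, int lastToken) {
            this.label = label;
            this.firstToken = firstToken;
            this.lastToken = lastToken;
        }

        /** True if the matched span includes the given token index. */
        boolean covers(int tokenIndex) {
            return firstToken <= tokenIndex && tokenIndex <= lastToken;
        }
    }

    /** Keeps only the suggestions whose matched span includes the active token. */
    static List<Suggestion> acceptForActiveToken(List<Suggestion> candidates, int activeToken) {
        List<Suggestion> accepted = new ArrayList<>();
        for (Suggestion s : candidates) {
            if (s.covers(activeToken)) {
                accepted.add(s);
            }
        }
        return accepted;
    }
}
```

Under these assumptions, a "Heinz" suggestion returned while "Bundesparteivorsitzenden" is the active Token would be rejected, so "Heinz" stays available until it becomes the active Token itself and can then take part in the "Heinz-Christian Strache" match.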

