[ 
https://issues.apache.org/jira/browse/STANBOL-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rupert Westenthaler resolved STANBOL-1122.
------------------------------------------

    Resolution: Fixed

Fixed with http://svn.apache.org/r1496359 (as part of STANBOL-1114).

> Only Tokens with a fully linked entity should be marked as consumed
> -------------------------------------------------------------------
>
>                 Key: STANBOL-1122
>                 URL: https://issues.apache.org/jira/browse/STANBOL-1122
>             Project: Stanbol
>          Issue Type: Sub-task
>          Components: Enhancement Engines
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler
>
> The EntityLinking process marks Tokens that are already linked with an Entity
> as "consumed".
> Let's assume a text mentions:
>     "An airplane crashed in the northern part of the Democratic Republic of 
> the Congo"
> In case Proper Noun linking is activated, "Democratic" would be the first
> "active" token within this sentence and ["Democratic", "Republic"] would be
> the first "search tokens". Now let's assume that the vocabulary contains the
> Entity "Democratic Republic of the Congo" and that it is returned by the
> EntitySearcher for a query for ["Democratic", "Republic"].
> So when the Entity "Democratic Republic of the Congo" is matched with the
> sentence, all tokens up to and including "Congo" are marked as consumed. This
> ensures that there are no further lookups for "Republic" or "Congo".
> While this is generally good for suggested Entities that exactly match the
> text, it is dangerous for partial matches, as shown by the following example:
>     "President Barack Obama said the US estimated ..."
> If you link this text to Freebase, then "Presidency of Barack Obama"
> (https://www.freebase.com/m/05b6w1g) will get linked for the section
> "President Barack Obama". The match is partial, as only three of the four
> tokens of the label match the text, and the inexact match of
> "Presidency" with "President" further reduces the confidence to an overall
> score of about 0.6.
> However, the current algorithm would still mark "Barack" and "Obama" as
> consumed and therefore prevent "Barack Obama" from being linked for this
> mention.
> This issue will change this so that only FULL matches (where all tokens
> in the label match tokens in the text) will mark Tokens in the text as
> consumed.
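
The difference between the old and the new behaviour can be illustrated with
a minimal sketch. It is written in Python for brevity and is not Stanbol's
actual (Java) implementation: the names Suggestion, mark_consumed and
full_only are hypothetical, and the token positions and match counts are
entered by hand rather than computed by a real EntitySearcher.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    """Hypothetical candidate entity matched against a span of text tokens."""
    label: str                 # entity label from the vocabulary
    start: int                 # index of the first text token covered
    end: int                   # index one past the last text token covered
    matched_label_tokens: int  # number of label tokens that matched the text

    def is_full_match(self) -> bool:
        # FULL match: every token of the label matched a token in the text.
        return self.matched_label_tokens == len(self.label.split())


def mark_consumed(num_tokens, suggestions, full_only=True):
    """Return one boolean per text token; True means 'consumed'.

    full_only=False mimics the behaviour described as current in the issue
    (any match consumes its tokens); full_only=True mimics the proposed
    behaviour (only FULL matches consume tokens).
    """
    consumed = [False] * num_tokens
    for s in suggestions:
        if full_only and not s.is_full_match():
            continue  # partial match: leave tokens open for further lookups
        for i in range(s.start, s.end):
            consumed[i] = True
    return consumed


# "President Barack Obama said the US estimated" -> 7 tokens.
# Only three of the four label tokens match, so this is a partial match.
partial = Suggestion("Presidency of Barack Obama", 0, 3, 3)

# "An airplane crashed in the northern part of the Democratic Republic of
# the Congo" -> 14 tokens; the label covers tokens 9..13 completely.
full = Suggestion("Democratic Republic of the Congo", 9, 14, 5)

old = mark_consumed(7, [partial], full_only=False)  # first three consumed
new = mark_consumed(7, [partial])                   # nothing consumed
congo = mark_consumed(14, [full])                   # tokens 9..13 consumed
```

With full_only=True, "Barack" and "Obama" stay unconsumed after the partial
match, so a later lookup can still link "Barack Obama", while the exact match
for "Democratic Republic of the Congo" continues to suppress further lookups
for "Republic" and "Congo".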



--
This message was sent by Atlassian JIRA
(v6.1#6144)
