[
https://issues.apache.org/jira/browse/STANBOL-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rupert Westenthaler resolved STANBOL-1122.
------------------------------------------
Resolution: Fixed
fixed with http://svn.apache.org/r1496359 (as part of STANBOL-1114)
> Only Tokens with a fully linked entity should be marked as consumed
> -------------------------------------------------------------------
>
> Key: STANBOL-1122
> URL: https://issues.apache.org/jira/browse/STANBOL-1122
> Project: Stanbol
> Issue Type: Sub-task
> Components: Enhancement Engines
> Reporter: Rupert Westenthaler
> Assignee: Rupert Westenthaler
>
> The EntityLinking process marks Tokens that are already linked with an Entity
> as "consumed".
> Let's assume a text mentions:
> "An airplane crashed in the northern part of the Democratic Republic of
> the Congo"
> If Proper Noun linking is activated, "Democratic" would be the first
> "active" token within this sentence and ["Democratic", "Republic"] would be
> the first "search tokens". Now let's assume that the vocabulary contains the
> Entity "Democratic Republic of the Congo" and that it is returned by the
> EntitySearcher for a query for ["Democratic", "Republic"].
> So when the Entity "Democratic Republic of the Congo" is matched with the
> sentence all tokens until "Congo" are marked as consumed. This ensures that
> there are no further lookups for "Republic" nor "Congo".
> While this is generally good for suggested Entities that exactly match the
> text, it is dangerous for partial matches, as shown by the following example:
> "President Barack Obama said the US estimated ..."
> If you link this text to Freebase, then "Presidency of Barack Obama"
> (https://www.freebase.com/m/05b6w1g) will get linked for the section
> "President Barack Obama". The match is "partial", as only three of the four
> tokens of the label match the text, and the inexact match of
> "Presidency" with "President" further reduces the confidence to an overall
> score of about 0.6.
> However, the current algorithm would still mark "Barack" and "Obama" as
> consumed and therefore prevent "Barack Obama" from being linked for this
> mention.
> This issue will change the behavior so that only FULL matches (where all
> tokens in the label match tokens in the text) mark tokens in the text as
> consumed.
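
The rule described above can be illustrated with a minimal Java sketch. This is
not the code committed in r1496359. The class ConsumedTokenSketch, the nested
Suggestion type and the markConsumed method are hypothetical names chosen for
illustration, and the sketch assumes that label matching has already produced
the number of matched label tokens and the matched span in the text.

```java
import java.util.Arrays;

class ConsumedTokenSketch {

    /** Hypothetical result of matching one entity label against the text. */
    static final class Suggestion {
        final String label;
        final int labelTokenCount;   // number of tokens in the entity label
        final int matchedTokenCount; // label tokens that matched tokens in the text
        final int spanStart;         // index of the first covered text token
        final int spanEnd;           // index of the last covered text token (inclusive)

        Suggestion(String label, int labelTokenCount, int matchedTokenCount,
                   int spanStart, int spanEnd) {
            this.label = label;
            this.labelTokenCount = labelTokenCount;
            this.matchedTokenCount = matchedTokenCount;
            this.spanStart = spanStart;
            this.spanEnd = spanEnd;
        }

        /** FULL match: every token of the label matched a token in the text. */
        boolean isFullMatch() {
            return matchedTokenCount == labelTokenCount;
        }
    }

    /**
     * Marks the covered text tokens as consumed, but only for FULL matches.
     * Partial matches leave the flags untouched, so later lookups can still
     * start at those tokens.
     */
    static boolean[] markConsumed(boolean[] consumed, Suggestion suggestion) {
        if (suggestion.isFullMatch()) {
            for (int i = suggestion.spanStart; i <= suggestion.spanEnd; i++) {
                consumed[i] = true;
            }
        }
        return consumed;
    }

    public static void main(String[] args) {
        // Tokens: President(0) Barack(1) Obama(2) said(3) the(4) US(5) estimated(6)
        // Label "Presidency of Barack Obama": 3 of 4 label tokens match (partial).
        Suggestion partial = new Suggestion("Presidency of Barack Obama", 4, 3, 0, 2);
        System.out.println(Arrays.toString(markConsumed(new boolean[7], partial)));

        // Tokens: the(0) Democratic(1) Republic(2) of(3) the(4) Congo(5)
        // Label "Democratic Republic of the Congo": 5 of 5 label tokens match (full).
        Suggestion full = new Suggestion("Democratic Republic of the Congo", 5, 5, 1, 5);
        System.out.println(Arrays.toString(markConsumed(new boolean[6], full)));
    }
}
```

With these inputs the partial match leaves "Barack" and "Obama" unconsumed, so
a later lookup can still link "Barack Obama", while the full match consumes
every token from "Democratic" through "Congo".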
--
This message was sent by Atlassian JIRA
(v6.1#6144)