Rupert Westenthaler created STANBOL-1262:
--------------------------------------------

             Summary: Change/Improve processing of Chunks by EntityLinking 
                 Key: STANBOL-1262
                 URL: https://issues.apache.org/jira/browse/STANBOL-1262
             Project: Stanbol
          Issue Type: Improvement
    Affects Versions: 0.12.0
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler


The first step of EntityLinking (this applies to all EntityLinkingEngines, incl.
the Lucene FST Linking Engine) is to classify Tokens as "linkable", "matchable"
or "other". In addition, it determines the "processible" chunks that Tokens are
contained in.

This issue is about changing how "processible" chunks are determined if the
AnalyzedText contains multiple overlapping chunks.

A typical case where this can happen is if both Noun Phrase Detection and
Named Entity Recognition are contained in the Chain. The chunks selected by
Named Entities will typically be smaller than the corresponding Noun Phrase.
There are even situations where the Named Entity does not include all Nouns
contained in a Noun Phrase.

Here is an example taken from [1]:

    After a disappointing start against an Everton side who led through Kevin
    Mirallas's first-half goal ...

While "Everton" is detected as an Organization by NER, the Noun Phrase "an
Everton side" also includes 'side' as a second noun. Therefore 'Everton' is not
considered for linking, as it matches only 1 of the 2 matchable tokens within
the 'processible phrase'.

This is because EntityLinking currently merges overlapping processible phrases
together, a semantic that is no longer optimal for EntityLinking.

To avoid recall problems like the one described, the intersection instead of
the union of multiple processible chunks needs to be used.
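
The difference between the two semantics can be sketched as follows. This is
only an illustration, not the Stanbol implementation: the class ChunkSpans, the
Span type and both methods are hypothetical names, chunks are modelled as
half-open token index ranges, and the union is simplified to the chunks that
directly contain the token.

```java
import java.util.Arrays;
import java.util.List;

final class ChunkSpans {

    /** Hypothetical stand-in for a chunk: half-open token range [start, end). */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        boolean contains(int token) {
            return start <= token && token < end;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + ")";
        }
    }

    /** Current behaviour (simplified): span covering all chunks containing the token. */
    static Span union(int token, List<Span> chunks) {
        int start = Integer.MAX_VALUE;
        int end = Integer.MIN_VALUE;
        for (Span c : chunks) {
            if (c.contains(token)) {
                start = Math.min(start, c.start);
                end = Math.max(end, c.end);
            }
        }
        return start == Integer.MAX_VALUE ? null : new Span(start, end);
    }

    /** Proposed behaviour: span shared by all chunks containing the token. */
    static Span intersection(int token, List<Span> chunks) {
        int start = Integer.MIN_VALUE;
        int end = Integer.MAX_VALUE;
        boolean found = false;
        for (Span c : chunks) {
            if (c.contains(token)) {
                start = Math.max(start, c.start);
                end = Math.min(end, c.end);
                found = true;
            }
        }
        return found ? new Span(start, end) : null;
    }

    public static void main(String[] args) {
        // Tokens: 0 = "an", 1 = "Everton", 2 = "side"
        List<Span> chunks = Arrays.asList(
                new Span(0, 3),   // Noun Phrase "an Everton side"
                new Span(1, 2));  // Named Entity "Everton"
        for (int token = 0; token < 3; token++) {
            System.out.println(token + ": union=" + union(token, chunks)
                    + " intersection=" + intersection(token, chunks));
        }
    }
}
```

With the union, all three tokens end up in the same phrase. With the
intersection, only 'Everton' is narrowed to the span of the Named Entity.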

For the given example this would result in the following processible phrase
for each token:

 - an [other]: an Everton side
 - Everton [linkable]: Everton
 - side [matchable]: an Everton side

So 'Everton' would get correctly linked to an Entity with the label Everton, but
'side' would not get linked to an Entity with the label Side, as it is in a
phrase with another linkable/matchable token.
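
The effect on the example can be checked with a small sketch of the "share of
matched matchable tokens" test described above. The class ChunkMatchScore, its
methods and the threshold value are assumptions made for illustration only; the
score used by the actual engine may be computed differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class ChunkMatchScore {

    /** Hypothetical threshold: more than half of the matchable tokens must match. */
    static final double MIN_SCORE = 0.51;

    /** Fraction of the phrase's matchable tokens that occur in the entity label. */
    static double score(List<String> matchableTokens, String label) {
        if (matchableTokens.isEmpty()) {
            return 0.0;
        }
        List<String> labelTokens =
                Arrays.asList(label.toLowerCase(Locale.ROOT).split("\\s+"));
        int matched = 0;
        for (String token : matchableTokens) {
            if (labelTokens.contains(token.toLowerCase(Locale.ROOT))) {
                matched++;
            }
        }
        return (double) matched / matchableTokens.size();
    }

    /** A label is accepted only if it covers enough of the phrase's matchable tokens. */
    static boolean accepted(List<String> matchableTokens, String label) {
        return score(matchableTokens, label) >= MIN_SCORE;
    }

    public static void main(String[] args) {
        // Union: 'Everton' is scored against the whole Noun Phrase (1 of 2 tokens).
        System.out.println(accepted(Arrays.asList("Everton", "side"), "Everton"));
        // Intersection: 'Everton' is scored against the Named Entity only (1 of 1).
        System.out.println(accepted(Arrays.asList("Everton"), "Everton"));
        // 'side' stays in the full Noun Phrase, so the label Side covers 1 of 2.
        System.out.println(accepted(Arrays.asList("Everton", "side"), "Side"));
    }
}
```

Under these assumptions only the second call accepts the label, which
corresponds to the behaviour proposed in this issue.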


[1] 
http://www.theguardian.com/football/2014/jan/20/west-bromwich-albion-everton-premier-league-match-report




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
