[ 
https://issues.apache.org/jira/browse/STANBOL-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rupert Westenthaler resolved STANBOL-1110.
------------------------------------------

    Resolution: Fixed

Starting with http://svn.apache.org/r1492611, the EntityhubLinkingEngine uses 
Term Proximity for searches.
                
> Use Term Proximity for Searching Entities in the EntityhubLinkingEngine
> -----------------------------------------------------------------------
>
>                 Key: STANBOL-1110
>                 URL: https://issues.apache.org/jira/browse/STANBOL-1110
>             Project: Stanbol
>          Issue Type: Improvement
>          Components: Enhancement Engines
>    Affects Versions: enhancement-engines-0.10.0
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler
>
> The issue with the ranking of the results of the EntityLinkingEngine is that 
> some Entities have matching labels in both the language of the text and 
> the fallback language, while others match in only one of the two. As background:
> The EntityLinkingEngine performs queries like
>     {lang1}:"{term1}" OR {lang1}:"{term2}" OR
>     {lang2}:"{term1}" OR {lang2}:"{term2}"
> when linking Entities, where {lang1} is the language detected for the 
> document and {lang2} is the default mapping language.
> When executing such queries on the Entityhub based EntitySearcher 
> implementations of the EntityhubLinkingEngine, the results are ranked so that 
> some Entities matching only one of the parsed terms are in front of others 
> matching both terms.
> The reason is that there are two ways in which two of the four 
> query terms can match:
>  (a) both {term1} and {term2} match in the same language
>  (b) a single term matches in both {lang1} and {lang2}
> While (a) is the matching expected by users, (b) is not that unlikely, 
> especially if (a) is not a very famous entity and is missing translations of 
> its labels to many languages, while {term1} and/or {term2} is present in more 
> famous entities that do have such translations. Most often this happens with 
> given names of persons. 
> As the EntityLinkingEngine processes (for performance reasons) only the 
> first few results (by default 2*maxSuggestions, but at least 10), this will 
> cause Entities not to be linked because of the unintended ranking of results.
> The new Proximity Ranking Feature (STANBOL-1105) can be used to solve this, 
> as it ensures that Entities matching both terms in the same language (and 
> therefore in the same label) will be ranked above those that match only a 
> single term in two different languages.
> This issue will enable the use of this feature for the EntityhubLinkingEngine.
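
The ranking problem described above can be reproduced with a small toy model. 
The following Python sketch is only an illustration and does not use any 
Stanbol or Solr API: the function names, the entity data and the "popularity" 
tie-breaker (a stand-in for whatever else influences the score of a famous 
entity) are all assumptions made for this example. It builds the OR query 
from the description, shows that cases (a) and (b) satisfy the same number of 
clauses, and shows how preferring terms that match within a single label 
moves case (a) to the front.

```python
from itertools import product


def build_query(terms, languages):
    """One quoted clause per (language, term) pair, joined with OR."""
    return " OR ".join(
        f'{lang}:"{term}"' for lang, term in product(languages, terms)
    )


def clause_matches(labels, terms, languages):
    """Number of (language, term) clauses satisfied by an entity's labels."""
    return sum(
        term in labels.get(lang, "").lower().split()
        for lang, term in product(languages, terms)
    )


def best_label_matches(labels, terms, languages):
    """Largest number of terms found together in one label (toy proximity)."""
    return max(
        sum(term in labels.get(lang, "").lower().split() for term in terms)
        for lang in languages
    )


def rank(entities, terms, languages, use_proximity):
    """Return entity names, best first; ties fall back to popularity."""
    def key(name):
        labels, popularity = entities[name]
        score = clause_matches(labels, terms, languages)
        proximity = (
            best_label_matches(labels, terms, languages) if use_proximity else 0
        )
        return (proximity, score, popularity)

    return sorted(entities, key=key, reverse=True)


# Hypothetical data: document language "de", default language "en".
# Terms are given in lower case because labels are lower-cased for matching.
terms = ["john", "smithson"]
languages = ["de", "en"]
entities = {
    # case (a): both terms match, but only an English label exists
    "obscure": ({"en": "John Smithson"}, 1),
    # case (b): a single term matches in both languages
    "famous": ({"de": "John Lennon", "en": "John Lennon"}, 100),
}

print(build_query(terms, languages))
print(rank(entities, terms, languages, use_proximity=False))
print(rank(entities, terms, languages, use_proximity=True))
```

Without the proximity component both entities satisfy two clauses, so the 
more popular one is listed first; with it, the entity matching both terms in 
one label wins. The real scoring done by the Entityhub is more involved than 
this model.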

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
