[ https://issues.apache.org/jira/browse/STANBOL-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rupert Westenthaler resolved STANBOL-1110.
------------------------------------------
Resolution: Fixed
Starting with http://svn.apache.org/r1492611, the EntityhubLinkingEngine uses
Term Proximity for searches.
> Use Term Proximity for Searching Entities in the EntityhubLinkingEngine
> -----------------------------------------------------------------------
>
> Key: STANBOL-1110
> URL: https://issues.apache.org/jira/browse/STANBOL-1110
> Project: Stanbol
> Issue Type: Improvement
> Components: Enhancement Engines
> Affects Versions: enhancement-engines-0.10.0
> Reporter: Rupert Westenthaler
> Assignee: Rupert Westenthaler
>
> The problem with the ranking of the EntityLinkingEngine results is that
> some Entities have matching labels in both the language of the text and
> the fallback language, while others match in only one of the two. As
> background:
> The EntityLinkingEngine performs queries like
>
>     {lang1}:"{term1}" OR {lang1}:"{term2}" OR {lang2}:"{term1}" OR {lang2}:"{term2}"
>
> when linking Entities, where {lang1} is the language detected for the
> document and {lang2} is the default mapping language.
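A minimal Python sketch of how such a query string could be assembled, assuming the language code is used directly as the field name, as in the pattern above (the actual Entityhub field names and query API may differ). The function `build_or_query` and the example terms are hypothetical.

```python
def build_or_query(doc_lang, fallback_lang, terms):
    """Build a query of the form
    {lang1}:"{term1}" OR {lang1}:"{term2}" OR {lang2}:"{term1}" OR {lang2}:"{term2}".
    Hypothetical helper: the language code is used as the field name."""
    clauses = []
    # outer loop over languages, inner loop over terms, matching the order above
    for lang in (doc_lang, fallback_lang):
        for term in terms:
            clauses.append(f'{lang}:"{term}"')
    return " OR ".join(clauses)


print(build_or_query("de", "en", ["Paul", "Smith"]))
# de:"Paul" OR de:"Smith" OR en:"Paul" OR en:"Smith"
```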
> When such queries are executed against the Entityhub-based EntitySearcher
> implementations of the EntityhubLinkingEngine, Entities that match only one
> of the parsed terms are ranked ahead of some Entities that match both terms.
> The reason is that there are two ways in which two of the four query terms
> can match:
> (a) both {term1} and {term2} match in the same language
> (b) a single term matches in both {lang1} and {lang2}
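To illustrate why (a) and (b) are hard to tell apart, this toy sketch counts how many of the four OR clauses match two hypothetical entities. Labels are plain strings keyed by language code and matching is simple token membership; this is an assumption for illustration, not how the Entityhub actually scores results. Both entities match exactly two clauses, so the clause count alone cannot rank (a) above (b), and other scoring factors decide the order.

```python
def matched_clauses(labels, langs, terms):
    """Return the (lang, term) clauses of the OR query that match an entity.
    `labels` maps a language code to a label; a term matches if it is one of
    the label's whitespace-separated tokens (a deliberate simplification)."""
    hits = []
    for lang in langs:
        tokens = labels.get(lang, "").split()
        for term in terms:
            if term in tokens:
                hits.append((lang, term))
    return hits


langs = ("de", "en")       # {lang1}, {lang2}
terms = ["Paul", "Smith"]  # {term1}, {term2}

# case (a): hypothetical lesser-known entity, both terms in one label,
# no label in the document language
entity_a = {"en": "Paul Smith"}
# case (b): hypothetical famous entity, only "Paul" matches, but in both languages
entity_b = {"de": "Paul McCartney", "en": "Paul McCartney"}

print(matched_clauses(entity_a, langs, terms))  # [('en', 'Paul'), ('en', 'Smith')]
print(matched_clauses(entity_b, langs, terms))  # [('de', 'Paul'), ('en', 'Paul')]
```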
> While (a) is the match users expect, (b) is not all that unlikely,
> especially if the entity matching as in (a) is not very famous and lacks
> translations of its labels into many languages, while {term1} and/or
> {term2} appear in the labels of more famous entities that do have such
> translations. This happens most often with given names of persons.
> Because the EntityLinkingEngine (for performance reasons) processes only the
> first few results (by default 2*maxSuggestions, but at least 10), this
> unintended ranking causes some Entities not to be linked.
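The result window described above can be written as a one-line rule. This sketch assumes the window is simply max(2 * maxSuggestions, 10); the function name `result_window` and its `factor`/`minimum` parameters are hypothetical. It also shows how a correctly matching Entity ranked below that window is never considered for linking.

```python
def result_window(max_suggestions, factor=2, minimum=10):
    """Hypothetical helper: how many top-ranked results are processed,
    assumed to be factor * maxSuggestions but never fewer than minimum."""
    return max(factor * max_suggestions, minimum)


# hypothetical ranking: twelve famous single-term matches ahead of the target
ranked = [f"famous_{i}" for i in range(12)] + ["target"]

print(result_window(3))                        # 10
print(result_window(8))                        # 16
print("target" in ranked[:result_window(3)])   # False: target is never linked
```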
> The new Proximity Ranking Feature (STANBOL-1105) can be used to solve this,
> as it ensures that Entities matching both terms in the same language (and
> therefore in the same label) will be ranked above those that match only a
> single term in two different languages.
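As a rough illustration of the effect (not the STANBOL-1105 implementation), the following sketch ranks entities first by a toy proximity boost, counting labels that contain all query terms within a small token distance, and only then by the number of matching clauses. All names, the example entities and the `max_distance` value are assumptions. In Lucene/Solr query syntax a comparable proximity clause would be written like `en:"Paul Smith"~2`.

```python
def proximity_score(labels, terms, max_distance=2):
    """Toy proximity boost: the number of labels that contain every query
    term within max_distance token positions. `labels` maps a language code
    to one label and is assumed to hold only the {lang1}/{lang2} labels."""
    boost = 0
    for label in labels.values():
        tokens = label.split()
        # first position of each term (a simplification for repeated tokens)
        positions = [tokens.index(t) for t in terms if t in tokens]
        if terms and len(positions) == len(terms) \
                and max(positions) - min(positions) <= max_distance:
            boost += 1
    return boost


def clause_score(labels, terms):
    """Number of (language, term) OR clauses that match, as a plain count."""
    return sum(term in label.split() for label in labels.values() for term in terms)


def rank(entities, terms):
    """Sort entity names by (proximity boost, clause count), best first."""
    return sorted(
        entities,
        key=lambda name: (proximity_score(entities[name], terms),
                          clause_score(entities[name], terms)),
        reverse=True,
    )


entities = {
    "Paul McCartney": {"de": "Paul McCartney", "en": "Paul McCartney"},  # case (b)
    "Paul Smith": {"en": "Paul Smith"},                                  # case (a)
}
print(rank(entities, ["Paul", "Smith"]))  # ['Paul Smith', 'Paul McCartney']
```

Both entities still match two clauses each; only the proximity boost, earned by having both terms in the same label, moves case (a) to the top.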
> This issue will enable the use of this feature for the EntityhubLinkingEngine.