Rupert Westenthaler created STANBOL-1110:
--------------------------------------------
Summary: Use Term Proximity for Searching Entities in the
EntityhubLinkingEngine
Key: STANBOL-1110
URL: https://issues.apache.org/jira/browse/STANBOL-1110
Project: Stanbol
Issue Type: Improvement
Components: Enhancement Engines
Affects Versions: enhancement-engines-0.10.0
Reporter: Rupert Westenthaler
Assignee: Rupert Westenthaler
The ranking of the results of the EntityLinkingEngine is affected by the fact
that some Entities have matching labels in both the language of the text and
the fallback language, while others match in only one of the two. As background:
The EntityLinkingEngine performs queries like
{lang1}:"{term1}" OR {lang1}:"{term2}" OR {lang2}:"{term1}" OR
{lang2}:"{term2}"
when linking Entities, where {lang1} is the language detected for the document
and {lang2} is the default mapping language.
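
As a rough illustration of the query shape described above, the following
Python sketch assembles the OR query from a list of terms and languages. The
function name build_label_query and the example values are hypothetical; this
is not the query-building code of the EntityLinkingEngine.

```python
def build_label_query(terms, languages):
    """Build one quoted clause per (language, term) pair, joined with OR.

    Hypothetical helper for illustration only; it mirrors the query
    pattern quoted in this issue, not Stanbol's actual implementation.
    """
    clauses = [f'{lang}:"{term}"' for lang in languages for term in terms]
    return " OR ".join(clauses)


# Example: two terms, document language "de", fallback language "en".
query = build_label_query(["term1", "term2"], ["de", "en"])
print(query)
```

With two terms and two languages this yields four clauses, which is the case
discussed in the rest of this issue.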
When such queries are executed on the Entityhub based EntitySearcher
implementations of the EntityhubLinkingEngine, the resulting ranking places
Entities matching only one of the parsed terms in front of some Entities
matching both terms.
The reason is that there are two ways in which two of the four query clauses
can match:
(a) both {term1} and {term2} do match in the same language
(b) a single term matches in {lang1} and {lang2}
While (a) is the match expected by users, (b) is not unlikely, especially if
the entity in (a) is not very famous and is missing translations of its labels
into many languages, while {term1} and/or {term2} is present in more famous
entities that do have such translations. This happens most often with given
names of persons.
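
The tie between cases (a) and (b) can be sketched with a toy scorer that only
counts how many of the four query clauses match. This is a deliberate
simplification and an assumption made for illustration: the real score is
computed by the underlying search index and depends on more than a clause
count. The entities and labels below are invented.

```python
def count_matching_clauses(labels, terms, languages):
    """Count (language, term) clauses that match a whole token of a label.

    `labels` maps a language code to a single label string. Toy model only.
    """
    count = 0
    for lang in languages:
        tokens = labels.get(lang, "").lower().split()
        for term in terms:
            if term.lower() in tokens:
                count += 1
    return count


terms = ["Anna", "Muster"]
languages = ["de", "en"]

# Case (a): little-known entity, label only in the document language.
obscure = {"de": "Anna Muster"}
# Case (b): well-known entity with translated labels, sharing the given name.
famous = {"de": "Anna Beispiel", "en": "Anna Example"}

score_a = count_matching_clauses(obscure, terms, languages)
score_b = count_matching_clauses(famous, terms, languages)
print(score_a, score_b)
```

Both entities match two clauses, so a clause count alone cannot place the
entity from case (a) ahead of the one from case (b).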
For performance reasons the EntityLinking engine processes only the first few
results (by default 2*maxSuggestions, but at least 10), so the unintended
ranking of results causes Entities not to be linked.
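
The result limit mentioned above amounts to the following arithmetic. The
function name candidate_limit is hypothetical; only the rule "2*maxSuggestions
but at least 10" is taken from this issue.

```python
def candidate_limit(max_suggestions):
    """Number of search results inspected per query (illustrative sketch)."""
    return max(2 * max_suggestions, 10)


print(candidate_limit(3))  # small values fall back to the minimum of 10
print(candidate_limit(8))  # larger values use twice maxSuggestions
```

Any Entity ranked below this limit is never considered for linking, which is
why the ordering of the first few results matters.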
The new Proximity Ranking Feature (STANBOL-1105) can be used to solve this, as
it ensures that Entities matching both terms in the same language (and
therefore in the same label) will be ranked above those that match only a
single term in two different languages.
This issue will enable the use of this feature for the EntityhubLinkingEngine.
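
The intended effect can be sketched as follows, assuming a toy sort key that
prefers Entities with the most terms matched inside a single label. This is
not the algorithm implemented for STANBOL-1105; the names proximity_key and
rank, and the example entities, are hypothetical.

```python
def proximity_key(labels, terms, languages):
    """Sort key: (most terms matched within one label, total clause matches).

    Toy stand-in for proximity ranking; `labels` maps language -> label.
    """
    wanted = {term.lower() for term in terms}
    per_label = []
    for lang in languages:
        tokens = set(labels.get(lang, "").lower().split())
        per_label.append(len(wanted & tokens))
    return (max(per_label, default=0), sum(per_label))


def rank(entities, terms, languages):
    """Return entities ordered best first according to proximity_key."""
    return sorted(
        entities,
        key=lambda entity: proximity_key(entity["labels"], terms, languages),
        reverse=True,
    )


entities = [
    {"id": "famous", "labels": {"de": "Anna Beispiel", "en": "Anna Example"}},
    {"id": "obscure", "labels": {"de": "Anna Muster"}},
]
ranked = rank(entities, ["Anna", "Muster"], ["de", "en"])
print([entity["id"] for entity in ranked])
```

Under this key the Entity matching both terms in one label sorts first, even
though both Entities match the same number of query clauses.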