Rupert Westenthaler created STANBOL-1117:
--------------------------------------------
Summary: Use POS tag information for better selection of search
tokens for EntityLookups
Key: STANBOL-1117
URL: https://issues.apache.org/jira/browse/STANBOL-1117
Project: Stanbol
Issue Type: Sub-task
Reporter: Rupert Westenthaler
Assignee: Rupert Westenthaler
Currently, EntityLinking determines the Tokens used for lookups in the
controlled vocabularies as follows:
* start from a "linkable" Token
* search the surrounding Tokens for other "linkable" or "matchable" Tokens
  * until the "Max Search Token Distance" (default: 3 Tokens) is reached or
  * more than one non-"matchable" Token was found
* select up to "Max Search Tokens" (default: 2 Tokens), but
  * never use Tokens earlier than the last consumed (already linked) Token
* if Chunks are explicitly annotated, the selection of search Tokens is
  additionally limited by those Chunks
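
The selection steps listed above can be sketched roughly as follows. This is only an illustration of one possible reading of the description, not the actual Stanbol implementation: the `Token` class, the `select_search_tokens` function, its parameter names, and the rule that nearer Tokens are preferred are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    """Hypothetical token model; not the Stanbol API."""
    text: str
    linkable: bool = False
    matchable: bool = False


def select_search_tokens(
    tokens: List[Token],
    start: int,
    last_consumed: int = -1,
    max_distance: int = 3,        # "Max Search Token Distance"
    max_search_tokens: int = 2,   # "Max Search Tokens"
    chunk: Optional[Tuple[int, int]] = None,  # inclusive (first, last) index
) -> List[int]:
    """Return the indexes of the Tokens used for the lookup, in text order."""
    # Never look at Tokens at or before the last consumed one.
    lo, hi = last_consumed + 1, len(tokens) - 1
    if chunk is not None:
        # An explicitly annotated Chunk further limits the search window.
        lo, hi = max(lo, chunk[0]), min(hi, chunk[1])

    def scan(step: int) -> List[int]:
        found, non_matchable = [], 0
        for dist in range(1, max_distance + 1):
            i = start + step * dist
            if i < lo or i > hi:
                break
            if tokens[i].linkable or tokens[i].matchable:
                found.append(i)
            else:
                non_matchable += 1
                if non_matchable > 1:  # more than one non-matchable Token
                    break
        return found

    candidates = scan(+1) + scan(-1)
    # Assumption: prefer nearer Tokens; on a tie prefer the following Token.
    candidates.sort(key=lambda i: (abs(i - start), i < start))
    selected = [start] + candidates[: max_search_tokens - 1]
    return sorted(selected)


sentence = [
    Token("The"), Token("University", linkable=True), Token("of"),
    Token("Salzburg", linkable=True), Token("is"), Token("old"),
]
print([sentence[i].text for i in select_search_tokens(sentence, 1)])
# prints ['University', 'Salzburg']
```

Note that with the default of two search Tokens the query for "University of Salzburg" already uses every "linkable" Token in reach, while longer names would be cut off.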
This Issue will try to improve this algorithm by considering
* "Linkable" and "matchable" Tokens
* Tokens with "chunkable" POS annotations
when selecting search Tokens. This will allow a better selection of search
Tokens in cases where no Chunker (NounPhrase detection and/or NER) is present.
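
One way to picture the idea: when no Chunker is available, the POS tags can be used to grow a chunk-like window around the "linkable" Token. The sketch below is only a guess at how this might look. The `PosToken` class, the `pos_search_window` function, and in particular the set of "chunkable" POS tags are hypothetical and not taken from Stanbol.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: which POS tags count as "chunkable" is made up for this example.
CHUNKABLE_POS = frozenset({"NOUN", "PROPN", "ADJ", "ADP"})


@dataclass
class PosToken:
    """Hypothetical token model with a POS tag; not the Stanbol API."""
    text: str
    pos: str
    linkable: bool = False
    matchable: bool = False


def is_candidate(token: PosToken) -> bool:
    """A Token may be part of the window if any of the three criteria holds."""
    return token.linkable or token.matchable or token.pos in CHUNKABLE_POS


def pos_search_window(
    tokens: List[PosToken], start: int, last_consumed: int = -1
) -> Tuple[int, int]:
    """Grow an inclusive (first, last) window of candidate Tokens around start."""
    lo = hi = start
    # Extend to the left, but never past the last consumed Token.
    while lo - 1 > last_consumed and is_candidate(tokens[lo - 1]):
        lo -= 1
    # Extend to the right until a non-candidate Token ends the window.
    while hi + 1 < len(tokens) and is_candidate(tokens[hi + 1]):
        hi += 1
    return lo, hi


sentence = [
    PosToken("The", "DET"),
    PosToken("University", "PROPN", linkable=True),
    PosToken("of", "ADP"),
    PosToken("Salzburg", "PROPN", linkable=True),
    PosToken("is", "VERB"),
    PosToken("old", "ADJ"),
]
print(pos_search_window(sentence, 1))
# prints (1, 3)
```

The resulting window could then play the role that an explicitly annotated Chunk plays today, limiting which Tokens are considered as search Tokens.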
With this in place, it needs to be checked whether increasing the default
"Max Search Tokens" could lead to better results and possibly better
performance (if one query could be used to link multiple Entities for
non-overlapping spans).