[
https://issues.apache.org/jira/browse/STANBOL-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rupert Westenthaler resolved STANBOL-1117.
------------------------------------------
Resolution: Won't Fix
Further investigation has shown that using POS tags to improve the
selection of Tokens used to look up Entities within the vocabulary is not
feasible. The reasons are:
* configuration of "chunkable" POS tags: whether a POS tag should be
considered "chunkable" often depends on the specific case
* negative impact on the linking support in cases where there are no POS tags
present.
Users who want such functionality should implement an Engine
that adds Chunk annotations based on an algorithm over POS tags.
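
As a rough illustration of what "an algorithm over POS tags" could look like, the sketch below groups maximal runs of Tokens with "chunkable" POS tags into Chunks. It is not Stanbol code: PosToken, chunk_by_pos and the chunkable set are hypothetical names, and the Penn Treebank tags in the usage note are only example values.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class PosToken:
    """Hypothetical token holder; pos is None when no POS tagger ran."""
    text: str
    pos: Optional[str] = None


def chunk_by_pos(tokens: List[PosToken], chunkable: Set[str],
                 min_len: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) index spans, end exclusive, of maximal runs of
    tokens whose POS tag is in the configured chunkable set."""
    chunks: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, tok in enumerate(tokens):
        if tok.pos is not None and tok.pos in chunkable:
            if start is None:
                start = i  # open a new run
        else:
            # A non-chunkable (or untagged) token closes the current run.
            if start is not None and i - start >= min_len:
                chunks.append((start, i))
            start = None
    if start is not None and len(tokens) - start >= min_len:
        chunks.append((start, len(tokens)))
    return chunks
```

For the Tokens The/DT University/NNP of/IN Salzburg/NNP is/VBZ old/JJ, a chunkable set of {"NNP", "IN"} yields the single span (1, 4), while {"NNP"} alone splits it into (1, 2) and (3, 4). Tokens without a POS tag never start or extend a Chunk. This shows why the right set of "chunkable" tags depends on the use case.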
> Use POS tag information for better selection of search tokens for
> EntityLookups
> -------------------------------------------------------------------------------
>
> Key: STANBOL-1117
> URL: https://issues.apache.org/jira/browse/STANBOL-1117
> Project: Stanbol
> Issue Type: Sub-task
> Components: Enhancement Engines
> Reporter: Rupert Westenthaler
> Assignee: Rupert Westenthaler
>
> Currently EntityLinking determines the Tokens used for lookups in the controlled
> vocabularies as follows:
> * start from a "linkable" Token
> * search surrounding Tokens for other "linkable" or "matchable" Tokens
>   * until "Max Search Token Distance" (default 3 Tokens) is reached or
>   * more than one non-"matchable" Token was found
> * at most "Max Search Tokens" (default 2 Tokens) are selected, but
>   * never use Tokens earlier than the last consumed (already linked) Token
> * in the case of explicitly annotated Chunks, the selection of search Tokens
>   is additionally limited by those Chunks
> This Issue will try to improve this algorithm by considering
> * "Linkable" and "matchable" Tokens
> * Tokens with "chunkable" POS annotations
> when selecting search Tokens. This will allow better selection of search
> Tokens in cases where no Chunker (NounPhrase detection and/or NER) is
> present.
> With this in place, it needs to be checked whether increasing the default "Max
> Search Tokens" could lead to better results and possibly better performance (if
> one query could be used to link multiple Entities for non-overlapping spans).
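
The current selection of search Tokens described in the quoted issue can be sketched roughly as follows. This is an interpretation of the bullet list, not the EntityLinking implementation: LinkToken and select_search_tokens are hypothetical names, the search is assumed to alternate between the Tokens after and before the start Token, and the "more than one non-matchable Token" limit is assumed to apply per direction.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LinkToken:
    """Hypothetical token holder with the two flags named in the issue."""
    text: str
    linkable: bool = False
    matchable: bool = False


def select_search_tokens(tokens: List[LinkToken], start: int,
                         max_distance: int = 3, max_search_tokens: int = 2,
                         last_consumed: int = -1,
                         chunk: Optional[Tuple[int, int]] = None) -> List[int]:
    """Return sorted indices of the Tokens to use for one lookup.

    start must point at a linkable Token; chunk is an optional
    (start, end) span with an exclusive end.
    """
    lo, hi = last_consumed + 1, len(tokens) - 1  # never go before consumed Tokens
    if chunk is not None:                        # explicit Chunks limit the search
        lo, hi = max(lo, chunk[0]), min(hi, chunk[1] - 1)
    selected = [start]
    if len(selected) >= max_search_tokens:
        return selected
    skipped = {1: 0, -1: 0}       # non-matchable Tokens seen per direction (assumption)
    active = {1: True, -1: True}
    for dist in range(1, max_distance + 1):
        for direction in (1, -1):
            if not active[direction]:
                continue
            idx = start + direction * dist
            if idx < lo or idx > hi:
                active[direction] = False
                continue
            tok = tokens[idx]
            if tok.linkable or tok.matchable:
                selected.append(idx)
                if len(selected) >= max_search_tokens:
                    return sorted(selected)
            else:
                skipped[direction] += 1
                if skipped[direction] > 1:  # more than one non-matchable Token
                    active[direction] = False
    return sorted(selected)
```

With the Tokens University (linkable), of, Salzburg (linkable) and start index 0, the sketch selects indices 0 and 2. Starting at index 2 with last_consumed=0 selects only index 2, because the already linked Token may not be reused.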