[jira] [Resolved] (STANBOL-740) Adopt the KeywordLinkingEngine to use the AnalyzedText content part

Rupert Westenthaler (JIRA) Wed, 21 Nov 2012 06:40:00 -0800

     [ 
https://issues.apache.org/jira/browse/STANBOL-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Rupert Westenthaler resolved STANBOL-740.
-----------------------------------------

    Resolution: Fixed

Considered resolved by http://svn.apache.org/viewvc?rev=1412121&view=rev

The KeywordLinkingEngine is now considerably more powerful. The adapted version is
mostly compatible with the current trunk version; only the "Keyword Tokenizer"
feature of the trunk version is no longer supported. Existing configurations
that enable this feature will still work, but the setting is ignored.

In some rare cases, enhancement results for existing configurations may change:

* Labels where fewer than 75% of the label tokens match will no longer be
suggested. The trunk version suggests all labels that match more than two
tokens. This only affects suggestions with low confidence values.
* The branch version fixes an unreported bug in the algorithm used for matching
labels. The fix allows matches to be detected where the text is "word1-word2"
and the label reads "word1 word2".
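A minimal sketch of the two matching rules above, assuming a simple token-set comparison. The class and method names (`LabelMatchSketch`, `tokenize`, `isSuggested`) are hypothetical, and this is not the KeywordLinkingEngine's actual algorithm. Splitting on hyphens as well as whitespace makes "word1-word2" and "word1 word2" produce the same tokens, and the 75% threshold drops labels where too few tokens match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch for illustration; NOT the KeywordLinkingEngine source.
final class LabelMatchSketch {

    // Minimum fraction of label tokens that must match (75% per the resolution note).
    static final double MIN_LABEL_MATCH = 0.75;

    // Splits on whitespace AND hyphens, so "word1-word2" and "word1 word2"
    // produce the same tokens.
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        for (String t : s.toLowerCase(Locale.ROOT).split("[\\s-]+")) {
            if (!t.isEmpty()) { // split() yields a leading "" for leading delimiters
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Counts how many label tokens occur among the text tokens
    // (token order is ignored in this simplified sketch).
    static int matchedTokens(List<String> textTokens, List<String> labelTokens) {
        int matched = 0;
        for (String labelToken : labelTokens) {
            if (textTokens.contains(labelToken)) {
                matched++;
            }
        }
        return matched;
    }

    // A label is suggested only if at least 75% of its tokens are matched.
    static boolean isSuggested(String text, String label) {
        List<String> labelTokens = tokenize(label);
        if (labelTokens.isEmpty()) {
            return false;
        }
        int matched = matchedTokens(tokenize(text), labelTokens);
        return (double) matched / labelTokens.size() >= MIN_LABEL_MATCH;
    }
}
```

With this sketch, `isSuggested("word1-word2", "word1 word2")` is true, while matching "Apache Stanbol" against the four-token label "Apache Stanbol Enhancer Engine" (2 of 4 tokens, 50%) is rejected. A fuller matcher would likely also take token positions into account.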

The main difference for users is that they NEED to adapt their Enhancement
Chains, as the new KeywordLinkingEngine can no longer consume plain text but
requires the AnalyzedText content part. The AnalyzedText must also provide
Tokens!
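As a sketch of what this requirement means, the check below rejects content that has no AnalyzedText part, or whose AnalyzedText carries no tokens. The types `AnalyzedTextStub` and `Decision` and the method `canEnhance` are hypothetical stand-ins, not the Stanbol API. In practice this means the engines that create the AnalyzedText and its tokens must run earlier in the chain than the KeywordLinkingEngine.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model for illustration; these types are NOT the Stanbol API.
final class PreconditionSketch {

    // Stand-in for the AnalyzedText content part; holds the tokens an
    // upstream NLP engine produced (empty if no tokenizer ran).
    static final class AnalyzedTextStub {
        final List<String> tokens;

        AnalyzedTextStub(List<String> tokens) {
            this.tokens = tokens == null ? Collections.<String>emptyList() : tokens;
        }
    }

    enum Decision { CANNOT_ENHANCE, ENHANCE }

    // Plain text alone is no longer enough: an AnalyzedText part with tokens is required.
    static Decision canEnhance(AnalyzedTextStub analyzedText) {
        if (analyzedText == null) {
            return Decision.CANNOT_ENHANCE; // no AnalyzedText content part available
        }
        if (analyzedText.tokens.isEmpty()) {
            return Decision.CANNOT_ENHANCE; // AnalyzedText present, but it provides no Tokens
        }
        return Decision.ENHANCE;
    }
}
```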
                
> Adopt the KeywordLinkingEngine to use the AnalyzedText content part
> -------------------------------------------------------------------
>
>                 Key: STANBOL-740
>                 URL: https://issues.apache.org/jira/browse/STANBOL-740
>             Project: Stanbol
>          Issue Type: Sub-task
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler
>
> The KeywordLinkingEngine currently does both NLP processing AND linking 
> against the target vocabulary. Until now this was the only option, as 
> separating the two was not feasible given the limitations of the RDF 
> metadata.
> With the introduction of the AnalyzedText content part, the NLP processing 
> part no longer needs to be part of the KeywordLinkingEngine.
> This issue covers:
> * removal of the NLP related functionality from the KeywordLinkingEngine
> * reimplementation of the linking part on top of the API provided by the 
> AnalyzedText content part
> * adding support for new features of the NLP chain
>     * use lemmas - if available - for entity lookup
>     * use POS tagset mappings to the OLIA ontology to decide which tokens to 
> look up
> After this change the KeywordLinkingEngine will also be able to work in 
> combination with any NLP framework that is integrated with the Stanbol NLP 
> components (i.e. that writes its data to the AnalyzedText content part). 
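The two NLP-chain features listed in the issue can be sketched as follows, assuming each token carries an optional lemma and an optional coarse lexical category derived from a POS-to-OLIA mapping. The `Category` enum, `Token` class and `lookupTerms` method are hypothetical, not the Stanbol NLP API, and skipping untagged tokens is a choice made only for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch for illustration; these types are NOT the Stanbol NLP API.
final class LookupSelectionSketch {

    // Coarse lexical categories, standing in for POS tags mapped to OLIA concepts.
    enum Category { NOUN, VERB, ADJECTIVE, PUNCTUATION, OTHER }

    static final class Token {
        final String text;
        final String lemma;      // null if no lemmatizer ran
        final Category category; // null if no POS tagger ran

        Token(String text, String lemma, Category category) {
            this.text = text;
            this.lemma = lemma;
            this.category = category;
        }
    }

    // Returns the strings to look up in the target vocabulary: only tokens
    // whose category is wanted, using the lemma when one is available.
    // Tokens without a category are skipped in this sketch.
    static List<String> lookupTerms(List<Token> tokens, Set<Category> wanted) {
        List<String> terms = new ArrayList<>();
        for (Token token : tokens) {
            if (token.category == null || !wanted.contains(token.category)) {
                continue;
            }
            terms.add(token.lemma != null ? token.lemma : token.text);
        }
        return terms;
    }
}
```

For the tokens "Cities" (lemma "city", noun), "are" (lemma "be", verb) and "." (punctuation), selecting only nouns yields the single lookup term "city", so the vocabulary is queried with the base form rather than the inflected surface form.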
