Rupert Westenthaler created STANBOL-1229:
--------------------------------------------

             Summary: Convert all OpenNLP Enhancement Engines to Configuration 
Factories
                 Key: STANBOL-1229
                 URL: https://issues.apache.org/jira/browse/STANBOL-1229
             Project: Stanbol
          Issue Type: Improvement
          Components: Enhancement Engines
    Affects Versions: 0.12.0
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler
            Priority: Minor
             Fix For: 0.12.0


Currently the OpenNLP Sentence Detection and Tokenizer Enhancement Engines do
not support OSGi Configuration Factories. Because of that, only a single
instance of each can be configured.

This causes problems if one wants to configure multiple Enhancement Chains
that use different NLP frameworks.

Here is an example:

Chain1:
 * OpenNLP for English, German and Spanish

Chain2:
 * Stanford NLP for English
 * OpenNLP for German
 * Freeling NLP for Spanish

As OpenNLP supports all three languages mentioned, a user would like to
set up the following engine configurations:

1. OpenNLP engines for sentence detection, tokenization, POS tagging and
chunking that cover all three languages
2. OpenNLP engines for sentence detection, tokenization, POS tagging and
chunking that only process German language texts
3. RESTful NLP Analysis Engine calling Stanford NLP for English language texts
4. RESTful NLP Analysis Engine calling Freeling for Spanish language texts

Chain1 would use the OpenNLP engines configured to process all languages, while
Chain2 would use the engine configurations listed under points 2 to 4.

However, as the OpenNLP Tokenizer and Sentence Detection engines do not support
OSGi Configuration Factories, this is currently not possible: only a single
instance of each of those two engines can be configured.

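Because of that, English and Spanish text sent to Chain2 would be processed by
two sentence detectors and two tokenizers (the single OpenNLP instance, which
has to cover all three languages for Chain1, plus Stanford NLP or Freeling).
This results in duplicate Sentence and Token annotations.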
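
The duplicate-annotation problem can be illustrated with a small toy model.
This is only a sketch and does not use the Stanbol API: the classes
ToyTokenizerEngine, ToyChain and DuplicateTokenDemo, the engine names and the
language codes are all hypothetical and exist only for this illustration. It
compares Chain2 when a single OpenNLP tokenizer instance must cover every
language with Chain2 when a German-only instance can be configured.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy stand-in for a tokenizer engine; NOT the Stanbol EnhancementEngine API. */
final class ToyTokenizerEngine {
    final String name;
    final Set<String> languages;

    ToyTokenizerEngine(String name, String... languages) {
        this.name = name;
        this.languages = new HashSet<>(Arrays.asList(languages));
    }

    boolean handles(String language) {
        return languages.contains(language);
    }
}

/** Toy stand-in for an enhancement chain: an ordered list of engines. */
final class ToyChain {
    private final List<ToyTokenizerEngine> engines;

    ToyChain(ToyTokenizerEngine... engines) {
        this.engines = Arrays.asList(engines);
    }

    /** Names of all engines that would write Token annotations for the language. */
    List<String> tokenizersFor(String language) {
        List<String> result = new ArrayList<>();
        for (ToyTokenizerEngine engine : engines) {
            if (engine.handles(language)) {
                result.add(engine.name);
            }
        }
        return result;
    }
}

final class DuplicateTokenDemo {
    /** Chain2 when only one OpenNLP tokenizer instance exists (all languages). */
    static ToyChain chain2SingleInstance() {
        return new ToyChain(
            new ToyTokenizerEngine("stanford-en", "en"),
            new ToyTokenizerEngine("opennlp-all", "en", "de", "es"),
            new ToyTokenizerEngine("freeling-es", "es"));
    }

    /** Chain2 when a second, German-only OpenNLP instance can be configured. */
    static ToyChain chain2WithFactory() {
        return new ToyChain(
            new ToyTokenizerEngine("stanford-en", "en"),
            new ToyTokenizerEngine("opennlp-de", "de"),
            new ToyTokenizerEngine("freeling-es", "es"));
    }

    public static void main(String[] args) {
        for (String lang : new String[] {"en", "de", "es"}) {
            System.out.println(lang + " single instance: "
                + chain2SingleInstance().tokenizersFor(lang));
            System.out.println(lang + " with factory:    "
                + chain2WithFactory().tokenizersFor(lang));
        }
    }
}
```

In this model, English and Spanish each match two tokenizers in the
single-instance variant, while every language matches exactly one tokenizer
once a German-only instance is available.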

Adding support for OSGi Configuration Factories to all OpenNLP
EnhancementEngines will solve this issue. Existing configurations will not be
affected, as all engines already use "ConfigurationPolicy.OPTIONAL", meaning
that a default instance with the default configuration is created automatically.

This issue affects both trunk and the 0.12 release branch.




