Rupert Westenthaler created STANBOL-1229:
--------------------------------------------
Summary: Convert all OpenNLP Enhancement Engines to Configuration Factories
Key: STANBOL-1229
URL: https://issues.apache.org/jira/browse/STANBOL-1229
Project: Stanbol
Issue Type: Improvement
Components: Enhancement Engines
Affects Versions: 0.12.0
Reporter: Rupert Westenthaler
Assignee: Rupert Westenthaler
Priority: Minor
Fix For: 0.12.0
Currently the OpenNLP Sentence Detection and Tokenizer Enhancement Engines do
not support OSGi Configuration Factories, so only a single instance of each
can be configured.
This causes problems when one wants to configure multiple Enhancement Chains
with different NLP frameworks.
Here is an example:
Chain1:
* OpenNLP for English, German and Spanish
Chain2:
* Stanford NLP for English
* OpenNLP for German
* Freeling NLP for Spanish
Since OpenNLP supports all three of these languages, a user would want to set
up the following engine configurations:
1. OpenNLP engines for sentence detection, tokenization, POS tagging and
Chunking that include all three languages.
2. OpenNLP engines that only process German language texts for sentence
detection, tokenization, POS tagging and Chunking
3. RESTful NLP Analysis Engine calling StanfordNLP for English language texts
4. RESTful NLP Analysis Engine calling Freeling for Spanish language texts
Chain1 would use the OpenNLP engines configured to process all languages, while
Chain2 would use the engine configurations listed under points 2 to 4.
However, because the OpenNLP Tokenizer and Sentence Detection engines do not
support OSGi Configuration Factories, this is currently not possible: only a
single instance of each of those two engines can be configured.
As a result, English and Spanish text sent to Chain2 is processed by two
Sentence Detectors and two Tokenizers (the single OpenNLP instance plus
Stanford NLP or Freeling), which results in duplicate Sentence and Token
annotations.
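The duplication described above can be illustrated with a small toy model. This
is only a sketch and not Stanbol code: the Engine class, the engine names and
the language lists are hypothetical, and each engine is reduced to "a tokenizer
that processes a fixed set of languages". It compares Chain2 when it has to
reuse the single all-language OpenNLP instance with Chain2 when a German-only
OpenNLP instance is available.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the Chain2 setup; all names here are hypothetical. */
public class DuplicateTokenizerSketch {

    /** Stand-in for one configured tokenizer engine instance. */
    public static final class Engine {
        final String name;
        final Set<String> languages; // languages this instance processes

        Engine(String name, String... languages) {
            this.name = name;
            this.languages = new HashSet<>(Arrays.asList(languages));
        }
    }

    /** Chain2 when only one OpenNLP tokenizer instance (all languages) exists. */
    public static List<Engine> chain2SingleInstance() {
        return Arrays.asList(
                new Engine("stanfordnlp-en", "en"),
                new Engine("opennlp-token", "en", "de", "es"),
                new Engine("freeling-es", "es"));
    }

    /** Chain2 when a separate German-only OpenNLP instance can be configured. */
    public static List<Engine> chain2WithFactory() {
        return Arrays.asList(
                new Engine("stanfordnlp-en", "en"),
                new Engine("opennlp-token-de", "de"),
                new Engine("freeling-es", "es"));
    }

    /** Names of all engines in the chain that would tokenize text of this language. */
    public static List<String> tokenizersFor(List<Engine> chain, String language) {
        List<String> active = new ArrayList<>();
        for (Engine engine : chain) {
            if (engine.languages.contains(language)) {
                active.add(engine.name);
            }
        }
        return active;
    }

    public static void main(String[] args) {
        for (String lang : new String[] {"en", "de", "es"}) {
            System.out.println(lang
                    + " single=" + tokenizersFor(chain2SingleInstance(), lang)
                    + " factory=" + tokenizersFor(chain2WithFactory(), lang));
        }
    }
}
```

In this model, English and Spanish text is handled by two tokenizers when the
single instance is reused, and by exactly one when a German-only instance is
available.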
Adding support for OSGi Configuration Factories to all OpenNLP
EnhancementEngines will solve this issue. Existing configurations will not be
affected, as all engines already use "ConfigurationPolicy.OPTIONAL", meaning
that a default instance with the default configuration is created automatically.
This issue affects both trunk and the 0.12 release branch.