[ https://issues.apache.org/jira/browse/STANBOL-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fabian Christ updated STANBOL-795:
----------------------------------
Component/s: Engine - OpenNLP Tokenizer (was: Enhancer)
> OpenNLP Tokenizer Engine
> ------------------------
>
> Key: STANBOL-795
> URL: https://issues.apache.org/jira/browse/STANBOL-795
> Project: Stanbol
> Issue Type: Sub-task
> Components: Engine - OpenNLP Tokenizer
> Reporter: Rupert Westenthaler
> Assignee: Rupert Westenthaler
>
> Implement a separate OpenNLP Tokenizer Engine.
> While some engines, such as the OpenNLP POS engine or the CELI Lemmatizer
> engine, support tokenizing (if tokens do not already exist in the Analyzed
> Text), it is important to implement an engine explicitly for this task.
> This engine also supports the language configuration (see the following example):
> en;model=SIMPLE
> de;model=mySpecificTokenizerModel_de.bin
> !jp
> !zh
> *
> The 'model' parameter can be used to load specific tokenizer models. "SIMPLE"
> forces the use of the OpenNLP SimpleTokenizer. If no model is configured for
> a language, the default tokenizer model for that language
> ("{lang}-token.bin") is loaded; if that model is not present, the simple
> tokenizer is used instead.
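
The selection rules described above can be sketched as follows. This is only
an illustration in Python, not the engine's actual (Java) implementation. The
function names parse_language_config and select_tokenizer are hypothetical,
and the sketch assumes that "!xx" excludes a language and that "*" enables
all languages that are not listed explicitly.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

SIMPLE = "SIMPLE"  # keyword from the issue text: forces the simple tokenizer

Config = Tuple[Dict[str, Dict[str, str]], Set[str], bool]


def parse_language_config(lines: Iterable[str]) -> Config:
    """Parse lines such as 'en;model=SIMPLE', '!jp' and '*' (hypothetical helper)."""
    params: Dict[str, Dict[str, str]] = {}
    excluded: Set[str] = set()
    wildcard = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "*":
            # Assumption: '*' enables every language not listed explicitly.
            wildcard = True
        elif line.startswith("!"):
            # Assumption: '!xx' excludes language 'xx'.
            excluded.add(line[1:].strip())
        else:
            lang, *parts = line.split(";")
            pairs = (p.split("=", 1) for p in parts if "=" in p)
            params[lang.strip()] = {k.strip(): v.strip() for k, v in pairs}
    return params, excluded, wildcard


def select_tokenizer(lang: str, config: Config,
                     available_models: Set[str]) -> Optional[str]:
    """Return a model file name, SIMPLE, or None if the language is not processed."""
    params, excluded, wildcard = config
    if lang in excluded:
        return None
    if lang not in params and not wildcard:
        return None
    model = params.get(lang, {}).get("model")
    if model is not None:
        # Explicit configuration: either "SIMPLE" or a specific model file.
        return model
    # No 'model' parameter: try the default model name, else fall back.
    default = f"{lang}-token.bin"
    return default if default in available_models else SIMPLE
```

With the example configuration from the issue, "en" resolves to SIMPLE, "de"
to mySpecificTokenizerModel_de.bin, "jp" and "zh" are skipped, and any other
language uses "{lang}-token.bin" when that file is available, otherwise the
simple tokenizer.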