[ 
https://issues.apache.org/jira/browse/STANBOL-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Christ updated STANBOL-795:
----------------------------------

    Component/s:     (was: Enhancer)
                 Engine - OpenNLP Tokenizer
    
> OpenNLP Tokenizer Engine
> ------------------------
>
>                 Key: STANBOL-795
>                 URL: https://issues.apache.org/jira/browse/STANBOL-795
>             Project: Stanbol
>          Issue Type: Sub-task
>          Components: Engine - OpenNLP Tokenizer
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler
>
> Implement a separate OpenNLP Tokenizer Engine.
> While some engines, such as the OpenNLP POS and the CELI Lemmatizer engines, 
> do support tokenizing (if tokens do not already exist in the Analyzed Text), 
> it is important to implement an engine explicitly for this task.
> This engine also supports the language configuration (see the following 
> example):
>     en;model=SIMPLE
>     de;model=mySpecificTokenizerModel_de.bin
>     !jp
>     !zh
>     *
> The 'model' parameter can be used to load specific tokenizer models. "SIMPLE" 
> forces the use of the OpenNLP SimpleTokenizer. If no model configuration is 
> present, the default tokenizer for the language is loaded ("{lang}-token.bin", 
> or the simple tokenizer if that language model is not present).
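
The selection rules described above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not the engine's actual implementation (Stanbol itself is written in Java): the names `LanguageConfig`, `parse_language_config`, `resolve_tokenizer` and the `available_models` argument are hypothetical. It also assumes that "!lang" excludes a language, that "*" enables every language not otherwise listed, and that a configured model name is returned as-is without checking that the file exists.

```python
from dataclasses import dataclass, field

SIMPLE = "SIMPLE"  # marker for the OpenNLP SimpleTokenizer, as in the example config


@dataclass
class LanguageConfig:
    """Hypothetical container for a parsed language configuration."""
    params: dict = field(default_factory=dict)   # lang -> {parameter: value}
    excluded: set = field(default_factory=set)   # languages listed as "!lang"
    wildcard: bool = False                       # True if "*" is present


def parse_language_config(lines):
    """Parse entries such as 'en;model=SIMPLE', '!jp' and '*'."""
    config = LanguageConfig()
    for raw in lines:
        entry = raw.strip()
        if not entry:
            continue
        if entry == "*":
            config.wildcard = True
        elif entry.startswith("!"):
            config.excluded.add(entry[1:].strip())
        else:
            lang, *pairs = entry.split(";")
            params = {}
            for pair in pairs:
                key, _, value = pair.partition("=")
                params[key.strip()] = value.strip()
            config.params[lang.strip()] = params
    return config


def resolve_tokenizer(lang, config, available_models):
    """Return None if the language is not processed, otherwise SIMPLE
    or the name of the tokenizer model to load."""
    if lang in config.excluded:
        return None
    if lang not in config.params and not config.wildcard:
        return None
    model = config.params.get(lang, {}).get("model")
    if model is not None:
        # Explicit configuration wins: either SIMPLE or a specific model file.
        return model
    # No 'model' parameter: use the default model if present, else fall back.
    default = f"{lang}-token.bin"
    return default if default in available_models else SIMPLE


if __name__ == "__main__":
    cfg = parse_language_config([
        "en;model=SIMPLE",
        "de;model=mySpecificTokenizerModel_de.bin",
        "!jp",
        "!zh",
        "*",
    ])
    for language in ("en", "de", "jp", "fr"):
        print(language, resolve_tokenizer(language, cfg, {"fr-token.bin"}))
```

With the example configuration, "en" resolves to the simple tokenizer, "de" to the configured model file, "jp" and "zh" are skipped, and any other language falls back to its "{lang}-token.bin" model or, if that is missing, to the simple tokenizer.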

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
