[ https://issues.apache.org/jira/browse/OPENNLP-571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Martin Wiesner updated OPENNLP-571:
-----------------------------------
Priority: Minor (was: Major)
> Tokenizer Model for german text
> -------------------------------
>
> Key: OPENNLP-571
> URL: https://issues.apache.org/jira/browse/OPENNLP-571
> Project: OpenNLP
> Issue Type: Wish
> Components: Tokenizer
> Affects Versions: 1.6.0
> Reporter: Andreas Niekler
> Priority: Minor
>
> I created a tokenizer model with proper detokenization rules for different
> kinds of quotation marks. The model is based on 300,000 example sentences
> from the German part of the Leipzig Corpora Collection. I don't know whether
> there might be a copyright issue, because those sentences were crawled
> from the web. If the content of the model is not in a form that would allow
> the sentences to be reconstructed, everything is fine. Please comment on
> these thoughts. If everything is OK, I will contribute the model for further
> testing by the OpenNLP team.