[jira] [Updated] (OPENNLP-571) Tokenizer Model for german text

Martin Wiesner (Jira) Sun, 26 Feb 2023 06:43:05 -0800


     [ 
https://issues.apache.org/jira/browse/OPENNLP-571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Martin Wiesner updated OPENNLP-571:
-----------------------------------
    Priority: Minor  (was: Major)

> Tokenizer Model for german text
> -------------------------------
>
>                 Key: OPENNLP-571
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-571
>             Project: OpenNLP
>          Issue Type: Wish
>          Components: Tokenizer
>    Affects Versions: 1.6.0
>            Reporter: Andreas Niekler
>            Priority: Minor
>
> I created a tokenizer model with proper detokenization rules for different
> kinds of quotes. The model is based on 300,000 example sentences from the
> German portion of the Leipzig Corpora Collection. I don't know whether there
> might be a copyright issue, because those sentences were crawled from the
> web. If the model does not store its content in a form that would allow the
> sentences to be reconstructed, everything is fine. Please comment on these
> thoughts. If everything is OK, I will contribute the model for further
> testing by the OpenNLP team.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
