[jira] [Updated] (OPENNLP-1182) LanguageDetectorConverterTool is a no-op, despite the docs saying otherwise

Jeff Zemerick (JIRA) Thu, 21 Jun 2018 04:56:38 -0700


     [ https://issues.apache.org/jira/browse/OPENNLP-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jeff Zemerick updated OPENNLP-1182:
-----------------------------------
    Fix Version/s:     (was: 1.8.5)

> LanguageDetectorConverterTool is a no-op, despite the docs saying otherwise
> ---------------------------------------------------------------------------
>
>                 Key: OPENNLP-1182
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1182
>             Project: OpenNLP
>          Issue Type: Bug
>    Affects Versions: 1.8.4
>            Reporter: Steve Rowe
>            Priority: Major
>
> Contrary to the docs (see below), LanguageDetectorConverterTool doesn't 
> actually do anything at all; the class is empty.
> {quote}
> The following sequence of commands shows how to convert the Leipzig Corpora 
> collection at folder leipzig-train/ to the default Language Detector format, 
> by creating groups of 5 sentences as documents and limiting to 10000 
> documents per language. Then, it shuffles the result and selects the first 
> 100000 lines as the training corpus and the last 20000 as the evaluation corpus:
> {noformat}
> $ bin/opennlp LanguageDetectorConverter leipzig -sentencesDir leipzig-train/ -sentencesPerSample 5 -samplesPerLanguage 10000 > leipzig.txt
> $ perl -MList::Util=shuffle -e 'print shuffle(<STDIN>);' < leipzig.txt > leipzig_shuf.txt
> $ head -100000 < leipzig_shuf.txt > leipzig.train
> $ tail -20000 < leipzig_shuf.txt > leipzig.eval
> {noformat}
> {quote}
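
For reference, the shuffle-and-split step quoted above (the perl, head and tail commands) can be sketched in Python. This is only an illustration of that step, not part of OpenNLP or of the converter tool. The function name shuffle_and_split and its parameters are hypothetical. Unlike head/tail, the sketch refuses to run when the two slices would overlap, which happens whenever the shuffled file has fewer lines than the two requested sizes combined.

```python
import random


def shuffle_and_split(lines, n_train, n_eval, seed=None):
    """Shuffle lines, then return (first n_train, last n_eval).

    Hypothetical helper mirroring `head -N` / `tail -M` on a shuffled
    file; it is not an OpenNLP API.
    """
    if n_train < 0 or n_eval < 0:
        raise ValueError("slice sizes must be non-negative")
    if n_train + n_eval > len(lines):
        # head/tail would silently share lines between the two files here.
        raise ValueError("train and eval slices would overlap")
    shuffled = list(lines)                 # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducible splits
    train = shuffled[:n_train]
    evaluation = shuffled[len(shuffled) - n_eval:]
    return train, evaluation
```

With the numbers from the quoted docs, this would be called on the lines of leipzig.txt with n_train=100000 and n_eval=20000, and each returned list written to leipzig.train and leipzig.eval respectively.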



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
