On 03/22/2013 01:05 PM, William Colen wrote:
We could do it with the Leipzig corpus or CONLL. We can prepare the corpus
by detokenizing it and creating documents from it.
If it is OK to do it with another language, the AD corpus has paragraph and
text annotations, as well as the original, untokenized sentences.
For English we should be able to use some of the CONLL data, and yes, we
should definitely test with other languages too. Leipzig might be suited for
sentence detector training, but not for tokenizer training, since the data
is not tokenized as far as I know.
+1 to using AD and CONLL for testing the tokenizer and sentence detector.
Jörn
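
For reference, a minimal sketch of the detokenization step mentioned above,
assuming a CONLL-style one-token-per-line layout with blank lines between
sentences. The punctuation rule and the file names are illustrative only;
for real data a dictionary-based detokenizer such as OpenNLP's
DictionaryDetokenizer would be the more robust choice.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Naive sketch: turn CONLL-style token-per-line data back into plain text
// sentences so it can serve as sentence detector / tokenizer test input.
public class NaiveDetokenizer {

    // Tokens that attach to the preceding token without a space
    // (illustrative rule, not a complete detokenization policy).
    private static final String NO_SPACE_BEFORE = ".,;:!?)]}";

    static String detokenize(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String token : tokens) {
            if (sb.length() > 0 && !NO_SPACE_BEFORE.contains(token)) {
                sb.append(' ');
            }
            sb.append(token);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input file: first whitespace-separated column is the
        // token, blank lines separate sentences (typical CONLL layout).
        List<String> lines = Files.readAllLines(
                Paths.get("conll-tokens.txt"), StandardCharsets.UTF_8);

        List<String> sentence = new ArrayList<>();
        List<String> document = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                if (!sentence.isEmpty()) {
                    document.add(detokenize(sentence));
                    sentence.clear();
                }
            } else {
                sentence.add(line.split("\\s+")[0]);
            }
        }
        if (!sentence.isEmpty()) {
            document.add(detokenize(sentence));
        }

        // Write one detokenized sentence per line as a pseudo "document".
        Files.write(Paths.get("detokenized-doc.txt"), document,
                StandardCharsets.UTF_8);
    }
}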