Hi all,

OpenNLP has always shipped with a couple of trained models which were ready
to use for a few languages. The performance a user sees with those
models depends heavily on their input text.

The English name finder models in particular, which were trained on MUC 6/7
data, perform very poorly these days when run on current news articles,
and even worse on data outside the news domain.

Anyway, we often get judged on how well OpenNLP works just based on the
performance of those models (or maybe people who compare their NLP
systems against OpenNLP just love to have OpenNLP perform badly).

I think we are now at a point with those models where it is questionable
whether having them is still an advantage for OpenNLP. The SourceForge page
is often blocked due to traffic limitations. We definitely have to act
somehow.

The old models definitely have some historic value and are used for
testing releases.

What should we do?

We could take them offline and advise our users to train their own
models on one of the various corpora we support. We could also do both:
place a prominent link to our corpora documentation on the download
page and, in a less visible place, a link to the historic SF models.

Jörn
