We need to address things such as how to share the evaluation results and
how to reproduce the training.

There are several possibilities for that, but there are points to consider:

Will we store the model itself in an SCM repository, or only the code that
can build it?
Will we deploy the models to the Maven Central repository? That is good for
people using the Java API but not for the command line interface. Should we
change the CLI to handle models on the classpath?
Should we keep a copy of the training corpus or always download it from the
original provider? We can't guarantee that the corpus will be there
forever, not only because its license may change, but simply because the
provider may stop keeping the server up.

William



2017-07-10 14:52 GMT-03:00 Joern Kottmann <kottm...@gmail.com>:

> Hello all,
>
> since Apache OpenNLP 1.8.1 we have a new language detection component
> which, like all our components, has to be trained. I think we should
> release a pre-built model for it trained on the Leipzig corpus. This
> will allow the majority of our users to get started very quickly with
> language detection without the need to figure out how to train it.
>
> How should this project release models?
>
> Jörn
>
