Re: Releasing a Language Detection Model

Chris Mattmann Mon, 10 Jul 2017 14:15:01 -0700

+1. In terms of releasing models, maybe an opennlp-models package that uses
the standard Maven layout, with the models placed under
src/main/resources/<package prefix dirs>/*.bin.


Then use an assembly descriptor to package the above into a *-bin.jar?
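A minimal sketch of how a tool could load a model packaged this way, assuming the CLI were changed to accept either a file path or a classpath resource name. The ModelLocator class and the resource path org/apache/opennlp/models/langdetect.bin are hypothetical and not part of OpenNLP. The returned stream would then be passed to the model class, e.g. the LanguageDetectorModel constructor that takes an InputStream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper (not an OpenNLP API): resolve a model either from the
 * file system, as the CLI does today with a path argument, or from the
 * classpath, where a Maven-packaged models jar would put its *.bin files.
 */
public class ModelLocator {

    /** Opens the model, preferring an existing file over a classpath resource. */
    public static InputStream open(String location) throws IOException {
        Path path = Paths.get(location);
        if (Files.isRegularFile(path)) {
            return Files.newInputStream(path);
        }
        // ClassLoader resource names are relative to the classpath root,
        // so drop a leading slash if the caller supplied one.
        String resource = location.startsWith("/") ? location.substring(1) : location;
        InputStream in = ModelLocator.class.getClassLoader().getResourceAsStream(resource);
        if (in == null) {
            throw new IOException("Model not found on file system or classpath: " + location);
        }
        return in;
    }

    public static void main(String[] args) {
        // Hypothetical resource following src/main/resources/<package prefix dirs>/*.bin
        String location = args.length > 0 ? args[0] : "org/apache/opennlp/models/langdetect.bin";
        try (InputStream in = open(location)) {
            System.out.println("Opened model: " + location);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Checking the file system first keeps existing path-based invocations working unchanged, while a bare resource name falls through to whatever models jar is on the classpath.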

Cheers,
Chris

On 7/10/17, 4:09 PM, "Joern Kottmann" <[email protected]> wrote:

    My opinion is that we should offer the model as a Maven
    dependency for users who just want to use it in their projects, and
    also offer models for download so people can quickly try out OpenNLP.
    If the models can be downloaded, new users could very quickly test
    them from the command line.
    
    I don't really have any thoughts yet on how we should organize it. It
    would probably be nice to have a place where we can share all the
    training data, with the scripts that produce the models checked
    in. It should be easy to retrain all the models when we do a major
    release.
    
    If a corpus disappears, we should drop support for it; it must be
    obsolete by then.
    
    Jörn
    
    On Mon, Jul 10, 2017 at 8:50 PM, William Colen <[email protected]> wrote:
    > We need to address things such as sharing the evaluation results and how
    > to reproduce the training.
    >
    > There are several possibilities for that, but there are points to
    > consider:
    >
    > Will we store the model itself in an SCM repository, or only the code that
    > can build it?
    > Will we deploy the models to the Maven Central repository? That is good for
    > people using the Java API but not for the command line interface. Should we
    > change the CLI to handle models on the classpath?
    > Should we keep a copy of the training data, or always download it from the
    > original provider? We can't guarantee that the corpus will be there
    > forever, not only because its license may change, but simply because the
    > provider may stop keeping the server up.
    >
    > William
    >
    >
    >
    > 2017-07-10 14:52 GMT-03:00 Joern Kottmann <[email protected]>:
    >
    >> Hello all,
    >>
    >> Since Apache OpenNLP 1.8.1 we have a new language detection component
    >> which, like all our components, has to be trained. I think we should
    >> release a pre-built model for it trained on the Leipzig corpus. This
    >> will allow most of our users to get started with language detection
    >> quickly, without having to figure out how to train it.
    >>
    >> How should this project release models?
    >>
    >> Jörn
    >>
