Hi,

FWIW, I’ve seen plenty of CLI tools in my day that let an argument passed on
the command line override an internal classpath dependency. This is for people
in environments who want a sensible, delivered classpath default plus a
run-time override that doesn’t require unzipping or otherwise messing with the
JAR file. Think of people who are using OpenNLP in both Java and Python
environments as an example.
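
To make that concrete, here is a rough sketch of the pattern I mean (the class
and resource names below are made up, not anything OpenNLP ships today): a
default model bundled on the classpath, with a command line argument that
overrides it at run time.

  import java.io.FileInputStream;
  import java.io.IOException;
  import java.io.InputStream;

  public class ModelResolver {

      // Hypothetical location of a model bundled inside the application jar.
      private static final String DEFAULT_MODEL_RESOURCE = "/models/default-model.bin";

      // A path given on the command line wins; otherwise fall back to the
      // delivered classpath default, no unzipping or repackaging of jars.
      static InputStream openModel(String cliModelPath) throws IOException {
          if (cliModelPath != null) {
              return new FileInputStream(cliModelPath);
          }
          InputStream in = ModelResolver.class.getResourceAsStream(DEFAULT_MODEL_RESOURCE);
          if (in == null) {
              throw new IOException("no bundled model at " + DEFAULT_MODEL_RESOURCE);
          }
          return in;
      }
  }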

Cheers,
Chris




On 7/11/17, 3:25 AM, "Joern Kottmann" <kottm...@gmail.com> wrote:

    I would not change the CLI to load models from jar files. I have never
    used or seen a command line tool that expects a file as input and would
    then also load it from inside a jar file. It would be hard to communicate
    precisely how that works in the CLI usage texts, and it is not a feature
    anyone would expect to be there. The intention of the CLI is to give
    users the ability to quickly test OpenNLP before they integrate it into
    their software, and to train and evaluate models.
    
    Users who for some reason have a jar file with a model inside can just
    run "unzip model.jar".
    
    All in all, I think this is quite a bit of complexity we would need to
    add, and it would have very limited use.
    
    The use case for publishing jar files is to make the models easily
    available to people who have a build system with dependency
    management: they won't have to download models manually, and when they
    update OpenNLP they can also update the models with a version string
    change.
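
    As an illustration, once such a model jar is on the classpath, loading it
    from the Java API could look roughly like this (the resource path is only
    a guess, it depends on how the model jar is laid out):

        import java.io.InputStream;
        import opennlp.tools.langdetect.LanguageDetectorME;
        import opennlp.tools.langdetect.LanguageDetectorModel;

        public class ClasspathModel {
            public static void main(String[] args) throws Exception {
                // The model jar is pulled in by the build tool, so the model can
                // be opened as a classpath resource, not as a downloaded file.
                try (InputStream in = ClasspathModel.class
                        .getResourceAsStream("/opennlp/models/langdetect.bin")) {
                    LanguageDetectorModel model = new LanguageDetectorModel(in);
                    LanguageDetectorME detector = new LanguageDetectorME(model);
                    System.out.println(
                        detector.predictLanguage("Bonjour tout le monde").getLang());
                }
            }
        }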
    
    For the command line "quick start" use case we should offer the models
    on a download page as we do today. This page could list both the
    download link and the Maven dependency.
    
    Jörn
    
    On Mon, Jul 10, 2017 at 8:50 PM, William Colen <co...@apache.org> wrote:
    > We need to address things such as sharing the evaluation results and
    > how to reproduce the training.
    >
    > There are several possibilities for that, but there are points to
    > consider:
    >
    > Will we store the model itself in an SCM repository or only the code that
    > can build it?
    > Will we deploy the models to the Maven Central repository? That is good
    > for people using the Java API but not for the command line interface;
    > should we change the CLI to handle models in the classpath?
    > Should we keep a copy of the training corpus or always download it from
    > the original provider? We can't guarantee that the corpus will be there
    > forever, not only because its license might change, but simply because
    > the provider might stop keeping the server up.
    >
    > William
    >
    >
    >
    > 2017-07-10 14:52 GMT-03:00 Joern Kottmann <kottm...@gmail.com>:
    >
    >> Hello all,
    >>
    >> Since Apache OpenNLP 1.8.1 we have a new language detection component
    >> which, like all our components, has to be trained. I think we should
    >> release a pre-built model for it trained on the Leipzig corpus. This
    >> will allow the majority of our users to get started very quickly with
    >> language detection without the need to figure out how to train it.
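    >>
    >> For a quick impression, using such a pre-built model from the Java API
    >> could look roughly like this (the model file name here is just a
    >> placeholder):
    >>
    >>     import java.io.File;
    >>     import opennlp.tools.langdetect.Language;
    >>     import opennlp.tools.langdetect.LanguageDetectorME;
    >>     import opennlp.tools.langdetect.LanguageDetectorModel;
    >>
    >>     public class LangDetectExample {
    >>         public static void main(String[] args) throws Exception {
    >>             // Load the released model, e.g. downloaded from the models page.
    >>             LanguageDetectorModel model =
    >>                 new LanguageDetectorModel(new File("langdetect.bin"));
    >>             LanguageDetectorME detector = new LanguageDetectorME(model);
    >>             // Predict the most likely language of a text snippet.
    >>             Language best = detector.predictLanguage("Das ist ein kurzer Satz.");
    >>             System.out.println(best.getLang() + " " + best.getConfidence());
    >>         }
    >>     }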
    >>
    >> How should this project release models?
    >>
    >> Jörn
    >>
    

