The GitHub project for distributing model files sounds like a great idea.

It would also be very useful to get an authoritative list (with name,
description, and especially URL) of the training data files used to generate
each of the trained models.
Especially for models trained using OpenNLP training data, it is not clear
where the training data files are available.
By making the training data files available, OpenNLP can enable users to
augment them by adding their own training samples and retrain on the
augmented data set.
Retraining would help significantly, either by improving accuracy in
different problem domains (e.g., blog articles compared to newspaper
articles) or by covering corner cases missed by the original training
data. Having the original training data would help immeasurably, since it is
much more manageable for users to merely add their own training samples
than to generate and annotate all the original training samples.

Any thoughts on this?


-----Original Message-----
From: Jörn Kottmann [mailto:[email protected]] 
Sent: Monday, April 16, 2012 2:52 AM
To: [email protected]
Subject: Re: Could we please have a jar file for models and a maven
dependency?

On 04/15/2012 11:10 PM, agks mehx wrote:
> It would really help to build if there were a jar file (or files) 
> containing the models, along with maven dependency!
>

The Apache OpenNLP project currently does not distribute any model files.
Therefore we will not be able to produce maven dependencies here.

Most model files can currently be found at SourceForge and we are currently
working on a new model distribution project over at github:
https://github.com/utcompling/OpenNLP-Models

Jörn
