Hello,
it depends on the texts you train it on; usually it's a gray area. There
are corpora that are very restrictive in this regard and only allow
research use, which conflicts with the Apache License.
As far as I know, copyright law on the source text does not really apply
here, because the models contain only statistics or bigrams and no
original text.
In any case, if you train on your own text and then release the model
under AL 2.0, it is safe to include and distribute it.
At OpenNLP we decided not to distribute any models at Apache that are
trained on restricted corpora without discussing it on the legal list
first. But we never spoke to them, and I personally much prefer the idea
of producing open training data that is Apache friendly (e.g. based on
Wikinews or Wikipedia).
HTH,
Jörn
On 10/04/2012 06:39 PM, Chen, Pei wrote:
Hi Jorn,
If we trained a model and included it as a resource within the ASF repo, just
wanted to confirm if that's acceptable in ASF even though it's in a binary
format?
Were there any issues for OpenNLP with including trained models?
Thanks,
Pei
-----Original Message-----
From: Jörn Kottmann [mailto:[email protected]]
Sent: Wednesday, August 01, 2012 8:01 AM
To: [email protected]
Subject: Re: licensing question
On 08/01/2012 01:01 PM, Miller, Timothy wrote:
There was some chatter last week about resources potentially being
downloaded via Maven for license compatibility reasons. Just wondering if
that brings about the possibility of using external libraries that are not
Apache-licensed and that would also be auto-downloaded under certain Maven
build commands. Specifically, I was thinking of the GPL-licensed Berkeley
parser, which I've used to get significantly higher accuracy than the OpenNLP
parser we currently wrap in our constituency parser module.
Making scripts or Maven build commands that download things is fine, but it
might turn out to be quite limiting for your users who need the freedom of
the AL. That will be a problem if Berkeley is the only option.
The HBase people, for example, have an optional dependency on LZO, which is
GPL, and people there just need to download and install it themselves.
See here:
http://hbase.apache.org/book/lzo.compression.html
Speaking as an OpenNLP committer now, it would of course be nice to make
our parser better.
If you want to work on that we will be happy to get some patches.
Jörn