Re: licensing question

Jörn Kottmann Thu, 04 Oct 2012 10:30:35 -0700

Hello,

it depends on the texts you train it on; usually it's a gray area. There are
corpora which are very restrictive in this regard and only allow usage for
research, but that conflicts with the Apache License.


As far as I know, copyright laws on the source text do not really apply here,
because the models just contain statistics or bigrams but no original text.

Anyway, if you train on your own text and then release the model under AL 2.0,
it's safe to include it and distribute it.

At OpenNLP we decided not to distribute any models which are trained on
restricted corpora at Apache without discussing it on the legal list first.
But we never spoke to them, and I personally much prefer the idea of producing
open training data which is Apache friendly (e.g. based on Wikinews or
Wikipedia).

HTH,
Jörn

On 10/04/2012 06:39 PM, Chen, Pei wrote:

Hi Jorn,
If we trained a model and included it as a resource within the ASF repo, just 
wanted to confirm if that's acceptable in ASF even though it's in a binary 
format?
Were there any issues for openNLP with including trained models?

Thanks,
Pei

-----Original Message-----
From: Jörn Kottmann [mailto:[email protected]]
Sent: Wednesday, August 01, 2012 8:01 AM
To: [email protected]
Subject: Re: licensing question

On 08/01/2012 01:01 PM, Miller, Timothy wrote:

There was some chatter last week about resources potentially being
downloaded via maven for license compatibility reasons.  Just wondering if
that brings about the possibility of using external libraries that are not
apache-licensed that would also be auto-downloaded under certain maven
build commands.  Specifically I was thinking of the GPL-licensed berkeley
parser which I've used to get significantly higher accuracy than the opennlp
parser we currently wrap in our constituency parser module.

Making scripts or Maven build commands which download stuff is fine, but it
might turn out to be quite limiting for your users who need the freedom of
the AL. That will be a problem if Berkeley is the only option.

The HBase people, for example, have an optional dependency on LZO, which is
GPL, and people there just need to download and install it themselves.
See here:
http://hbase.apache.org/book/lzo.compression.html

Speaking as an OpenNLP committer now, it would of course be nice to make
our parser better.
If you want to work on that we will be happy to get some patches.

Jörn
