In my opinion we should release what we can from here at Apache;
only the resources which have an incompatible license need to
be handled differently, e.g. hosted on an external site.
Models which are trained on private clinical data can be released as
long as the original creator decides to license them under the Apache
License 2.0. If that is done by a committer, it should be fine to just
check them in or put them on the website.
The Wikipedia license is compatible, and so is an index built from it,
but we probably need to add attribution for it in a NOTICE file, and
maybe include the license in the LICENSE file.
Jörn
On 11/02/2012 10:46 PM, Chen, Pei wrote:
I think we postponed this topic previously, and since the ASF code seems to be
in decent shape now, it's time to revisit this discussion for the
longer term.
Currently, we have the below resources bundled with our source code and
distribution:
- UMLS dictionaries (in HSQLDB format and in Lucene indexes)
- Models (which were okay to release as open source) that have been
trained on various clinical data
- Wikipedia index
What are our options, given that ASF source code, binaries, models, and
dependencies all need to be compliant with ASL 2.0
(http://www.apache.org/legal/3party.html)?
1) Leave things as they are, but we would need to confirm the licensing with
each source and will probably also need to seek approval from Apache Legal
for each of the resources
2) Host the resources externally, e.g. on SourceForge, similar to the OpenNLP
models (http://opennlp.sourceforge.net/models-1.5/)
a. Single zip per release for users to download?
Option 2 seems the least painful in terms of compliance.
Since 3.0.0-incubating, each resource has a fully qualified name/path and is
read from the classpath, so it should be fairly easy to pull the resources in
from external sources if we decide to do so.
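To illustrate why classpath-based loading makes this easy, here is a minimal
sketch in Java. It assumes a resource is addressed only by its fully qualified
classpath name; the class name, the helper method, and the example path in the
comment are hypothetical and are not taken from the actual code base.

```java
import java.io.InputStream;

// Hypothetical helper for illustration only; not the project's real loader.
final class ClasspathResourceSketch {

    private ClasspathResourceSketch() {}

    /**
     * Opens a resource by its fully qualified classpath name, for example
     * "org/example/models/sentence-detector.bin" (a made-up path).
     * The caller is responsible for closing the returned stream.
     */
    static InputStream open(String resourcePath) {
        // Prefer the context class loader so that jars or directories added
        // to the classpath at deployment time are visible.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathResourceSketch.class.getClassLoader();
        }
        InputStream in = loader.getResourceAsStream(resourcePath);
        if (in == null) {
            // Fail loudly so a missing external download is easy to diagnose.
            throw new IllegalArgumentException(
                "Resource not found on classpath: " + resourcePath);
        }
        return in;
    }
}
```

Because the lookup depends only on the resource name, the same call would work
whether the file ships inside our own jar or comes from a separately
downloaded zip or directory that the user adds to the classpath.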
--Pei