I think we postponed this topic previously. Since the ASF code seems to be 
in decent shape now, it's time to revisit this discussion for the 
longer term.
Currently, we have the following resources bundled with our source code and 
distribution:

-          UMLS dictionaries (in HSQLDB format and in Lucene indexes)

-          Models (which were okay to be released as open source) that have been 
trained on various clinical data

-          Wikipedia index

What are our options, given that ASF source code, binaries, models, and 
dependencies all need to be compliant with ASL 2.0 
(http://www.apache.org/legal/3party.html)?

1)      Leave things as they are, but we would need to confirm with the sources 
and would probably also need to seek approval from Apache Legal for each of the 
resources

2)      Host the resources externally, e.g. on SourceForge, similar to the 
OpenNLP models (http://opennlp.sourceforge.net/models-1.5/)

a.       Single zip per release for users to download?

Option 2 seems the least painful in terms of compliance.
Since 3.0.0-incubating, each resource has a fully qualified name/path and is 
read from the classpath, so it should be fairly easy if we decide to pull them 
in from external sources.
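
As a rough sketch of what that could look like: try the classpath first, then 
fall back to a directory where the user unpacked the externally hosted 
resources. Everything named here is made up for illustration and is not 
existing code in our tree: the ResourceLocator class, the 
"resources.external.dir" system property, and the example path.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only.
final class ResourceLocator {

    // Assumed system property naming the directory that holds the
    // externally downloaded (and unzipped) resources.
    public static final String EXTERNAL_DIR_PROPERTY = "resources.external.dir";

    private ResourceLocator() {
    }

    /**
     * Opens a resource by its fully qualified path, for example
     * "org/example/models/demo.bin" (a made-up path).
     * The classpath is tried first; the external directory is the fallback.
     */
    public static InputStream open(String qualifiedPath) throws IOException {
        // 1) Bundled case: the resource is on the classpath.
        InputStream in =
                ResourceLocator.class.getClassLoader().getResourceAsStream(qualifiedPath);
        if (in != null) {
            return in;
        }

        // 2) External case: look under the user-supplied directory, if any.
        String externalDir = System.getProperty(EXTERNAL_DIR_PROPERTY);
        if (externalDir != null) {
            Path candidate = Paths.get(externalDir, qualifiedPath);
            if (Files.isRegularFile(candidate)) {
                return Files.newInputStream(candidate);
            }
        }

        throw new FileNotFoundException(
                "Resource not found on classpath or in external directory: " + qualifiedPath);
    }
}
```

With something like this, callers keep using the same fully qualified paths, 
and a user would only need to unzip the download and start the JVM with 
-Dresources.external.dir=/path/to/unzipped/resources.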

--Pei
