On 5/19/2011 15:04, Nicolas Hernandez wrote:
> Hello Everyone
>
> Jörn, yes it (training MaxEnt models for OpenNLP from the French
> Treebank) is actually part of our plan (building a French-Speaking
> UIMA Community). We wanted also to contribute to the OpenNLP project
> since no models was available for French processing!
>
> About the right to train models on this data set and then distribute
> them under Apache License 2: It took time for us to get the right to
> do it, but I think it was because we were the first to ask for. Now
> they know about it. I know that the maltparser team
> (http://maltparser.org/) would be also interested by the grant. You
> may ask for the French Treebank authors. I can also ask them for
> letting an explicit mention about the right to do it on their web
> site.
>
> As far as I know, the data training set for the English and German POS
> models are not freely available, are they ?
The English model was trained on the Brown corpus, which is free.
The German model was trained on a non-free corpus.

> Eventually, Jörn, I'm not sure to understand. Do you think the IP
> clearance process is not adapted for submitting our contribution ?
>
> Tommaso, I will blog post the procedure I used to train the models.
> There is nothing really special. I used some freely available (under
> AL2) AE components. The HMM learner is already present in the HMM
> Tagger addon. The few other UIMA components I used are also available
> on some google forges (uima-common, uima-connectors,
> uima-type-mapper).
>
> Regards
>
> /Nicolas
>
> On Thu, May 19, 2011 at 9:57 AM, Jörn Kottmann <[email protected]> wrote:
>> On 5/19/11 9:00 AM, Tommaso Teofili wrote:
>>>
>>> If you also plan to donate the models I think the IP clearance is the
>>> right way both for UIMA and for you as a researcher.
>>>
>>
>> In my opinion it is very important that we have the possibility
>> to retrain the models on the data set, otherwise it will block
>> code changes and bug fixes.
>>
>> Therefore I think we need the right to train models on this
>> data set and then distribute them under AL 2.0.
>>
>> Jörn
