2011/6/15 Tommaso Teofili <[email protected]>

> Nicolas,
> your post on opennlp-user@ made me realize we didn't take care of helping
> you here yet.
> Did you get the ACK for your SGA?
>
I see it's been recorded, so I think we can proceed.
Tommaso

> Regards,
> Tommaso
>
> 2011/5/26 Nicolas Hernandez <[email protected]>
>
>> Hi
>>
>> French data models for the Apache UIMA Sandbox HMM Tagger have been
>> submitted via the JIRA issue
>> https://issues.apache.org/jira/browse/UIMA-2146
>>
>> Documentation on the procedure to build the models from the French
>> Treebank can be found here (incidentally, it is in French...)
>>
>> http://enicolashernandez.blogspot.com/2011/05/construire-des-modelisations-du-french.html
>>
>> The SGA has been sent and we are waiting to receive the ack.
>>
>> I have prepared an IP form but do not have the right to commit it...
>>
>> Finally, is there an "appropriate volunteer" for executing the IP
>> Clearance process?
>>
>> I hope I have not forgotten anything.
>>
>> Best regards
>>
>> /Nicolas
>>
>> On Thu, May 19, 2011 at 3:47 PM, Thilo Götz <[email protected]> wrote:
>> > On 5/19/2011 15:04, Nicolas Hernandez wrote:
>> >> Hello Everyone
>> >>
>> >> Jörn, yes, it (training MaxEnt models for OpenNLP from the French
>> >> Treebank) is actually part of our plan (building a French-speaking
>> >> UIMA community). We also wanted to contribute to the OpenNLP project
>> >> since no models were available for French processing!
>> >>
>> >> About the right to train models on this data set and then distribute
>> >> them under Apache License 2: it took time for us to get the right to
>> >> do it, but I think it was because we were the first to ask for it. Now
>> >> they know about it. I know that the maltparser team
>> >> (http://maltparser.org/) would also be interested in the grant. You
>> >> may ask the French Treebank authors. I can also ask them to
>> >> put an explicit mention of the right to do it on their web
>> >> site.
>> >>
>> >> As far as I know, the training data sets for the English and German POS
>> >> models are not freely available, are they?
>> >
>> > The English model was trained on the Brown corpus, which is free.
>> > The German model was trained on a non-free corpus.
>> >
>> >>
>> >> Finally, Jörn, I'm not sure I understand. Do you think the IP
>> >> clearance process is not suited to submitting our contribution?
>> >>
>> >> Tommaso, I will post on my blog the procedure I used to train the models.
>> >> There is nothing really special. I used some freely available (under
>> >> AL2) AE components. The HMM learner is already present in the HMM
>> >> Tagger addon. The few other UIMA components I used are also available
>> >> on some Google forges (uima-common, uima-connectors,
>> >> uima-type-mapper).
>> >>
>> >> Regards
>> >>
>> >> /Nicolas
>> >>
>> >> On Thu, May 19, 2011 at 9:57 AM, Jörn Kottmann <[email protected]> wrote:
>> >>> On 5/19/11 9:00 AM, Tommaso Teofili wrote:
>> >>>>
>> >>>> If you also plan to donate the models I think the IP clearance is the
>> >>>> right way both for UIMA and for you as a researcher.
>> >>>>
>> >>>
>> >>> In my opinion it is very important that we have the possibility
>> >>> to retrain the models on the data set, otherwise it will block
>> >>> code changes and bug fixes.
>> >>>
>> >>> Therefore I think we need the right to train models on this
>> >>> data set and then distribute them under AL 2.0.
>> >>>
>> >>> Jörn
>> >>>
>> >>
>> >
>>
>> --
>> [email protected]
>> #
>> http://enicolashernandez.blogspot.com
>> http://www.univ-nantes.fr/hernandez-n
>> #
>> Laboratoire LINA-TALN CNRS UMR 6241
>> tel. +33 (0)2 51 12 58 55
>> #
>> Université de Nantes - Institut Universitaire de Technologie -
>> Département Informatique
>> tel. +33 (0)2 40 30 60 67
>>
